
Fastest way to read part of 300 Gigabyte binary file

58 Posts 7 Posters 13.3k Views
  • J J.Hilk
    7 Jan 2020, 07:57

    @Please_Help_me_D
    out of curiosity, do you build and run your tests in release mode?

    Compiler optimizations could go a long way in improving the speed, if you so far only ran debug builds.

    JonB
    wrote on 7 Jan 2020, 08:02 last edited by JonB 1 Jul 2020, 08:03
    #19

    @J-Hilk
    Out of interest: I hope you are right, but I don't see much in code which spends its time seeking and reading a few bytes out of an enormous file that will benefit from any code optimization. Presumably all the time is being taken in the OS calls themselves....

      J.Hilk
      Moderators
      wrote on 7 Jan 2020, 08:07 last edited by
      #20

      @JonB said in Fastest way to read part of 300 Gigabyte binary file:

      Presumably all the time is being taken in the OS calls themselves....

      You mean most of the time is lost in the network access calls? Possibly. But I would expect an improvement of at least a couple of seconds anyway :)


      Be aware of the Qt Code of Conduct, when posting : https://forum.qt.io/topic/113070/qt-code-of-conduct


      Q: What's that?
      A: It's blue light.
      Q: What does it do?
      A: It turns blue.

        JonB
        wrote on 7 Jan 2020, 08:09 last edited by JonB 1 Jul 2020, 08:10
        #21

        @J-Hilk
        I would not, can't see how it would save anything here. But that aside, the OP wrote earlier:

        @SGaist 15155 seconds (4 hours 12 min) it took to read these data.

        Your "couple of seconds" is not going to be ground-breaking on that timing, is it? ;-)

        OK, the OP has shown a newer, quicker timing. By all means try release optimization, worth a go :)

          Please_Help_me_D
          wrote on 7 Jan 2020, 11:26 last edited by
          #22

          @J-Hilk Yes I did all the experiments in release mode

            Please_Help_me_D
            wrote on 7 Jan 2020, 12:08 last edited by
            #23

            @Please_Help_me_D said in Fastest way to read part of 300 Gigabyte binary file:

            uchar *memory = file.map(3608, file.size()-3608);

            is it possible to represent *memory as a heap of type qint32 rather than uchar?

              jsulm
              Lifetime Qt Champion
              wrote on 7 Jan 2020, 12:16 last edited by
              #24

              @Please_Help_me_D said in Fastest way to read part of 300 Gigabyte binary file:

              is it possible to represent *memory as a heap of type qint32 rather than uchar?

              Sure, cast the pointer to qint32*

              https://forum.qt.io/topic/113070/qt-code-of-conduct

                JonB
                wrote on 7 Jan 2020, 12:29 last edited by JonB 1 Jul 2020, 12:33
                #25

                @jsulm
                Your answer is in principle correct. However, should we warn the OP that this will only "work" if the pointer returned by his QFile::map() call (given his offsets) is suitably aligned on a 32-bit boundary, so that a qint32 * can be dereferenced without faulting? I don't see the Qt docs mentioning whether that is the case for the normally-uchar * return value.
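To make the alignment concern concrete, here is a minimal sketch in plain C++ (no Qt, so it can be tried in isolation). The helper names `is_aligned_for` and `load_i32` are made up for illustration, and the byte pointer stands in for whatever QFile::map() returns. The first helper checks whether an address is suitably aligned for a type; the second reads four bytes with std::memcpy, which works regardless of alignment and so avoids dereferencing a possibly misaligned qint32 *.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if p is suitably aligned for objects of type T.
template <typename T>
bool is_aligned_for(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Hypothetical helper: copy four bytes into an int32_t in host byte order.
// memcpy has no alignment requirement, so p may point anywhere in the buffer.
std::int32_t load_i32(const unsigned char* p)
{
    assert(p != nullptr);
    std::int32_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

If `is_aligned_for<qint32>(memory)` turns out to be false for a given mapping, reading through `load_i32` is the safer route than casting the pointer.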

                  jsulm
                  Lifetime Qt Champion
                  wrote on 7 Jan 2020, 12:42 last edited by
                  #26

                  @JonB Could be, I'm not sure

                    J.Hilk
                    Moderators
                    wrote on 7 Jan 2020, 12:45 last edited by
                    #27

                    @JonB
                    well if you take a look at the loop so far:

                    for (qint64 i = 0; i < N; i++) {
                        // FFID holds N*4 bytes, so record i fills FFID[4*i] .. FFID[4*i+3]
                        FFID[4*i]   = memory[i*Nb];
                        FFID[4*i+1] = memory[i*Nb+1];
                        FFID[4*i+2] = memory[i*Nb+2];
                        FFID[4*i+3] = memory[i*Nb+3];
                    }

                    There are no checks inside the loop or before it, so it's going to hard-crash anyway when the file is not int32_t aligned.


                      JonB
                      wrote on 7 Jan 2020, 12:56 last edited by JonB 1 Jul 2020, 13:02
                      #28

                      @J-Hilk
                      Umm, no, I don't see that. His current uchar *memory means it's only picking up bytes from there. And he made his FFID a QVector<uchar>. So he is copying one byte at a time (which is what I think he wants to get rid of), and the current code won't have an odd-boundary memory alignment issue. But new code using qint32* in place of uchar* could have a problem....

                      If his offset is always divisible by 4, like the example 7996, then I would guess the pointer returned by QFile::map() will not show any problem. This issue does not arise when reading numbers from the file with read calls, only when mapping, so it's just something to be aware of.

                        J.Hilk
                        Moderators
                        wrote on 7 Jan 2020, 13:04 last edited by
                        #29

                        @JonB
                        Really? And what guarantees that memory[i*Nb+3] will be part of the valid memory?

                        I assume this is what the OP wants to do:

                        QVector<uchar> FFID(N*4); -> QVector<qint32> FFID(N);
                        uchar *memory -> qint32 *memory
                        
                        and 
                        for(qint64 i = 0; i < N; i++){
                                    FFID[i] = memory[i*Nb];
                                }
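On the question of what guarantees that the last access stays inside the mapping: nothing does, unless the code checks. A minimal sketch of such a check in plain C++ follows, assuming `count` records spaced `strideBytes` apart with `width` bytes read from the start of each. The name `fits_in_mapping` is hypothetical, and `mappedSize` stands for the size that was passed to file.map().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if reading `width` bytes at the start of each of
// `count` records, spaced `strideBytes` apart, stays within `mappedSize` bytes.
bool fits_in_mapping(std::int64_t count, std::int64_t strideBytes,
                     std::int64_t width, std::int64_t mappedSize)
{
    assert(count >= 0 && strideBytes > 0 && width > 0 && mappedSize >= 0);
    if (count == 0)
        return true;                       // nothing is read
    // Offset one past the last byte touched by the final record.
    const std::int64_t lastEnd = (count - 1) * strideBytes + width;
    return lastEnd <= mappedSize;
}
```

For the loop in this thread that would be a call along the lines of `fits_in_mapping(N, Nb, 4, mappedSize)` before the loop starts.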
                        

                          Please_Help_me_D
                          wrote on 7 Jan 2020, 13:21 last edited by
                          #30

                          @jsulm thank you, that works!
                          @JonB @J-Hilk I think I see what you are discussing and I keep that in mind.
                          If I map a part of the file whose size is not a multiple of 4 (as in the code below), my program doesn't print any error or any other output. The compiler says it was successfully built, and the application output shows that it started and was terminated one second later.

                          #include <QCoreApplication>
                          #include <QFile>
                          #include <QVector>
                          //#include <QIODevice>
                          #include <armadillo>
                          using namespace arma;
                          
                          int main()
                          {
                              char segyFile[]{"C:/Users/tasik/Documents/Qt_Projects/raw_le.sgy"};
                              QFile file(segyFile);
                              if (!file.open(QIODevice::ReadOnly)) {
                                   //handle error
                              }
                              //qint32 *memory = new qint32;
                              //(uchar*)&memory;
                              uchar* memory = file.map(3608, file.size()-3607); // Here the mappable part file.size()-3607 has some remainder of the division by 4 
                              (qint32*) memory;
                              if (memory) {
                                  std::cout << "started..." << std::endl;
                                  wall_clock timer;
                                  qint64 fSize = file.size();
                                  qint64 N = 44861;
                                  qint64 Nb = 661*4;
                                  QVector<qint32> FFID(N);
                                  (uchar *)&FFID;
                                  timer.tic();
                                  for(qint64 i = 0; i < N; i++){
                                      FFID[i] = memory[i*Nb];
                                      /*FFID[i+1] = memory[i*Nb+1];
                                      FFID[i+2] = memory[i*Nb+2];
                                      FFID[i+3] = memory[i*Nb+3];*/
                                      std::cout << FFID[i] << std::endl;
                                  }
                                  double n0 = timer.toc();
                                  std::cout << n0 << std::endl;
                                  std::cout << "finished!" << std::endl;
                              }
                          }
                          
                            JonB
                            wrote on 7 Jan 2020, 13:44 last edited by JonB 1 Jul 2020, 15:14
                            #31

                            @Please_Help_me_D said in Fastest way to read part of 300 Gigabyte binary file:

                            and application output throws that it started and one second later it is terminated.

                            Yes, that was my point. You won't get a compilation error. You would get a run-time "crash" on something like line FFID[i] = memory[i*Nb];. Under Linux you'd get a core dump (if enabled), under Windoze I don't know but would have thought it would bring up a message box of some kind.

                            However, I don't think the code you've written reflects this. For a start, the statements (qint32*) memory; and (uchar *)&FFID; are no-ops (turn the compiler warning level up and you might get a "no effect" warning for these lines; you should always develop with the highest warning level you can). You haven't changed memory over to qint32*, and the way you seem to think casts work is wrong. This is C/C++ stuff. You'll want something more like

                            qint32* memory = static_cast<qint32*>(file.map(3608, file.size()-3607));

                            qint32* memory = reinterpret_cast<qint32*>(file.map(3608, file.size()-3607)); 
                            

                            but I haven't got time to sort you out. And if you do that, you need to understand how to index it afterwards: the offsets won't be the same as the ones you used when it was uchar*. Don't try to change to qint32* for your accesses if you don't know what you're doing cast-wise in C/C++! :)
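The indexing point can be sketched as follows. With a uchar* view, memory[i*Nb] is the byte at offset i*Nb; with a qint32* view, memory[k] sits at byte offset 4*k, so the step between records becomes Nb/4 elements (661 for Nb = 661*4). The sketch below is plain C++ with a hypothetical helper `gather_first_words`; it assumes the byte stride is a multiple of 4 and that the file's byte order matches the host's, and it reads through std::memcpy rather than a cast pointer so alignment does not matter.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: collect the first 32-bit word of each of `count`
// records that are `strideBytes` apart in `bytes`.
std::vector<std::int32_t> gather_first_words(const unsigned char* bytes,
                                             std::size_t count,
                                             std::size_t strideBytes)
{
    assert(bytes != nullptr || count == 0);
    assert(strideBytes % sizeof(std::int32_t) == 0);
    // Stride measured in int32 elements: this is the index step to use with
    // a qint32* view, not the byte stride.
    const std::size_t strideWords = strideBytes / sizeof(std::int32_t);

    std::vector<std::int32_t> out(count);
    for (std::size_t i = 0; i < count; ++i) {
        // Byte offset of word index i*strideWords; memcpy keeps the read
        // valid even if `bytes` is not 4-byte aligned.
        const std::size_t byteOffset = i * strideWords * sizeof(std::int32_t);
        std::memcpy(&out[i], bytes + byteOffset, sizeof(std::int32_t));
    }
    return out;
}
```

The caller is still responsible for making sure the last record lies inside the mapped region.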

                              Please_Help_me_D
                              wrote on 7 Jan 2020, 14:54 last edited by
                              #32

                              @JonB said in Fastest way to read part of 300 Gigabyte binary file:

                              qint32* memory = static_cast<qint32*>(file.map(3608, file.size()-3607));

                              thank you but this sends me an error:

                              main.cpp:17:22: error: static_cast from 'uchar *' (aka 'unsigned char *') to 'qint32 *' (aka 'int *') is not allowed
                              
                                J.Hilk
                                Moderators
                                wrote on 7 Jan 2020, 14:57 last edited by
                                #33

                                @Please_Help_me_D
                                @JonB meant to write reinterpret_cast, not static_cast. There are few uses for reinterpret_cast, but this is one :)
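For reference, a small sketch of why the compiler rejected static_cast here: unsigned char * and int * are unrelated pointer types, so static_cast refuses the direct conversion, while reinterpret_cast (or two static_casts through void *) performs it. The function names are hypothetical and the code is plain C++ without Qt. Note that the cast only changes how the address is typed; it does not make a misaligned address safe to dereference.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: view a byte pointer as a pointer to 32-bit integers.
// The caller must ensure `bytes` is 4-byte aligned before dereferencing.
const std::int32_t* as_i32(const unsigned char* bytes)
{
    assert(bytes != nullptr);
    // static_cast<const std::int32_t*>(bytes) would not compile: the two
    // pointer types are unrelated. reinterpret_cast is the direct spelling.
    return reinterpret_cast<const std::int32_t*>(bytes);
}

// Same conversion spelled as two static_casts through void*.
const std::int32_t* as_i32_via_void(const unsigned char* bytes)
{
    assert(bytes != nullptr);
    return static_cast<const std::int32_t*>(static_cast<const void*>(bytes));
}
```

Both functions return the same address; the result of the cast has to be assigned or returned, because a cast written as a statement on its own has no effect.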


                                  Please_Help_me_D
                                  wrote on 7 Jan 2020, 15:29 last edited by
                                  #34

                                  @J-Hilk ok, now it works :)

                                    Please_Help_me_D
                                    wrote on 7 Jan 2020, 15:41 last edited by
                                    #35

                                    @SGaist said in Fastest way to read part of 300 Gigabyte binary file:

                                    Did you consider mapping only the parts that are pertinent to what you want to read ?

                                    I don't know how to do that, but I saw something like this in the Boost C++ documentation. It says:
                                    What is a memory mapped file?
                                    File mapping is the association of a file's contents with a portion of the address space of a process. The system creates a file mapping to associate the file and the address space of the process. A mapped region is the portion of address space that the process uses to access the file's contents. A single file mapping can have several mapped regions, so that the user can associate parts of the file with the address space of the process without mapping the entire file in the address space, since the file can be bigger than the whole address space of the process (a 9GB DVD image file in a usual 32 bit systems). Processes read from and write to the file using pointers, just like with dynamic memory.

                                    Maybe if I could map only regions of my file that I need to read then it would speed up my application? Does Qt provide something like that?

                                      SGaist
                                      Lifetime Qt Champion
                                      wrote on 7 Jan 2020, 16:01 last edited by
                                      #36

                                      Well... As already said, the map function takes an offset into your file and a size, so you can map several regions of it with that. Nowhere is it written that you have to pass an offset of zero and the full file size.

                                      Interested in AI ? www.idiap.ch
                                      Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

                                        Please_Help_me_D
                                        wrote on 7 Jan 2020, 16:11 last edited by
                                        #37

                                        @SGaist I understand that I have offset and size parameters, and at the moment I pass a single value for each. If I want to map several regions of a file then I would need multiple offsets and multiple sizes, but the example below doesn't work:

                                            qint64 offset[] = {100, 200, 300};
                                            qint64 size[] = {4, 4, 4};
                                            qint32* memory = reinterpret_cast<qint32*>(file.map(offset, size));
                                        

                                        The error I get is:
                                        main.cpp:19:57: error: cannot initialize a parameter of type 'qint64' (aka 'long long') with an lvalue of type 'qint64 [3]'
                                        qfiledevice.h:127:23: note: passing argument to parameter 'offset' here

                                          SGaist
                                          Lifetime Qt Champion
                                          wrote on 7 Jan 2020, 17:07 last edited by
                                          #38

                                          You can't just replace an input type with an array of the same type. That's not how it works. And in any case, the value returned by map is the address you'll have to pass to the unmap function.

                                          You won't avoid using some form of loop.
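A minimal sketch of such a loop, written in plain C++ so it can be tried without Qt. The `Region` struct and `map_regions` helper are hypothetical names; the mapping call itself is passed in as a callable, which with Qt would wrap file.map(offset, size). Each returned address is stored because it is needed later for unmapping.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One window of the file: byte offset and length.
struct Region {
    std::int64_t offset;
    std::int64_t size;
};

// Hypothetical helper: call `mapOne(offset, size)` once per region and keep
// the returned addresses, since each one is needed later to unmap it.
template <typename MapFn>
std::vector<unsigned char*> map_regions(const std::vector<Region>& regions,
                                        MapFn mapOne)
{
    std::vector<unsigned char*> addresses;
    addresses.reserve(regions.size());
    for (const Region& r : regions) {
        assert(r.offset >= 0 && r.size > 0);
        unsigned char* p = mapOne(r.offset, r.size);
        addresses.push_back(p);   // may be null if that mapping failed
    }
    return addresses;
}
```

With a QFile this might be called as `map_regions(regions, [&file](qint64 off, qint64 len) { return file.map(off, len); })`, with every non-null address later handed to `file.unmap()`. Whether many small mappings are actually faster than one large one is something to measure rather than assume.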
