Fastest way to read part of 300 Gigabyte binary file

Please_Help_me_D (#45)

@JonB yes, that was exactly what I wanted!
Despite my humble knowledge of C/C++ programming, I got an idea :)
Suppose I map the address of the first value that I want to read (bytes 3600-3604). Then calling:

    memory
    

would show me the address of that value. My file is stored contiguously on the disk, so the second qint32 value has to be at the (memory+4) address. So if I call:

    first_value = *memory;
    second_value = *(memory+4);
    third_value = *(memory+8);
    

Should this work? Would it be faster? I'm going to try.
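
For reference, a minimal sketch of what the idea above looks like with correct pointer arithmetic, assuming a hypothetical file named myBigFile. Note that arithmetic on a qint32* moves in 4-byte elements, so memory + 1 is already the second value; memory + 4 would skip 16 bytes.

    #include <QFile>
    #include <iostream>

    int main()
    {
        QFile file("myBigFile"); // hypothetical file name
        if (!file.open(QIODevice::ReadOnly))
            return 1;

        // Map only the bytes actually read: three qint32 values at offset 3600.
        uchar *raw = file.map(3600, 3 * sizeof(qint32));
        if (!raw)
            return 1;

        const qint32 *memory = reinterpret_cast<const qint32 *>(raw);
        std::cout << memory[0] << '\n'  // bytes 3600..3603
                  << memory[1] << '\n'  // bytes 3604..3607
                  << memory[2] << '\n'; // bytes 3608..3611

        file.unmap(raw);
        return 0;
    }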

JonB (#46)

@Please_Help_me_D
Huh? Do you mean you are intending to change the physical file content/format to move the data points you want to retrieve so that they are contiguous? That seems pretty surprising to me; one would assume the format is dictated by something external to your program. But then you have never explained what this data/file is all about...

Please_Help_me_D (#47)

@JonB no, I don't want to change the content of the file. My file is like the following:

• the first 3600 bytes describe the rest of the file. Here I get information about how many rows Nb and columns N I have;

• the rest of the file is a block of Nb values repeated N times. We can represent this part as a matrix with Nb rows (Nb*4 bytes) and N columns, and my task is to read a single row of this matrix; in other words, I need to read every Nb-th value starting from some byte (say 3600 or 3604 or so).
  Actually it is a little more complicated: some rows of this "matrix" are qint16, others qint32 and single (float).
  Here is what I do, and I get the correct values for the first few qint32 rows:

    qint64 N = 44861;
    qint64 Nb = 100;
    memory = reinterpret_cast<qint32*>(file.map(3600, 4));
    for (int i = 0; i < N; i++){
        std::cout << memory+i << std::endl;    // address
        std::cout << *(memory+i) << std::endl; // value
    }
      

But my program breaks when I try:

    qint64 N = 44861;
    qint64 Nb = 100;
    memory = reinterpret_cast<qint32*>(file.map(3600, 4));
    for (int i = 0; i < N; i++){
        std::cout << memory+i*Nb << std::endl;
        std::cout << *(memory+i*Nb) << std::endl;
    }
      

Application output:

    15:54:06: C:\Users\tasik\Documents\Qt_Projects\build-untitled1-Desktop_Qt_5_12_6_MSVC2017_64_bit-Release\release\untitled1.exe starts...
    15:54:09: C:\Users\tasik\Documents\Qt_Projects\build-untitled1-Desktop_Qt_5_12_6_MSVC2017_64_bit-Release\release\untitled1.exe completed with the code -1073741819

Please_Help_me_D (#48)

It seems that this works only for 124*4 bytes.
I just tested how many iterations complete before the program breaks, for different Nb:

    for (int i = 0; i < N; i++){
        std::cout << *(memory+i*Nb) << std::endl;
    }
        
• Nb = 1, max_iterator_i = 124
• Nb = 2, max_iterator_i = 62
• Nb = 4, max_iterator_i = 31

So I think my idea is not as good as I thought :) (In every case the first failing read starts at byte 3600 + 124*4 = 4096, presumably the end of the first mapped memory page.)
SGaist (Lifetime Qt Champion) (#49)

@Please_Help_me_D said in Fastest way to read part of 300 Gigabyte binary file:

    memory = reinterpret_cast<qint32*>(file.map(3600, 4));

You are mapping a region of 4 bytes yet trying to read way past that.

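To illustrate SGaist's point, a sketch of mapping a region large enough for the loop above; this fragment assumes file is an already-open QFile, <iostream> is included, and reuses the N and Nb values from the post:

    const qint64 N  = 44861;
    const qint64 Nb = 100;
    // Map the whole region the loop walks (about 17 MB), not just 4 bytes.
    uchar *raw = file.map(3600, N * Nb * qint64(sizeof(qint32)));
    if (raw) {
        const qint32 *memory = reinterpret_cast<const qint32 *>(raw);
        for (qint64 i = 0; i < N; ++i)
            std::cout << memory[i * Nb] << std::endl; // first value of each column
        file.unmap(raw);
    }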


JonB (#50)

@Please_Help_me_D
I give up; I really don't understand what you think you are trying to achieve.

If the data you want to fetch is physically scattered all over the file, as you originally said (if that hasn't changed), you are wasting your time trying to miraculously "coalesce" the data access in memory via mapping. It is a vain attempt. Whichever way you look at it, if you have a physical hard disk it is going to have to seek/move the head to reach discontinuous areas. That is what will "take time", and there is nothing you can do about it...


Please_Help_me_D (#51)

@SGaist yes, thank you!
@JonB I was wrong. Thank you for the explanation.

Please_Help_me_D (#52)

Hi all again,

I just noticed one thing:
if I iterate through a mapped file of 14 GB, memory consumption grows, eating 4 GB of RAM in about 10 seconds. After that I have to stop the execution because of my RAM limit, but there is no sign that it would stop growing.

For example, this code reproduces all of that on Windows 10 x64, Qt 5.14.0, MSVC 2017 64-bit:

    qFile = new QFile("myBigFile");
    if (!qFile->open(QIODevice::ReadOnly)) // map() needs an open file
        return;
    uchar* memFile_uchar = qFile->map(0, qFile->size());
    int val;
    size_t I = qFile->size();
    for (size_t i = 0; i < I; i++){
        val = memFile_uchar[i]; // touching each byte faults its page in
    }
                

Hope somebody is able to explain that...

PS: When I used memory mapping in Matlab, I saw similar behaviour there.


JonB (#53)

@Please_Help_me_D
I'm not sure what you're asking here. You are mapping the whole of the file. As you begin to access data in the mapped area it gets brought into memory, and that takes up space. If you have limited memory, this is not a good idea.

I haven't used memory mapping myself, but presumably if you want to keep memory usage down you have to do something like only map partial areas of the file at a time (arguments to map()), and release previously mapped areas (unmap()). You'd have to test whether that actually results in less memory usage.

If you are limited in memory compared to the size of the file, perhaps you shouldn't be using memory mapping at all. File seeking to desired data won't have a memory overhead. In the code you show you are reading the data just once, so there may not be much difference. Have you actually measured performance with file versus memory-map access?
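
A sketch of that windowed map()/unmap() approach, assuming an already-open QFile; processChunk() is a hypothetical stand-in for whatever consumes the bytes, and the 64 MB window size is an arbitrary choice:

    #include <QFile>
    #include <QtGlobal>

    // Hypothetical consumer: touch every byte (here, a checksum) to stand in
    // for real processing of the mapped data.
    static qint64 processChunk(const uchar *data, qint64 len)
    {
        qint64 sum = 0;
        for (qint64 i = 0; i < len; ++i)
            sum += data[i];
        return sum;
    }

    void walkFile(QFile &file, qint64 window = 64 * 1024 * 1024)
    {
        const qint64 total = file.size();
        for (qint64 offset = 0; offset < total; offset += window) {
            const qint64 len = qMin(window, total - offset);
            uchar *chunk = file.map(offset, len);
            if (!chunk)
                break;                // mapping failed; handle as appropriate
            processChunk(chunk, len); // consume this window...
            file.unmap(chunk);        // ...then release it before mapping the next
        }
    }

Whether this actually beats plain seek-and-read is exactly the measurement question raised above.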


Please_Help_me_D (#54)

@JonB said in Fastest way to read part of 300 Gigabyte binary file:

    I'm not sure what you're asking here. You are mapping the whole of the file. As you begin to access data in the mapped area it gets brought into memory, and that takes up space. If you have limited memory, this is not a good idea.

Well, this helped me. I now divide my file into portions and unmap() them when they are no longer needed. With that, the runaway memory consumption is gone.
Thank you!


Natural_Bugger (#55)

@Please_Help_me_D

http://www.c-jump.com/bcc/c155c/MemAccess/MemAccess.html
https://stackoverflow.com/questions/14324709/c-how-to-directly-access-memory

Not sure if my input helps, but I'm going to try anyway.
You could try to dynamically create very large arrays in memory, using try/catch and catching the exception to see how much space you can reserve in your RAM, automatically scaling down until no exception is thrown and your array fits (see the sketch after this post). Then read the file in chunks according to the space reserved in your RAM.

Store the data in a text file or similar as you go, in case your program crashes halfway through the file and you have to start all over again. A sort of snapshot method.

https://stackoverflow.com/questions/2513505/how-to-get-available-memory-c-g

What does the data look like?

ChiliTomatoNoodle taught me about structs.
https://www.youtube.com/user/ChiliTomatoNoodle
So you could make an array of structs that fits your data.
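
A sketch of that scale-down idea: std::vector throws std::bad_alloc when an allocation fails, so halving the request until it succeeds finds a size that fits. The 8 GiB starting size is arbitrary and assumes a 64-bit build.

    #include <cstddef>
    #include <new>      // std::bad_alloc
    #include <vector>

    std::vector<char> allocateLargest(std::size_t want = std::size_t(8) << 30) // 8 GiB
    {
        while (want > 0) {
            try {
                return std::vector<char>(want); // throws std::bad_alloc if too big
            } catch (const std::bad_alloc &) {
                want /= 2;                      // scale down and retry
            }
        }
        return {}; // nothing could be allocated
    }

Note that on systems that overcommit memory the allocation may appear to succeed anyway, so this is only a rough gauge of usable RAM.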


Natural_Bugger (#56)

@Please_Help_me_D

This might help you out even more:

https://stackoverflow.com/questions/7749066/how-to-catch-out-of-memory-exception-in-c
https://stackoverflow.com/questions/23587837/c-allocating-large-array-on-heap-gives-out-of-memory-exception


Please_Help_me_D (#57)

@Natural_Bugger thank you for the help!
My data is binary files ranging from a few GB to hundreds of GB. I read and write them in portions, and I use OpenMP to read in parallel, roughly along the lines of the sketch below.
I have solved my task now and am busy with another one, but a little later I will try your solution.
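
A sketch of what reading a file in portions in parallel with OpenMP can look like (the poster's actual code is not shown in the thread): each thread opens its own stream so concurrent seeks don't interfere. Compile with OpenMP enabled, e.g. -fopenmp (GCC/Clang) or /openmp (MSVC); the 64 MB chunk size is an arbitrary choice.

    #include <algorithm>
    #include <cstddef>
    #include <fstream>
    #include <string>
    #include <vector>

    void readInParallel(const std::string &path, long long fileSize,
                        long long chunk = 64LL * 1024 * 1024)
    {
        const int nChunks = static_cast<int>((fileSize + chunk - 1) / chunk);
        #pragma omp parallel for schedule(dynamic)
        for (int c = 0; c < nChunks; ++c) {
            std::ifstream in(path, std::ios::binary); // per-thread handle
            if (!in)
                continue;
            const long long offset = static_cast<long long>(c) * chunk;
            const long long len = std::min(chunk, fileSize - offset);
            std::vector<char> buf(static_cast<std::size_t>(len));
            in.seekg(offset);
            in.read(buf.data(), static_cast<std::streamsize>(len));
            // ... process buf here ...
        }
    }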


Natural_Bugger (#58)

@Please_Help_me_D

You're welcome.

I would like to know what your lines of data look like.

Regards
