[C++] Performance travel massive object field data

Solved · General and Desktop · 18 Posts · 7 Posters · 1.3k Views
Sabrac
#3

    Hi @SGaist,
    Thanks for your response.
    My data is loaded from a binary file.
    I load all the data with fread; it looks like this:

    int count = 100000;      // number of elements in this array
    std::vector<T> _v;
    _v.reserve(count);       // was reserve(s); 's' was undefined
    T data;                  // each array has a different structure T
    for (int i = 0; i < count; i++)
    {
        if (fread(&data, sizeof(T), 1, file) != 1)
            return -1;       // short read: bad file or premature EOF
        _v.push_back(data);
    }
    

    I have around ~200 arrays of structures to load, so I defined a template T to load them dynamically into my app.
    The "travel" I mentioned means:

    • iterate over each array of structures (~500 arrays)
    • then iterate over each element in the array (each array has around 100k elements, with a different structure definition)
    • then iterate over each field of the element object to check/search some condition (each element object has around 100 fields)

    For example: I need to search for the string "apple" in all arrays to find out where it appears: which array, which element, which index in the array. A minimal sketch of that traversal is shown below.
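
    For concreteness, a minimal sketch of that three-level traversal (illustrative names only, not my actual code): it assumes each loaded array is one contiguous byte buffer and the field offsets/sizes come from a descriptor table.

    #include <cstddef>
    #include <cstdio>
    #include <string_view>
    #include <vector>

    // Illustrative layout: each loaded array is one contiguous byte buffer,
    // and a descriptor table gives every field's offset and size.
    struct FieldDescriptor { std::size_t offset; std::size_t size; };

    struct LoadedArray {
        std::vector<char> bytes;              // count * elementSize bytes
        std::size_t elementSize = 0;
        std::vector<FieldDescriptor> fields;  // ~100 descriptors per element
    };

    void findString(const std::vector<LoadedArray>& arrays, std::string_view needle)
    {
        for (std::size_t a = 0; a < arrays.size(); ++a) {             // ~500 arrays
            const LoadedArray& arr = arrays[a];
            if (arr.elementSize == 0) continue;
            const std::size_t count = arr.bytes.size() / arr.elementSize;
            for (std::size_t e = 0; e < count; ++e) {                 // ~100k elements
                const char* elem = arr.bytes.data() + e * arr.elementSize;
                for (std::size_t f = 0; f < arr.fields.size(); ++f) { // ~100 fields
                    std::string_view v(elem + arr.fields[f].offset, arr.fields[f].size);
                    if (v.find(needle) != std::string_view::npos)
                        std::printf("array %zu, element %zu, field %zu\n", a, e, f);
                }
            }
        }
    }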

JonB
#4

      @Sabrac
      Well, if you don't want to use a database then you are going to read into memory and search? 500 * 100,000 * 100 fields * whatever the field size is ... quite a big number. What is the total data size on disk? It will "take a while", though "4-5 minutes" still sounds high to me. It depends what your actual code is. Do ensure you do not read the data from file more than once. Other than that, finding an arbitrary string anywhere will indeed involve searching all the bytes of the data. Profile your application to see where it's spending most time.

Sabrac
#5

        @JonB
        Just around 500 MB on disk, and of course ~500 MB after loading into RAM.
        I have logged the processing time; each loop over the fields takes ~5 ms (debug mode may be slower).
        And following your calculation, 500 * 100,000 * 100 * 5 ms ≈ 25,000,000 seconds xD
        Of course this is an unacceptable number.
        My working solution is to define all the structures in QSettings (field name and field data type);
        then I get a pointer to one element in the array and read the object's field value via that pointer plus the field offset (calculated from the field data types),
        and just loop over each element in each array. A sketch of loading such a layout follows.
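
        A minimal sketch of how such a layout could be read from the INI file with QSettings. The key names and the type-to-size mapping are illustrative assumptions, not my real code, and a real layout would also have to account for struct padding.

        #include <QSettings>
        #include <QString>
        #include <QVector>

        struct FieldInfo {
            QString name;
            QString type;
            int offset;   // byte offset, from the sizes of the preceding fields
        };

        static int sizeOfType(const QString& type)
        {
            if (type == "int32")  return 4;
            if (type == "double") return 8;
            if (type == "char16") return 16;   // fixed-width string field
            return 0;
        }

        QVector<FieldInfo> loadLayout(const QString& iniPath)
        {
            QSettings settings(iniPath, QSettings::IniFormat);
            QVector<FieldInfo> layout;
            int offset = 0;
            const int n = settings.beginReadArray("fields");   // ordered field list
            for (int i = 0; i < n; ++i) {
                settings.setArrayIndex(i);
                const QString name = settings.value("name").toString();
                const QString type = settings.value("type").toString();
                layout.append({name, type, offset});
                offset += sizeOfType(type);
            }
            settings.endArray();
            return layout;
        }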

JonB
#6

          @Sabrac
          My feeling (untested) is that 0.5 GB should not take "4-5 minutes" to search.
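
          As a rough sanity check, a sketch assuming the data already sits in one flat buffer: a brute-force scan with std::string_view::find typically gets through hundreds of MB per second, so 500 MB should be a matter of seconds, not minutes.

          #include <cstddef>
          #include <string_view>
          #include <vector>

          // Brute-force scan of a contiguous buffer for every occurrence of needle.
          std::vector<std::size_t> findAll(std::string_view haystack, std::string_view needle)
          {
              std::vector<std::size_t> hits;
              for (std::size_t pos = haystack.find(needle);
                   pos != std::string_view::npos;
                   pos = haystack.find(needle, pos + 1))
                  hits.push_back(pos);
              return hits;
          }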

DerReisende
#7

            @Sabrac said in [C++] Performance travel massive object field data:

            My working solution is to define all the structures in QSettings (field name and field data type)

            Why are you using QSettings? I don't think it is a high-performance map implementation, and it was never meant for that. I would use either a Qt map or a C++ map, or, if you really need a more performant map, tsl::hopscotch_map.

Sabrac
#8

              @DerReisende said in [C++] Performance travel massive object field data:

              Why are you using QSettings? I don't think it is a high-performance map implementation, and it was never meant for that.

              QSettings only stores the object structure definition, because I don't know the type of an array's elements while iterating over it. So I think the problem is the iteration approach; QSettings may be one of the problems, but a lesser one.

mpergand
#9

                 @Sabrac
                 The first thing that comes to mind is to create a hash table (QHash) keyed on all the field values; see the sketch below.
                 It could take some time, and it mimics database behaviour with less efficiency and convenience.

                 Hence my final thought: create a lightweight SQLite database instead.
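
                 A minimal sketch of the QHash idea, using an illustrative toy model in which every field value has already been stringified (the Location layout is an assumption):

                 #include <QHash>
                 #include <QString>
                 #include <QVector>

                 struct Location { int array, element, field; };

                 using Element = QVector<QString>;   // ~100 stringified field values
                 using Array   = QVector<Element>;   // ~100k elements

                 // Pay the indexing cost once; afterwards every lookup is O(1) on average.
                 QHash<QString, QVector<Location>> buildIndex(const QVector<Array>& arrays)
                 {
                     QHash<QString, QVector<Location>> index;
                     for (int a = 0; a < arrays.size(); ++a)
                         for (int e = 0; e < arrays[a].size(); ++e)
                             for (int f = 0; f < arrays[a][e].size(); ++f)
                                 index[arrays[a][e][f]].append({a, e, f});
                     return index;
                 }

                 // Usage: index.value("apple") returns every (array, element, field) hit.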

Sabrac
#10

                   @mpergand said in [C++] Performance travel massive object field data:

                   Hence my final thought: create a lightweight SQLite database instead.

                   I have also thought about SQLite and tried it, but flushing ~50 million records (each record with ~100 fields) into SQLite also takes time.
                   Even if that were acceptable, I would face another problem: synchronizing the binary data with the SQLite database records when my application starts.
                   Painful too.

SGaist
Lifetime Qt Champion
#11

                     Does that file have a defined format like HDF5?

                    Interested in AI ? www.idiap.ch
                    Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

DerReisende
#12

                       @Sabrac You can use SQLite with in-memory tables and disable fsync etc., and it will be a lot faster. Obviously it then does not store the data on disk.
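
                       A minimal sketch of that setup (the table layout is an assumption; needs QT += sql):

                       #include <QSqlDatabase>
                       #include <QSqlQuery>

                       bool loadIntoMemoryDb()
                       {
                           QSqlDatabase db = QSqlDatabase::addDatabase("QSQLITE");
                           db.setDatabaseName(":memory:");        // nothing ever touches the disk
                           if (!db.open())
                               return false;

                           QSqlQuery q(db);
                           q.exec("PRAGMA synchronous = OFF");    // skip fsync; data is throwaway
                           q.exec("PRAGMA journal_mode = MEMORY");
                           q.exec("CREATE TABLE fields (arr INT, elem INT, fld INT, value TEXT)");

                           // Insert everything inside one transaction; per-row transactions
                           // are the usual reason bulk inserts into SQLite feel slow.
                           db.transaction();
                           q.prepare("INSERT INTO fields VALUES (?, ?, ?, ?)");
                           // ... for each record: q.addBindValue(...) four times, then q.exec() ...
                           return db.commit();
                       }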

jeremy_k
#13

                         @Sabrac said in [C++] Performance travel massive object field data:

                         I load all the data with fread; it looks like this:

                             for (int i = 0; i < count; i++)
                             {
                                 if (fread(&data, sizeof(T), 1, file) != 1)
                                     return -1;
                                 _v.push_back(data);
                             }

                         Has reading all of the data in one call, e.g. fread(dest, sizeof(T), count, file), been tried? This is 100,000 library calls that may turn into a similar number of system calls. A sketch of the one-call version follows.
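
                         A sketch of that one-call read (assumes T is trivially copyable, which it must already be for fread to be valid):

                         #include <cstddef>
                         #include <cstdio>
                         #include <vector>

                         // Read count elements of T in a single fread instead of count freads.
                         template <typename T>
                         int loadArray(std::FILE* file, std::size_t count, std::vector<T>& out)
                         {
                             out.resize(count);   // allocate once
                             if (std::fread(out.data(), sizeof(T), count, file) != count)
                                 return -1;       // short read: bad file or premature EOF
                             return 0;
                         }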

                        Asking a question about code? http://eel.is/iso-c++/testcase/

Sabrac
#14

                           @SGaist said in [C++] Performance travel massive object field data:

                           Does that file have a defined format like HDF5?

                           I'm not sure; I haven't worked with HDF5 before. I just know the element structures and the load order of each array.

                           @jeremy_k said in [C++] Performance travel massive object field data:

                           Has reading all of the data in one call, e.g. fread(dest, sizeof(T), count, file), been tried? This is 100,000 library calls that may turn into a similar number of system calls.

                           The loading itself only costs ~400 ms, so it is totally fine; it's iterating over each field to check some condition that gives me a headache.

jeremy_k
#15

                             @Sabrac said in [C++] Performance travel massive object field data:

                             The loading itself only costs ~400 ms, so it is totally fine; it's iterating over each field to check some condition that gives me a headache.

                             It helps to share the code that doesn't work as desired, rather than the parts that are not a problem.

                            Asking a question about code? http://eel.is/iso-c++/testcase/

JonB
#16

                               I will make this observation: if speed is your ultimate goal, as it appears to be, and if you need to search for an arbitrary string anywhere inside any (or a particular) field, so that indexing is of no use, then I cannot see how a database is likely to be anything but slower than code you can write yourself for in-memory (or maybe direct-from-disk) access.

                               it's iterating over each field to check some condition that gives me a headache

                               And what sort of "condition check" is this? Or is it just the string search?

                               BTW: once you have done whatever reading of the data, will you do multiple searches in it, or literally just one for (say) a particular string?

Sabrac
#17

                                 Thanks everyone,
                                 After a couple of days of research, I have tuned my code and boosted the iteration speed from ~12 minutes to ~2 seconds.

                                 My condition-check statement contained one line of code that scanned the whole array again, so it effectively scanned the array once per element, i.e. quadratic in the array size; that was the pain point. A sketch of that kind of fix is below.
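
                                 A sketch of that kind of fix (illustrative names, not my real code): the quadratic version rescans the whole array inside the per-element check, while building a lookup set once turns the inner scan into an O(1) lookup.

                                 #include <string>
                                 #include <unordered_set>
                                 #include <vector>

                                 // Before: the check rescans the whole array for every element -> O(N^2).
                                 bool containsQuadratic(const std::vector<std::string>& items, const std::string& key)
                                 {
                                     for (const std::string& it : items)
                                         if (it == key) return true;
                                     return false;
                                 }

                                 // After: build the set once (O(N)), then each check is O(1) on average.
                                 void checkAll(const std::vector<std::string>& items,
                                               const std::vector<std::string>& keys)
                                 {
                                     const std::unordered_set<std::string> seen(items.begin(), items.end());
                                     for (const std::string& key : keys)
                                         if (seen.count(key)) { /* condition holds for this key */ }
                                 }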

Sabrac (original question)

                                   Hi guys,
                                   I'm facing a bad performance problem.
                                   I have a list of ~100k objects, and each object has ~100 fields (each object has different field names/properties).
                                   The task I need to do is traverse each field of each object to detect the object's references.
                                   Is there any solution for the best performance without any database?

                                   My current approach is to define the object structure (field names and field data types) in an INI file,
                                   and use a pointer and offset to retrieve an object's field value.
                                   It takes 4-5 minutes, which is too slow.

josefromeo
#18

                                   @Sabrac said in [C++] Performance travel massive object field data:

                                   I'm facing a bad performance problem.
                                   I have a list of ~100k objects, and each object has ~100 fields (each object has different field names/properties).
                                   The task I need to do is traverse each field of each object to detect the object's references.
                                   Is there any solution for the best performance without any database?

                                   Navigating such a vast list of objects and fields can indeed present performance challenges, and at this scale optimizing the task without a database is intricate. Data structures tailored to the traversal, such as hash maps or tree-based indexes, can expedite reference detection across the numerous fields of your objects.
