[C++] Performance travel massive object field data

Solved · General and Desktop
18 Posts 7 Posters 1.3k Views 3 Watching
  • Sabrac · #1 (last edited by Sabrac)

    Hi guys,
    I'm facing a bad performance problem.
    I have a list of ~100k objects, and each object has ~100 fields (each object type has different field names / properties).
    The task I need to do is traverse each field of each object to detect references between objects.
    Is there any solution for the best performance without using a database?

    My current approach is to define the object structure (field names and field data types) in an INI file, and use a pointer plus a field offset to retrieve each object's field value.
    It takes 4-5 minutes, which is too slow.

  • SGaist · Lifetime Qt Champion · #2

    Hi,

    Where are you loading your data from?
    What do you mean by travel?

      Interested in AI ? www.idiap.ch
      Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

  • Sabrac · #3

    Hi @SGaist,
    Thanks for your response.
    My data is loaded from a binary file.
    I load all the data with fread; it looks like this:

    int count = 100000;       // number of elements in the array
    std::vector<T> v;
    v.reserve(count);         // reserve the element count up front
    T data;                   // each array has a different structure T
    for (int i = 0; i < count; i++)
    {
        if (fread(&data, sizeof(T), 1, file) != 1) return -1;
        v.push_back(data);
    }

    I have around ~200 arrays of structures to load, so I defined a template parameter T to load them dynamically into my app.
    The "travel" I mentioned means:

    • iterate over each array of structures (~500 arrays)
    • then iterate over each element in the array (each array has around 100k elements, with a different structure definition)
    • then iterate over each field of the element to check/search some condition (each element has around 100 fields)

    For example: I need to search for the string "apple" in all arrays to find out where it appears: which array, which element, which index.

  • JonB · #4

    @Sabrac
    Well, if you don't want to use a database, then you are going to read into memory and search? 500 * 100,000 * 100 fields * whatever the size of a field is... quite a big number. What is the total data size on disk? It will "take a while", though "4-5 minutes" still sounds high to me. It depends on what your actual code is. Make sure you do not read the data from file more than once. Other than that, finding an arbitrary string anywhere will indeed involve searching all bytes of the data. Profile your application to see where it's spending most time.

  • Sabrac · #5 (last edited by Sabrac)

    @JonB
    Just around 500MB on disk, and of course ~500MB after loading into RAM.
    I have logged the processing time: each field access takes ~5 ms (debug mode may be slower).
    And by your calculation, 500 * 100,000 * 100 * 5 ms ≈ 25,000,000 seconds xD
    Of course that is an unacceptable number.
    My working solution is to define all the structures in QSettings (field names and field data types); then I get a pointer to one element of an array and read the object's field value from that pointer plus the field offset (calculated from the field data type), looping over each element of each array.
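    The pointer-plus-offset access described above can be sketched as follows (a minimal illustration, not the actual app code: `Record`, `FieldDesc`, and `fieldAsString` are hypothetical names, with the descriptor table standing in for the structure definitions loaded from QSettings):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical record type; in the real app each array has its own structure T.
struct Record { int32_t id; char name[16]; };

// One entry per field: name, byte offset, and a type tag (as stored in the INI file).
struct FieldDesc { std::string name; std::size_t offset; char type; }; // 'i' = int32, 's' = char[]

// Read one field of one element through a raw pointer plus the field's offset.
std::string fieldAsString(const void *element, const FieldDesc &f)
{
    const char *base = static_cast<const char*>(element) + f.offset;
    if (f.type == 'i') {
        int32_t v;
        std::memcpy(&v, base, sizeof v);   // memcpy avoids unaligned/aliasing issues
        return std::to_string(v);
    }
    return std::string(base);              // 's': NUL-terminated char array
}
```

    Usage would look like `fieldAsString(&record, FieldDesc{"name", offsetof(Record, name), 's'})`; a full table of FieldDesc entries per structure reproduces the INI-driven scheme.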

  • JonB · #6

              @Sabrac
              My feeling (untested) is that 0.5GB should not take "4-5 minutes" to search.

  • DerReisende · #7

    @Sabrac said in [C++] Performance travel massive object field data:

    My working solution is to define all the structures in QSettings (field names and field data types), then get a pointer to one element of an array and read the field value from that pointer plus the field offset.

    Why are you using QSettings? I don't think it is a high-performance map implementation, and it was never meant for that. I would use either a Qt map or a C++ map, or tsl::hopscotch_map if you really need a more performant map.

  • Sabrac · #8

    @DerReisende said in [C++] Performance travel massive object field data:

    Why are you using QSettings? I don't think it is a high-performance map implementation, and it was never meant for that.

    QSettings only stores the object structure definitions, because I don't know the element type of an array while iterating over it. So I think the problem is the iteration approach; QSettings may be part of the problem, but a minor one.

  • mpergand · #9 (last edited by mpergand)

    @Sabrac
    The first thing that comes to mind is to create a hash table (QHash) with all the fields.
    It could take some time, and it mimics database behaviour with less efficiency and convenience.

    Hence my final thought: create a lightweight SQLite database instead.
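    The hash-table idea above can be sketched like this (a sketch only, with std::unordered_map standing in for QHash and illustrative names; `getValue` stands for whatever retrieves a field as a string): build the index in one pass, then each "where does this value appear?" query is a single O(1) lookup instead of a full scan.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Location of one field value: which array, which element, which field.
struct Hit { int array; int element; int field; };

using Index = std::unordered_map<std::string, std::vector<Hit>>;

// Build the index in a single pass over all arrays/elements/fields.
template <typename GetValue>
Index buildIndex(int arrays, int elements, int fields, GetValue getValue)
{
    Index idx;
    for (int a = 0; a < arrays; ++a)
        for (int e = 0; e < elements; ++e)
            for (int f = 0; f < fields; ++f)
                idx[getValue(a, e, f)].push_back({a, e, f});
    return idx;
}

// After that, finding every occurrence of a value is one hash lookup.
std::vector<Hit> findValue(const Index &idx, const std::string &value)
{
    auto it = idx.find(value);
    return it == idx.end() ? std::vector<Hit>{} : it->second;
}
```

    The one-time build cost is the same full traversal as before, which is the trade-off mpergand notes.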

  • Sabrac · #10

    @mpergand said in [C++] Performance travel massive object field data:

    Hence my final thought: create a lightweight SQLite database instead.

    I also thought about SQLite and have tried it, but flushing ~50 million records (each with ~100 fields) into SQLite also takes time.
    Even if that were acceptable, I would face another problem: synchronizing the binary data with the SQLite records when my application starts.
    Painful too.

  • SGaist · Lifetime Qt Champion · #11

    Does that file have a defined format like HDF5?


  • DerReisende · #12

    @Sabrac You can use SQLite with in-memory tables and disable fsync etc., and it will be a lot faster. Obviously it does not store the data on disk then.

  • jeremy_k · #13

    @Sabrac said in [C++] Performance travel massive object field data:

    I load all the data with fread; it looks like this:

    for (int i = 0; i < count; i++)
    {
        if (fread(&data, sizeof(T), 1, file) != 1) return -1;
        v.push_back(data);
    }

    Has reading all of the data in one call, e.g. fread(dest, sizeof(T), count, file), been tried? This is 100,000 library calls that may turn into a similar number of system calls.
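    The one-call read suggested above might look like this (a sketch; `readAll` is an illustrative name, and it writes straight into the vector's storage so the whole array arrives in a single fread):

```cpp
#include <cstdio>
#include <vector>

// Read `count` fixed-size records in a single fread instead of one call per record.
// Returns the number of records actually read.
template <typename T>
std::size_t readAll(std::FILE *file, std::vector<T> &v, std::size_t count)
{
    v.resize(count);                       // resize (not reserve): fread fills the storage
    std::size_t got = std::fread(v.data(), sizeof(T), count, file);
    v.resize(got);                         // trim if the file was shorter than expected
    return got;
}
```

    Note this only works when T is trivially copyable (plain structs of scalars and char arrays), which matches how the records are read here.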

                            Asking a question about code? http://eel.is/iso-c++/testcase/

  • Sabrac · #14

    @SGaist said in [C++] Performance travel massive object field data:

    Does that file have a defined format like HDF5?

    I'm not sure; I haven't worked with HDF5 before. I just know the element structures and the load order of each array.

    @jeremy_k said in [C++] Performance travel massive object field data:

    Has reading all of the data in one call, e.g. fread(dest, sizeof(T), count, file), been tried?

    The load itself only costs ~400 ms, which is totally fine, but iterating over each field to check a condition is what gives me a headache.

  • jeremy_k · #15

    @Sabrac said in [C++] Performance travel massive object field data:

    The load itself only costs ~400 ms, which is totally fine, but iterating over each field to check a condition is what gives me a headache.

    It helps to share the code that doesn't work as desired, rather than the parts that are not a problem.


  • JonB · #16

    I will make this observation: if speed is your ultimate goal, as it appears to be, and if you need to search for an arbitrary string anywhere inside any field, so indexing is of no use, then I cannot see how a database is likely to be anything but slower than code you can write for in-memory (or maybe direct-from-disk) access.

    but iterating over each field to check a condition is what gives me a headache

    And what sort of "condition check" is this? Or is it just the string search?

    BTW: once you have done whatever reading of the data, will you do multiple searches on it, or literally just one for (say) a particular string?
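    The in-memory search JonB describes can be sketched with std::search over the raw bytes of a loaded buffer (illustrative only; a real version would map the byte offsets back to array/element/field):

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Find every byte offset at which `needle` occurs inside a raw in-memory buffer.
std::vector<std::size_t> findAllOffsets(const std::vector<char> &buf, const std::string &needle)
{
    std::vector<std::size_t> offsets;
    auto it = buf.begin();
    while (true) {
        it = std::search(it, buf.end(), needle.begin(), needle.end());
        if (it == buf.end()) break;
        offsets.push_back(static_cast<std::size_t>(it - buf.begin()));
        ++it;                              // continue just past this match
    }
    return offsets;
}
```

    A single linear pass like this over 500MB is typically a matter of seconds, not minutes, which is consistent with JonB's estimate above.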

  • Sabrac · #17 (last edited by Sabrac)

    Thanks everyone,
    After a couple of days of research, I have tuned my code and boosted the iteration speed from ~12 minutes to ~2 seconds.

    My condition-check statement contained one line of code that scanned the whole array again, so it was effectively doing work proportional to the square of the array size <= the pain point.
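    For readers hitting the same wall: the fix described above amounts to replacing a rescan-per-element (O(n²)) with a structure computed once before the loop (O(n)). A minimal sketch of the pattern, with illustrative names and a duplicate check standing in for the real condition:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Slow shape: for each element, scan the array again -> O(n^2).
int countDuplicatesSlow(const std::vector<std::string> &items)
{
    int dups = 0;
    for (std::size_t i = 0; i < items.size(); ++i)
        for (std::size_t j = 0; j < i; ++j)        // inner rescan: the pain point
            if (items[j] == items[i]) { ++dups; break; }
    return dups;
}

// Fast shape: build a hash set once, answer each check in O(1) -> O(n) overall.
int countDuplicatesFast(const std::vector<std::string> &items)
{
    std::unordered_set<std::string> seen;
    int dups = 0;
    for (const auto &item : items)
        if (!seen.insert(item).second) ++dups;     // insert fails -> already seen
    return dups;
}
```

    At 100k elements per array, the difference between n² and n comparisons is roughly the difference between minutes and seconds reported above.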

  • josefromeo · #18

    @Sabrac said in [C++] Performance travel massive object field data:

    I'm facing a bad performance problem. Is there any solution for the best performance without using a database?

    Traversing such a vast set of objects and fields can indeed present performance challenges, and optimizing the task without a database is intricate. Specialized data structures tailored for traversal, such as hash maps or tree-based indexes, can speed up reference detection across the many fields of your objects.
