[C++] Performance traversing massive object field data
-
Hi guys,
I'm facing a bad performance problem.
I have a list of ~100k objects, and each object has ~100 fields (each object has different field names/properties).
The task I need to do is traverse every field of every object to detect the object's references.
Is there any solution for the best performance without any database?
My current approach is to define the object structure (field names and field data types) in an INI file,
then use a pointer and an offset to retrieve each object's field value.
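In outline, the lookup works like the sketch below (FieldDef, the type tags and readFieldAsString are simplified stand-ins for illustration, not my real code):

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Simplified sketch of the INI-driven layout: each field is described
// by a name, a type tag and a byte offset inside one element.
enum class FieldType { Int32, Double, CString };

struct FieldDef {
    std::string name;
    FieldType   type;
    std::size_t offset;   // byte offset inside one raw element
};

// Read one field out of a raw element through pointer + offset.
std::string readFieldAsString(const char *element, const FieldDef &f)
{
    const char *p = element + f.offset;
    switch (f.type) {
    case FieldType::Int32: {
        std::int32_t v; std::memcpy(&v, p, sizeof v);
        return std::to_string(v);
    }
    case FieldType::Double: {
        double v; std::memcpy(&v, p, sizeof v);
        return std::to_string(v);
    }
    case FieldType::CString:
        return std::string(p);   // assumes a NUL-terminated char array
    }
    return {};
}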
It takes 4-5 minutes, which is too slow.
-
Hi,
Where are you loading your data from?
What do you mean by "travel"?
-
Hi @SGaist,
Thanks for your response.
My data is loaded from a binary file.
I load all the data with fread; it looks like this:

int count = 100000;     // number of elements in this array
std::vector<T> _v;
_v.reserve(count);
T data;                 // each array has a different structure T
for (unsigned int i = 0; i < count; i++) {
    if (fread(&data, sizeof(T), 1, file) != 1)
        return -1;
    _v.push_back(data);
}
I have around ~200 arrays of structures to load, so I defined a template parameter T to load them dynamically into my app.
The "travel" I mentioned means:
- iterate over each array of structures (~500 arrays)
- then iterate over each element in the array (each array has around 100k elements, each array with a different structure definition)
- then iterate over each field in the element to check/search some condition (each element has around 100 fields)
For example: I need to search for the string "apple" in all arrays to find out where it appears: which array, which element, and at which index.
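So the whole search is three nested loops, roughly like this sketch (ArrayInfo and Hit are again illustrative stand-ins; it reuses FieldDef/readFieldAsString from the sketch in my first post):

#include <cstddef>
#include <string>
#include <vector>

// One descriptor per array: the INI-derived layout plus raw elements.
struct ArrayInfo {
    std::vector<FieldDef>     fields;     // layout from the INI file
    std::vector<const char *> elements;   // pointers to raw elements
};

struct Hit { std::size_t arrayIdx, elemIdx, fieldIdx; };

// The three nested loops: arrays -> elements -> fields.
std::vector<Hit> findString(const std::vector<ArrayInfo> &arrays,
                            const std::string &needle)
{
    std::vector<Hit> hits;
    for (std::size_t a = 0; a < arrays.size(); ++a)
        for (std::size_t e = 0; e < arrays[a].elements.size(); ++e)
            for (std::size_t f = 0; f < arrays[a].fields.size(); ++f)
                if (readFieldAsString(arrays[a].elements[e],
                                      arrays[a].fields[f]).find(needle)
                        != std::string::npos)
                    hits.push_back({a, e, f});
    return hits;
}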
-
@Sabrac
Well, if you don't want to use a database, then you are going to read into memory and search. 500 arrays * 100,000 elements * 100 fields * whatever the size of the fields is... is quite a big number. What is the total data size on disk? It will "take a while", though "4-5 minutes" still sounds high to me. It depends on what your actual code is. Do ensure you do not read the data from file more than once. Other than that, finding an arbitrary string anywhere will indeed involve searching all bytes of the data. Profile your application to see where it is spending most of its time.
-
@JonB
Just around 500 MB on disk, and of course ~500 MB after loading into RAM.
I have logged the processing time; each field iteration takes ~5 ms (debug mode may be slower).
And by your calculation, 500 * 100,000 * 100 * 5 ms ~ 25,000,000 seconds xD
Of course this is an unacceptable number.
My working solution is to define all the structures in QSettings (field name and field data type),
then get a pointer to one element in the array and read the object's field value via that pointer and the field offset (calculated from the field data type),
and just loop over each element in each array.
-
@Sabrac
Why are you using QSettings? I don't think it is a high-performance map implementation, and it was never meant for that. I would use either a Qt map or a C++ map, or, if you need an even more performant map, tsl::hopscotch_map.
-
@DerReisende
QSettings only stores the object structure definition (field names and data types), because I don't know the type of an array's elements while iterating over it. So I think the problem is the iteration approach; QSettings may be one of the problems, but a lesser one.
-
@Sabrac
The first thing that comes to mind is to create a hash table (QHash) with all the fields.
It could take some time to build.
It would mimic database behaviour with less efficiency and convenience. Hence my final thought: create a lightweight SQLite database instead.
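Roughly along these lines (Location and findOccurrences are just illustrative names): index every string field once while loading, and a lookup like "apple" becomes a single hash probe instead of a scan over billions of fields.

#include <QHash>
#include <QList>
#include <QString>

// Hypothetical record of where a value was seen.
struct Location {
    int arrayId;
    int elementIndex;
    int fieldIndex;
};

// While loading, call index.insert(fieldValue, {arrayId, elem, field})
// once per string field. Afterwards, every occurrence of a value is
// retrieved with one lookup.
QList<Location> findOccurrences(const QMultiHash<QString, Location> &index,
                                const QString &needle)
{
    return index.values(needle);
}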
-
@mpergand
I also thought about SQLite, and I have tried it, but flushing ~50 million records (each record with ~100 fields) into SQLite also takes time.
Even if that were acceptable, I would face another problem: synchronizing the binary data with the SQLite database records when my application starts.
Painful, too.
-
Does that file have a defined format, like HDF5?
-
@Sabrac You can use SQLite with in-memory tables and disable fsync etc., and it will be a lot faster. Obviously it does not store the data on disk then.
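A sketch of that setup, assuming the Qt SQL module with the QSQLITE driver:

#include <QSqlDatabase>
#include <QSqlQuery>

bool openFastSqlite()
{
    QSqlDatabase db = QSqlDatabase::addDatabase("QSQLITE");
    db.setDatabaseName(":memory:");   // in-memory: nothing is written to disk
    if (!db.open())
        return false;

    QSqlQuery q(db);
    q.exec("PRAGMA synchronous = OFF");       // no fsync
    q.exec("PRAGMA journal_mode = MEMORY");   // keep the journal in RAM too
    return true;
}

Wrapping the bulk inserts in a single transaction (db.transaction() / db.commit()) also makes a big difference.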
-
@Sabrac said in [C++] Performance traversing massive object field data:
I load all the data with fread; it looks like this [code above]
Has reading all of the data in one call, e.g.
fread(dest, sizeof(T), count, file)
been tried? This is 100,000 library calls that may turn into a similar number of system calls.
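That is, something like this sketch, assuming T is trivially copyable so a raw block read is valid:

#include <cstddef>
#include <cstdio>
#include <type_traits>
#include <vector>

// One fread for the whole array instead of one call per element.
template <typename T>
int loadArray(std::FILE *file, std::size_t count, std::vector<T> &out)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "a raw block read requires a trivially copyable T");
    out.resize(count);
    if (std::fread(out.data(), sizeof(T), count, file) != count)
        return -1;
    return 0;
}
-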
@SGaist said in [C++] Performance traversing massive object field data:
Does that file have a defined format, like HDF5?
I'm not sure; I haven't worked with HDF5 before. I just know the elements' structure and the load order of each array.
@jeremy_k said in [C++] Performance traversing massive object field data:
Has reading all of the data in one call been tried?
The load step only costs ~400 ms, so that part is totally fine, but iterating over each field inside to check some condition is what gives me a headache.
-
@Sabrac
It helps to share the code that doesn't work as desired, rather than the parts that are not a problem.
-
I will make this observation: if speed is your ultimate goal, as it appears to be, and if you need to search for an arbitrary string anywhere inside any (or a particular) field, so that indexing is of no use, then I cannot see how a database is likely to be anything but slower than code you can write for in-memory (or maybe direct-from-disk) access.
iterating over each field inside to check some condition is what gives me a headache
And what sort of "condition check" is this? Or is that just the string search?
BTW: once you have done whatever in the way of reading the data, will you run multiple searches on it, or literally just one, for (say) a particular string?
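For a sense of what "searching all bytes" amounts to, a rough sketch, assuming the data sits in one contiguous buffer:

#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Scan one contiguous buffer for every occurrence of a substring.
std::vector<std::size_t> findAll(const std::vector<char> &buf,
                                 const std::string &needle)
{
    std::vector<std::size_t> hits;
    auto it = buf.begin();
    while ((it = std::search(it, buf.end(),
                             needle.begin(), needle.end())) != buf.end()) {
        hits.push_back(static_cast<std::size_t>(it - buf.begin()));
        ++it;   // step past this match and keep scanning
    }
    return hits;
}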
-
Thanks everyone,
After a couple of days of research, I have tuned my code and boosted the iteration speed from ~12 mins to ~2 s. Inside my condition-check statement there was one line of code that scanned the whole array again, so basically the number of items visited was the square of the array size -> the pain point.
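The shape of the fix was roughly this (Element and key are placeholders for my real types): build a lookup once, then check against it.

#include <string>
#include <unordered_set>
#include <vector>

struct Element { std::string key; };   // placeholder for my real element type

// Build the lookup once (O(n)) instead of rescanning the array inside
// the condition check (O(n) per element, O(n^2) overall).
std::unordered_set<std::string> buildIndex(const std::vector<Element> &elems)
{
    std::unordered_set<std::string> seen;
    seen.reserve(elems.size());
    for (const Element &e : elems)
        seen.insert(e.key);
    return seen;
}

// Each condition check is now a single O(1) lookup:
//   if (seen.count(candidate) != 0) { ... }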