Which Container is best suited for holding 10 millions of records (String, String) ?
-
I would probably use a map:
std::map<T, std::pair<T, T>> mp;
But:
@vpn9507 said in Which Container is best suited for holding 10 millions of records (String, String) ?:
why don't you use SQL? It's not a good idea to keep that much data in memory.
He's right.
-
One more consideration for not storing it all in memory: let's say each string is about 100 characters. A QString needs 2 bytes for each character, and you have two strings per element. A very rough estimate: 10 000 000 * ((100 * 2) * 2) = 4 GB of data, and that's not counting the weight of the structure itself (i.e. all the pointers the map needs to create its nodes) and all the other memory your app needs. If you use that much, the OS is gonna swap your memory like there's no tomorrow, slowing everything down significantly.
-
Hi,
it would be good to know what you need 10 million records for.
-
@vpn9507 The response time must be within 5 milliseconds. With SQL, the response time would degrade, and deploying an SQL server is not recommended for our application.
-
@Chris-Kawa Could you please explain how you arrived at the memory estimate "10 000 000 * ((100 * 2) * 2) = 4 GB"? I did not understand why you multiplied by 2 at the end.
-
@ksranjith786
As for container classes, there is a quite handy overview of them here. The actual choice depends on many factors: how the data will be accessed and managed, and so on. However, it is your choice.
As for "deployment of SQL": you actually do not have to deploy anything besides your app's running environment; SQLite is well integrated.
As for the actual storage of that amount of data: you can keep it all in memory using container classes (assuming you have enough memory, of course), you can put it in a memory-based SQLite db, or you can use SQLite with a regular file-based db and write yourself a model that handles the data the way you need.
A well-written model would be able to cache some of the data in memory, making it available instantly, while performing cache optimization in the background (read in advance / write when idle).
I do not know your use case though, so this is just speculation.
-
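A minimal sketch of that read-through caching idea in plain C++. BackingStore here is a stand-in std::map; in a real app it would be the file-based SQLite database:

```cpp
#include <map>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical backing store -- in a real app this would be queries
// against the file-based SQLite db, not an in-memory map.
using BackingStore = std::map<std::string, std::string>;

class CachedModel {
public:
    explicit CachedModel(BackingStore store) : store_(std::move(store)) {}

    // Read-through lookup: serve from the in-memory cache when possible,
    // otherwise fetch from the backing store and cache the result.
    std::optional<std::string> lookup(const std::string &key) {
        if (auto it = cache_.find(key); it != cache_.end())
            return it->second;
        if (auto it = store_.find(key); it != store_.end())
            return cache_[key] = it->second;  // populate cache on miss
        return std::nullopt;
    }

private:
    BackingStore store_;
    std::unordered_map<std::string, std::string> cache_;
};
```

A production model would also bound the cache size, evict stale entries, and prefetch in the background, which this sketch leaves out.
-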
I think this part is throwing you off:
(100 * 2) = 100 characters at 2 bytes each.
The multiplication at the end is there because of the 2 strings per entry.
@Chris-Kawa said in Which Container is best suited for holding 10 millions of records (String, String) ?:
let's say each string is about 100 characters, A QString needs 2 bytes for each character
-
10 000 000 entries × 2 strings per entry × 100 characters per string × 2 bytes per character
-
@ksranjith786
As per the link http://doc.qt.io/qt-5/containers.html#algorithmic-complexity:
The values above may seem a bit strange, but here are the guiding principles:
• QString allocates 4 characters at a time until it reaches size 20.
• From 20 to 4084, it advances by doubling the size each time. More precisely, it advances to the next power of two, minus 12. (Some memory allocators perform worse when requested exact powers of two, because they use a few bytes per block for book-keeping.)
• From 4084 on, it advances by blocks of 2048 characters (4096 bytes). This makes sense because modern operating systems don't copy the entire data when reallocating a buffer; the physical memory pages are simply reordered, and only the data on the first and last pages actually needs to be copied.
-
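Those three rules can be sketched as a small function. This is only an approximation of the documented policy, not Qt's actual implementation:

```cpp
#include <cstddef>

// Approximate QString capacity growth as described in the Qt docs:
// multiples of 4 up to size 20, then next power of two minus 12
// up to 4084, then blocks of 2048 characters.
std::size_t nextCapacity(std::size_t size) {
    if (size <= 20)
        return (size + 3) / 4 * 4;           // round up to a multiple of 4
    if (size <= 4084) {
        std::size_t cap = 32;                // smallest power of two tried
        while (cap - 12 < size)
            cap *= 2;                        // double until it fits
        return cap - 12;                     // power of two, minus 12
    }
    return (size + 2047) / 2048 * 2048;      // round up to 2048-char blocks
}
```
-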
WOW 4 responses within a minute. Never saw this before.
-
@ksranjith786
Hi,
you should really test with SQLite if your use case is to select among 10 million lines and display a subset.
-
@ksranjith786 said in Which Container is best suited for holding 10 millions of records (String, String) ?:
Around 5 msec for lookup
You probably need a proper database (not just SQLite, maybe PostgreSQL), index it properly, and maybe even give it standalone resources (i.e. run it on a dedicated machine) to achieve that performance.
-
@VRonin said in Which Container is best suited for holding 10 millions of records (String, String) ?:
You probably need a proper database (not just SQLite, maybe PostgreSQL), index it properly, and maybe even give it standalone resources (i.e. run it on a dedicated machine) to achieve that performance
Even this may not be viable. On a typical 10/100 network you'd get about 1 ms of latency from the TCP/IP round trip alone, which shrinks that 5 ms window considerably.
@ksranjith786
How are you going to use that dataset?
Our use case is that our application needs to fetch the offers associated with an item during an item scan.
Elaborate on that: break it down step by step for us, and do say what "offers" and "items" are in this context, and most importantly what an "item scan" is.
-
My MySQL db with 3000 records takes at least 1.5 seconds to respond.
-
That's simply too long. You should inspect your database and how you use it.
@VRonin
It just occurred to me that this problem is a prime candidate for using and testing your big hash lib. :)
-
@kshegunov Lol, thanks, but 5 msec is not achievable even in my wildest dreams. I also suspect that 10M QStrings as keys (which are not dumped to the hard drive; only the values are) are enough to blow most machines' memory.
-
Oh, I think Redis may help you!
-
@joeQ You'd need to have spent quite a bit of money on RAM for that to be an option.