Using QHash/QMap for large data storage



  • I am trying to use QHash to store approximately 30 million unique data items. Each item is a string of 2000 characters, which takes 4000 bytes per item. So the total data will be approximately (30 million * 4000) bytes, which is huge.

    When I insert the items one by one, the memory usage of the process shown in Task Manager increases rapidly, and once it reaches around 2 GB, the process crashes due to memory shortage.

    Is there any solution for handling this huge amount of data with QMap/QHash?



  • You'll need to be a bit smarter with your memory, I think.
    First of all, don't use QMap for this if you can help it. QHash is more efficient.
    Then, consider storing QByteArray instead of QString by using QString::toUtf8. QString holds UTF-16 (2 bytes per character), so depending on your input data (mostly ASCII text, for example), that might result in a 50% space saving for your strings.

    There might be other optimizations to make. Are your strings unique, or are many of them the same? In the latter case, make sure you store only a shared copy of each string and not a freshly generated duplicate that is identical to one you already have. Also: what do you use for keys?

    If your usage is still too big, consider whether you really need to keep all that data in memory at the same time. If not, you might be able to use custom data structures whose bulk you can page out to disk while it's not needed. There is no support in Qt for that though.
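
To make the two suggestions above concrete, here is a minimal sketch in standard C++ rather than Qt. It assumes std::u16string behaves like QString's UTF-16 storage and std::string like a QByteArray holding UTF-8; StringPool, utf16Bytes, utf8Bytes and savingPercent are hypothetical names, not Qt API. It compares the payload size of the two encodings and keeps a single copy of each distinct string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Payload bytes of a UTF-16 string (assumption: mirrors QString's storage).
std::size_t utf16Bytes(const std::u16string &s) {
    return s.size() * sizeof(char16_t);
}

// Payload bytes of the same text held as UTF-8 (stand-in for QByteArray).
std::size_t utf8Bytes(const std::string &s) {
    return s.size();
}

// Percentage saved by the smaller representation; larger must be non-zero.
double savingPercent(std::size_t larger, std::size_t smaller) {
    assert(larger > 0 && smaller <= larger);
    return 100.0 * static_cast<double>(larger - smaller) /
           static_cast<double>(larger);
}

// Hypothetical pool that stores exactly one copy of each distinct string.
class StringPool {
public:
    // Returns the pooled copy; a new copy is stored only on first sight.
    const std::string &intern(const std::string &s) {
        return *pool_.insert(s).first;
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};
```

The 50% figure only holds for text that is purely ASCII; characters outside ASCII take two to four bytes in UTF-8. The pool only helps if duplicates actually occur, which the original post says is not the case for this data set.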



  • You may want to consider using an embedded database to automatically take care of paging relevant data in and out of the system.



  • [quote author="JothiMurugeswaran" date="1348841619"]and once if it reaches around 2GB, the process get crash due to the memory shortage.[/quote]

    Do you compile/run your application as a 32-Bit binary/process?

    If so, it is no surprise that memory allocation fails, as soon as you have allocated ~2 GB of memory. That's because the virtual memory space of a 32-Bit process is only 4 GB. And of this 4 GB memory space, only 2 GB are (usually) available to the application. This is not a limitation of Qt or QHash/QMap, but a limitation of the (32-Bit) x86 architecture and/or the underlying operating system...

    Solution: Either go for 64-Bit or store extremely large data structures on secondary storage (e.g. HDD or SSD).
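
One way to picture the secondary-storage option is to keep the strings in a file and hold only an index of offsets in memory. The sketch below is an illustration under assumptions, not a Qt facility: DiskBackedStore is a hypothetical class, the keys are assumed to be 64-bit integers, and error handling is reduced to the bare minimum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper: payloads live in a file, only (offset, length)
// per key stays in RAM.
class DiskBackedStore {
public:
    // Creates (or truncates) the backing file at the given path.
    explicit DiskBackedStore(const std::string &path)
        : file_(path, std::ios::in | std::ios::out |
                          std::ios::binary | std::ios::trunc) {}

    bool isOpen() const { return file_.is_open(); }

    // Appends the payload to the file and records where it was written.
    // Re-inserting a key leaves the old payload orphaned in the file.
    void insert(std::uint64_t key, const std::string &payload) {
        assert(file_.is_open());
        file_.clear();
        file_.seekp(0, std::ios::end);
        const std::streamoff offset = file_.tellp();
        file_.write(payload.data(),
                    static_cast<std::streamsize>(payload.size()));
        index_[key] = {offset, payload.size()};
    }

    // Reads one payload back from disk; returns false for unknown keys
    // or short reads.
    bool lookup(std::uint64_t key, std::string &out) {
        const auto it = index_.find(key);
        if (it == index_.end()) return false;
        out.resize(it->second.second);
        file_.clear();
        file_.seekg(it->second.first);
        file_.read(&out[0], static_cast<std::streamsize>(out.size()));
        return file_.gcount() == static_cast<std::streamsize>(out.size());
    }

private:
    std::fstream file_;
    std::unordered_map<std::uint64_t,
                       std::pair<std::streamoff, std::size_t>> index_;
};
```

The index still costs memory for every item (key, offset, length, plus hash-node overhead), so with 30 million entries it may remain tight in a 32-Bit process even though the 4000-byte payloads are no longer resident.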



  • 30 million * 4000 bytes is 120,000,000,000 bytes, i.e. approximately 112 gigabytes when counted in units of 2^30 bytes, not counting any overhead. No chance of holding this in the memory of a private computer any time soon.
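
The arithmetic behind that estimate can be written out as a small sketch. The constant and function names are made up for illustration; the 2 GiB budget is the usable address space of a 32-Bit process mentioned earlier in the thread.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kItems = 30000000ULL;        // 30 million items
constexpr std::uint64_t kBytesPerItem = 4000ULL;     // 2000 UTF-16 characters
constexpr std::uint64_t kGiB = 1024ULL * 1024ULL * 1024ULL;

// Raw payload size, ignoring container and allocator overhead.
constexpr std::uint64_t totalBytes(std::uint64_t items,
                                   std::uint64_t bytesPerItem) {
    return items * bytesPerItem;
}

// How many whole items fit into a given memory budget.
inline std::uint64_t itemsThatFit(std::uint64_t budgetBytes,
                                  std::uint64_t bytesPerItem) {
    assert(bytesPerItem > 0);
    return budgetBytes / bytesPerItem;
}

// 120,000,000,000 bytes in total, which is a little under 112 GiB.
static_assert(totalBytes(kItems, kBytesPerItem) == 120000000000ULL,
              "30 million * 4000 bytes");
static_assert(totalBytes(kItems, kBytesPerItem) / kGiB == 111ULL,
              "whole GiB in the total");
```

By the same calculation, itemsThatFit(2 * kGiB, kBytesPerItem) gives roughly 537 thousand items, so a 2 GB process runs out of room at under 2% of the 30 million, before any overhead is counted.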



  • The best way to deal with a huge quantity of data is to use a database. A DBMS like PostgreSQL will help you manage the data correctly, optimize memory usage, and speed up searching the data.
    In your case you would use these data types: http://www.postgresql.org/docs/8.4/static/datatype-character.html



  • Before you dig into optimizations: Is your design ideal? What are those 4k bytes and why 30 million of them?



  • I see someone advised you to use QHash because it is more efficient, but I think that in many cases this is not true.
    QMap may be faster than QHash if the keys are quite long, or preferable if you want to save space (QHash reserves more space than the actual number of elements needs, which is wasted).
    See this link for details: https://kenaware.wordpress.com/2015/06/28/hash-and-map-when-the-faster-becomes-the-slower/

