Unsolved QHash or QMultihash (specific performance based use case)?
-
If I understand the difference between these two correctly, the first one is meant for only one item with a specific hash value. For a string implementation, in order to arrive at that hash value, it must traverse the string to calculate the value before inserting it in the hash table. For each collision, it checks against the string to see if it has found the correct item (preventing an exact collision).
In my situation, I must deal with cross-library string classes that may potentially go into the same table. In order to standardize the method for storage/retrieval and to theoretically save on hashing time (performance is the key consideration here), I pre-calculate the hash value for the string using one of the qhash functions for each object and use that to store it in the hash table. This kills several performance problems with one stone (e.g. having to not only hash the string, but potentially having to copy and convert the string as well).
The problem with this method is that we no longer have the string (which IS a unique identifier) to make sure that the id is unique and that we have the correct item. This presents a problem when deleting and finding items because it is possible for two different strings to have the same hash value. The table will hold potentially tens of thousands if not hundreds of thousands of items (worst case scenario). In the average scenario, it will be holding at least thousands of items.
In this specific situation, which would likely give the best performance? Having to convert everything to a QByteArray (single byte wide string) and using QHash for every lookup? Or using QMultihash and only having to do the potential conversion when there are multiple objects with the same hash code? If it helps, look-ups will be fairly common as the table is meant to address look-ups for a script engine (the main reason I need to use strings). My guess is the second option, but I just wanted to make sure.
-
@primem0ver It is not really clear what you want to do.
Do you have different string classes which you want to put in a hash? Do you want to use those strings as keys? So, what is key what is value?
Why can't you use one string class and convert when needed? -
Yes. I have different string types from different libraries. The string is used for "the key" and a pointer is the value. However, because of the different libraries and the need for adding extra information, I wrap the strings char* data in a QLatin1String, calculate its hash, and OR it with the hash of some added information to obtain the final hash value (an unsigned int) which is then used as the hash value to put it in the table. So currently I am using QHash/QMultihash<uint32, void*> (not really void... but good enough).
The string used is unique, however, since i am not actually using the string, it is possible to end up with the same uint32 for two different strings. So I am wondering which would be faster... to use the uint32 on a QMultihash structure (and hand compare the string for the occasional collision) or hash the string on a regular QHash structure? I am not worried about complexity of the code. I am worried about speed.
-
I think I answered my own question for speed. I wasn't really thinking about the fact that the hash is only calculated once for placement in the table and once for each lookup regardless of the method used. So then the question becomes one of storage space. Which would conserve the most space? Are duplicates of the strings created for the table itself when using a QLatin1String? Or is just the pointer stored?
-
@primem0ver said in QHash or QMultihash (specific performance based use case)?:
Are duplicates of the strings created for the table itself when using a QLatin1String? Or is just the pointer stored?
I'm not sure I understand. You wrote your table looks like QHash/QMultihash<uint32, void*>. Where is QLatin1String here?
-
@jsulm
Why I am intervening to (try to) explain what the OP is saying/wants, I don't know! But I have been following this thread....I believe he is saying:
- He wants a lookup table for string values.
- The key is to be the string; the value is irrelevant.
- The strings (which themselves are unique) come in a variety of "internal formats" (
std::string
,QString
,char *
, others). - He is proposing to convert them all to
QLatin1String
to store as the key for lookup.
He has thousands of such strings. He is concerned both about the storage requirement for the strings as keys in the hash table, and more particularly about the speed of look up (hence questions about needing to convert a string to be sought to
QLatin1String
each time it needs to be looked up.Something like that anyway!