"Getting an issue fixed" and "QHash result intersection performance"



  • There is a Qt bug (or a missing improvement) that starts killing one of my algorithms due to complexity. Actually there a probably some ways of doing special optimizations case-by-case to improve this... but this time I don't want to bother you with the technical side of the problem.

    My actual issue now exists for 14 months. And it still is a big problem for me, making me creating horrible workarounds. And it even isn't evaluated! I can only guess if it can be done at all (and when).

    So my actual question is much simpler (and maybe stupid): Is there a way to make a Qt issue rising in priority?

    There is voting. But even if the issue is a showstopper (or its very critical) you will not get a lot of votes if there are few people doing the same thing.

    I could buy commercial support. But I can not affort it (actually I do not know current prices but back to Trolltech times it was far too expensive for me).

    Unfortunately I do not have time to work into the processes and code that would allow me to contribute directly.

    So finally I am stuck wondering if there is a way to get more attention to a Qt Bug Tracker issue. Or at least getting a kind evaluation of the issue so I know what I am dealing with.

    Is there some e-mail? Is there some special forum I could beg for votes? Can issues be resubmitted? Is there some way of motivating someone from the community fixing a specific issue? I surely could not compensate the actual effort so my few bucks would probably be not enough making Digia or someone else fixing a specific issue.

    EDIT: Changed the title since the thread is more of a technical discussion now


  • Lifetime Qt Champion

    Hi,

    You can also post the link to the bug report here


  • Moderators

    There's the "development mailing list":http://lists.qt-project.org/mailman/listinfo/development which you can subscribe to to send emails. Emails there are more visible to Qt engineers than bug reports (which only alert the assignee, I think). If the bug is bad enough, someone will escalate it.

    Note that they are currently preparing for the Qt 5.2 release which is scheduled for release in December. If you email now and the fix is straightforward enough, there's a chance it might be fixed for Qt 5.2. From 11 November 2013 onwards, only issues deemed "blockers" will be fixed. Others will have to wait until Qt 5.2.1 or later.

    If you know what the issue is, you could patch your copy of Qt (I remember you've compiled your own version?). You can also submit your patch to fix Qt (patches are reviewed far quicker than bug reports).



  • @SGaist: I intentionally avoided posting the link here. Imagine more people would do this and would flood the forum with copies of bug tracker issues.

    Anyway if you think it's the right place, here it is: https://bugreports.qt-project.org/browse/QTBUG-27013

    It is about an algorithm I use that calls QHash::values(key) on multiple related QHash objects and intersects the results. The latter is not possible in accetable speed because QHash::values(key) returns an unordered QList<value> instead of QSet<value> which does not provide a fast intersect() method. QList::toSet() is my current workaround but slows down the code significantly. I have no idea why they preferred QList<value> for this since the data is unordered anyway.

    @JKSH: No, actually I decided not to use self-build Qt. I instead invested some time into testing ANGLE on VMs and decided to switch to the ANGLE builds (which are provided in both x86 and x64 for VS2012).
    The mailing list is a good idea. I will subscribe to it.

    I wonder if there is any website collecting bounties for missing Qt features or bugs?!


  • Lifetime Qt Champion

    Are you thinking of something like:

    @
    QHash<int, QString> myHash;
    QSet<QString> valuesSet = myHash.valuesSet(myKey);
    @

    ?

    Edit:Indeed it was QString I had in mind



  • Yes (providing you intended to write "QSet<QString> valuesSet"). That's exactly what I need. But I suppose there are a few more functions that could be improved in the same way.

    Do you think that bounty idea is a good one?



  • bq. QList::toSet() is my current workaround but slows down the code significantly

    if you investigated so much the issue tbh I wonder why you have not done your own "maptoset" or "hashtoset" templated functions? with the code similar like in toList qt's src a code as bellow is better than your workaround:

    @template <class Key, class T>
    QSet<T> maptoset(QMap<Key,T> map)
    {

    QSet<T> res;
    res.reserve(map.size());
    typename QMap<Key,T>::const_iterator i = map.begin();
    while (i != map.end()) {
        res.insert(i.value());
        ++i;
    }
    return res;
    

    }

    template <class Key, class T>
    QSet<T> hashtoset(QHash<Key,T> map)
    {

    QSet<T> res;
    res.reserve(map.size());
    typename QHash<Key,T>::const_iterator i = map.begin();
    while (i != map.end()) {
        res.insert(i.value());
        ++i;
    }
    return res;
    

    }
    ...

    QHash<int,QString> myHash1;
    QHash<int,QString> myHash2;
    QHash<int,QString> myHash3;

    myHash1[1] = "one";
    myHash1[2] = "two";
    myHash1[3] = "three";
    
    myHash2[1] = "one";
    myHash2[2] = "three";
    myHash2[3] = "two";
    
    
    myHash3[1] = "one";
    myHash3[2] = "two";
    myHash3[3] = "five";
    
    QSet<QString> set1 = hashtoset(myHash1);
    QSet<QString> set2 = hashtoset(myHash2);
    QSet<QString> set3 = hashtoset(myHash3);
    
    QSet<QString> setIntersect = set1.intersect(set2.intersect(set3));
    
    qDebug()<<"interesection"<<setIntersect;
    

    @



  • @NicuPopescu: Thanks for you suggestion. But that's not what I want to do. I am afraid you missed the point.


  • Lifetime Qt Champion

    Could you show an example of what you want to do ?



  • it was an answer to the unhappy workaround :) ... as regarding the main topic I let the other advanced people here to respond



  • QMap::values() and QHash::values() can only sensibly return a QList. Both functions return all the values associated with the key specified in reverse insertion order (or all keys if no key is given). This list can, and in the general case will, contain duplicates. What you are proposing would have values() return a QSet where the values are the keys: this cannot contain duplicates. Your proposal might fit your particular data set but not the general case.

    Your problem is how to efficiently find the intersection of several unordered lists. You convert each list to a QSet and then use intersect(), but that involves hashing every value several times. Have you tried the generic std::set_intersection approach?

    @
    #include <algorithm>

    QList<int> first;
    first << 35 << 10 << 15 << 20 << 15 << 35 << 30 << 35;
    QList<int> second;
    second << 10 << 45 << 35 << 35 << 35 << 35;

    QList<int>::iterator it;
    std::sort(first.begin(), first.end());
    it = std::unique(first.begin(), first.end());
    first.erase(it, first.end());

    std::sort(second.begin(), second.end());
    it = std::unique(second.begin(), second.end());
    second.erase(it, second.end());

    QList<int> result;
    std::set_intersection (
    first.begin(), first.end(),
    second.begin(), second.end(),
    std::back_inserter(result));
    qDebug() <<"Result =" << result;
    @



  • @ChrisW67: Your algorithm is much better than my workaround. It's about twice as fast. I will use it instead. Thank you very much!

    But unfortunately it still is too slow for my application.

    I would like to discuss my proposal. I think it would give this kind of data analysis a major speedup and is worth being considered. Additionally, using QHash::valuesSet(key) + QSet::intersect() would be much more easy to use and to understand.

    I think having a QHash (or a QMultiHash in my case) with n:1 relation isn't unusual. Actually that is what was QMultiHash made for. There are a lot of applications where a fast n:1 lookup is needed. Simply most application that process large data amounts without involving a fully featured database. In these applications, valuesSet(key) would VERY useful. And even for a 1:n hash it could be useful to get a set of unique values. So I see no reason why valuesSet(key) (or similar) would be a bad idea. Specifically I don't think it is too specialized. Actually it would be probably more useful than the QList variant since getting the values in reverse insertion order is needed not very often on a QHash. But speed is a neck-breaker, especially for QHash

    You say that creating a set involves expensive hashing. You are right. This is probably killing my code. But that hashing is only necessary using my workaround. QHash and QSet use the same hashing algorithm (I think QSet internally is a QHash), right? A Q(Multi)Hash::valuesSet(key) method could simply copy the found hashes from the source QHash to the result QSet. This would be surely quick. Then that would allow doing the desired fast QSet::intersect() without separate (expensive) std:unique and sort calls. I suppose that would be even much faster than your algorithm.

    Are there any technical facts I am missing why this wouldn't work?



  • Hashes are designed for rapid retrieval by key, i.e, 1 to n lookup. The keys are hashed to achieve the speed on large data sets (for small key counts it's not worth the effort). Whether there is one or many values for a given key is largely irrelevant: the values are simply stored in the correct hash bucket.

    While QSet is based on a QHash what you are expecting as a result is a set built from the values for a given key in the source hash. The values are not hashed in the source QHash, so they will have to hashed in order to become the keys in the QSet internal hash table. Whether you do it internally or externally to QHash will make little performance difference.

    The problem you described is finding all the values that are common to all the keys you started with. This tells you nothing about the keys to which these values belong that you did not already know at the outset. You are not determining the keys that contain a certain value, a task for a which a second, reversed hash would be suitable. Perhaps I do not have a grasp on the problem.

    If you only ever need this structure for this query, or need the multiple values to be unique, then you could make the structure a map from key to QSet<valueType> so you build and store the sets as you go. You still have to hash all the values, but you move the cost from query to insertion time.



  • You are right, those are the wrong hashes. They would need to be calculated from the scratch. This would not work.

    Actually the entire data structures used by me are pretty much optimized to speed up read accesses by moving the cost to insertion. This is ok and desired. So your map-of-sets idea is pretty good. I will definitely try this! I am not sure if it would be better to first merge those sub-sets to intersect that with the main result or to immediately intersect each of the sub-sets with the main result.

    I think this solves my problem. But please allow me explore one more thought.

    The problem with my intersection idea is the fact that QSet needs hashes for data storage. This is expensive. But QMap does not suffer from this drawback. What if someone would build something like a tree-based set, lets call it QTree. How would such a tree perform on my cascaded lookup-and-intersect problem? QMap and QHash do not replace each other - they both have their drawbacks and so they both exist. I would expect the same for QSet and something like QTree. Why does no such a container exist?



  • Hi,

    first I missed your point, then I thought how to do hash tables as hash lists (sets with duplicates) from one key's multiple values and then a fast intersect function on them:

    @template <class Key, class T>
    QHash<T,T> HashToHashList(QHash<Key,T> hash)
    {

    QHash<T,T> res;
    res.reserve(hash.size());
    typename QHash<Key,T>::const_iterator i = hash.begin();
    while (i != hash.end()) {
        res.insertMulti(i.value(),i.value());//difference from QSet which uses
    

    //insert() ... now res is equivalent to QSet with duplicate values!
    ++i;
    }
    return res;
    }

    template <class T>
    QHash<T,T> intersectHashLists(const QHash<T,T> &hash1,const QHash<T,T> &hash2)
    {
    QHash<T,T> copy1(hash1);
    QHash<T,T> copy2(hash2);
    typename QHash<T,T>::const_iterator i = hash1.constEnd();
    while (i != hash1.constBegin()) {
    --i;
    if (!copy2.contains(*i))//fast look up by hash function
    copy1.remove(*i);
    }
    return copy1;
    }

    ...

    QHash<int,QString> myHash1;
    QHash<int,QString> myHash2;
    QHash<int,QString> myHash3;

    myHash1[1] = "one";
    myHash1[1] = "one";
    myHash1[2] = "two";
    myHash1[3] = "three";
    myHash1.insertMulti(1,"four");
    myHash1.insertMulti(1,"eight");
    myHash1.insertMulti(1,"five");
    myHash1.insertMulti(1,"six");
    myHash1.insertMulti(1,"one");
    
    qDebug()<<"myHash1"<<myHash1;
    
    myHash2[1] = "one";
    myHash2[2] = "three";
    myHash2[3] = "two";
    myHash2.insertMulti(1,"four");
    myHash2.insertMulti(1,"seven");
    myHash2.insertMulti(1,"five");
    myHash2.insertMulti(1,"six");
    
    QHash<QString,QString> hashList1 = HashToHashList(myHash1);
    QHash<QString,QString> hashList2 = HashToHashList(myHash2);
    
    qDebug()<<"hashList1"<<hashList1;
    qDebug()<<"hashList2"<<hashList2;
    
    QHash<QString,QString> intersectionHashLists = intersectHashLists(hashList1,hashList2);
    
    qDebug()<<"intersectionHashLists"<<intersectionHashLists;
    

    ...
    @

    QSet intert() source:
    @ inline iterator insert(const T &value)
    { return static_cast<typename Hash::iterator>(q_hash.insert(value, QHashDummyValue()));//does not allow duplicates@

    hope I have not missed the point again :)

    L.E. brr ... I missed the key finding again but I'll follow up :)



  • @NicuPopescu: I think your new algorithm is more or less the same as I requested in my issue. The line "res.insertMulti(i.value(),i.value());" does pretty much the same as set.insert() since it also calculates a hash for each value (which is the responsible for making my requested algorithm slow too).

    EDIT: And I missed your edit :-)



  • except that for copying from source to QHash or QSet involves calls to qhash function, there is an important difference in my approach: you can have duplicates, entire source will be found in new copy; I think the solution works well if you don't need to filter by a key which I missed to address in pursuing how to convert a container(I used QHash with mulitple values but it could be other) to a hash table for fast intersections ... edit: I was so sure that I found a possible solution and I realized after posting that I forgot to treat finding values by a key

    to filter by a key is not so big deal with little condition: either to have access to hash data or at least to have access to hash iterator encapsulated QHashNode

    now I have other wonder: since templated code in general is placed in headers, as it is the case for stl or qt container classes, why it couldn't be a solution to work with a local/project working copy slightly modified to meet your needs or even to do changes directly in qt sources, no need of qt rebuild ... all my efforts so far was to find a solution keeping qt source untouched



  • I slightly modified ..\Qt5.1.1\5.1.1\mingw48_32\include\QtCore\qhash.h

    @template <class Key, class T>
    Q_OUTOFLINE_TEMPLATE QHash<T,T> QHash<Key, T>::valuesHash(const Key &akey) const
    {
    QHash<T,T> res;
    Node *node = *findNode(akey);
    if (node != e) {
    do {
    res.insertMulti(node->value,node->value);
    } while ((node = node->next) != e && node->key == akey);
    }
    return res;
    }@

    and added as member in QHash class and it works as expected ... hope so :)


Log in to reply
 

Looks like your connection to Qt Forum was lost, please wait while we try to reconnect.