Important: Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

Optimizing building a "contained words" list



  • Hi, to write an anagram generator, I have to look through a huge amount (2 * 10^6 QStrings) of words and find out, if the chars to build these words can be found in a given text, thus creating a list of 'all' words that could be created from text's "char material".

    My solution takes far too much time, typically around 700 ms for a word list. Do you have ideas on how to do the task significantly faster?

    bool contains(QString text, QString word) {
        for (auto i = word.begin(); i != word.end(); i++) {
            if (word.count(*i, Qt::CaseSensitive) > text.count(*i, Qt::CaseSensitive)) {
                return false;
            }
        }
        return true;
    };
    

    I've tried avoiding to check a char twice, but this needed about 50% more time:

    bool contains(QString text, QString word) {
        QHash<QChar, bool> done;
        for (auto i = word.begin(); i != word.end(); i++) {
            int wordCount =  word.count(*i, Qt::CaseSensitive);
            int textCount = text.count(*i, Qt::CaseSensitive);
            if (wordCount > 1) {
                if (done.contains(*i)) continue;
                done.insert(*i,true);
            }
            if (wordCount > textCount) {
                return false;
            }
        }
        return true;
    };
    

    To fill the actual word list, I use this code:

    QVector<QString> wordList(QString text, const QVector<QString> bigList) {
        QVector<QString> resultList;
        resultList.reserve(6);
        for (auto s : bigList) {
            if (contains(text, s)) {
                resultList.append(s);
            }
        }
        return resultList;
    }
    

    I've tried to pre-count the chars in the bigList, making it a

    QHash<QString, QHash<QChar, int>> bigMap;
    

    Here the count of each word's chars is stored to make "contains" faster, but unfortunately that structure seems to be too big, I get a bad_alloc exception at about 75% of filling it up.

    Any ideas where to go from here? I feel quite lost.


  • Lifetime Qt Champion

    Hi
    Im wondering if
    http://doc.qt.io/qt-5/qstringref.html#details
    would help but since QString is implicitly shared already im not
    sure the gain will be anything worthwhile.

    im also wondering if contains should not be

    bool contains(const QString &text, const QString &word) {

    for max speed ?


Log in to reply