Optimizing building a "contained words" list

SeDi

Hi, to write an anagram generator, I have to look through a huge amount (2 * 10^6 QStrings) of words and find out, if the chars to build these words can be found in a given text, thus creating a list of 'all' words that could be created from text's "char material".

My solution takes far too much time, typically around 700 ms for a word list. Do you have ideas on how to do the task significantly faster?

bool contains(QString text, QString word) {
    for (auto i = word.begin(); i != word.end(); i++) {
        if (word.count(*i, Qt::CaseSensitive) > text.count(*i, Qt::CaseSensitive)) {
            return false;
        }
    }
    return true;
};

I've tried avoiding to check a char twice, but this needed about 50% more time:

bool contains(QString text, QString word) {
    QHash<QChar, bool> done;
    for (auto i = word.begin(); i != word.end(); i++) {
        int wordCount =  word.count(*i, Qt::CaseSensitive);
        int textCount = text.count(*i, Qt::CaseSensitive);
        if (wordCount > 1) {
            if (done.contains(*i)) continue;
            done.insert(*i,true);
        }
        if (wordCount > textCount) {
            return false;
        }
    }
    return true;
};

To fill the actual word list, I use this code:

QVector<QString> wordList(QString text, const QVector<QString> bigList) {
    QVector<QString> resultList;
    resultList.reserve(6);
    for (auto s : bigList) {
        if (contains(text, s)) {
            resultList.append(s);
        }
    }
    return resultList;
}

I've tried to pre-count the chars in the bigList, making it a

QHash<QString, QHash<QChar, int>> bigMap;

Here the count of each word's chars is stored to make "contains" faster, but unfortunately that structure seems to be too big, I get a bad_alloc exception at about 75% of filling it up.

Any ideas where to go from here? I feel quite lost.

mrjj

Hi
Im wondering if
http://doc.qt.io/qt-5/qstringref.html#details
would help but since QString is implicitly shared already im not
sure the gain will be anything worthwhile.

im also wondering if contains should not be

bool contains(const QString &text, const QString &word) {

for max speed ?

Discover and share your #QtStories

Upcoming Forum Update April 22nd

Unsolved Optimizing building a "contained words" list