Optimizing building a "contained words" list
-
Hi, to write an anagram generator, I have to look through a huge amount (2 * 10^6 QStrings) of words and find out, if the chars to build these words can be found in a given text, thus creating a list of 'all' words that could be created from text's "char material".
My solution takes far too much time, typically around 700 ms for a word list. Do you have ideas on how to do the task significantly faster?
bool contains(QString text, QString word) { for (auto i = word.begin(); i != word.end(); i++) { if (word.count(*i, Qt::CaseSensitive) > text.count(*i, Qt::CaseSensitive)) { return false; } } return true; };
I've tried avoiding to check a char twice, but this needed about 50% more time:
bool contains(QString text, QString word) { QHash<QChar, bool> done; for (auto i = word.begin(); i != word.end(); i++) { int wordCount = word.count(*i, Qt::CaseSensitive); int textCount = text.count(*i, Qt::CaseSensitive); if (wordCount > 1) { if (done.contains(*i)) continue; done.insert(*i,true); } if (wordCount > textCount) { return false; } } return true; };
To fill the actual word list, I use this code:
QVector<QString> wordList(QString text, const QVector<QString> bigList) { QVector<QString> resultList; resultList.reserve(6); for (auto s : bigList) { if (contains(text, s)) { resultList.append(s); } } return resultList; }
I've tried to pre-count the chars in the bigList, making it a
QHash<QString, QHash<QChar, int>> bigMap;
Here the count of each word's chars is stored to make "contains" faster, but unfortunately that structure seems to be too big, I get a bad_alloc exception at about 75% of filling it up.
Any ideas where to go from here? I feel quite lost.
-
Hi
Im wondering if
http://doc.qt.io/qt-5/qstringref.html#details
would help but since QString is implicitly shared already im not
sure the gain will be anything worthwhile.im also wondering if contains should not be
bool contains(const QString &text, const QString &word) {
for max speed ?