Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • Users
  • Groups
  • Search
  • Get Qt Extensions
  • Unsolved
Collapse
Brand Logo
  1. Home
  2. Qt Development
  3. General and Desktop
  4. Optimizing building a "contained words" list
Qt 6.11 is out! See what's new in the release blog

Optimizing building a "contained words" list

Scheduled Pinned Locked Moved Unsolved General and Desktop
2 Posts 2 Posters 472 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • SeDiS Offline
    SeDiS Offline
    SeDi
    wrote on last edited by SeDi
    #1

    Hi, to write an anagram generator, I have to look through a huge amount (2 * 10^6 QStrings) of words and find out, if the chars to build these words can be found in a given text, thus creating a list of 'all' words that could be created from text's "char material".

    My solution takes far too much time, typically around 700 ms for a word list. Do you have ideas on how to do the task significantly faster?

    bool contains(QString text, QString word) {
        for (auto i = word.begin(); i != word.end(); i++) {
            if (word.count(*i, Qt::CaseSensitive) > text.count(*i, Qt::CaseSensitive)) {
                return false;
            }
        }
        return true;
    };
    

    I've tried avoiding to check a char twice, but this needed about 50% more time:

    bool contains(QString text, QString word) {
        QHash<QChar, bool> done;
        for (auto i = word.begin(); i != word.end(); i++) {
            int wordCount =  word.count(*i, Qt::CaseSensitive);
            int textCount = text.count(*i, Qt::CaseSensitive);
            if (wordCount > 1) {
                if (done.contains(*i)) continue;
                done.insert(*i,true);
            }
            if (wordCount > textCount) {
                return false;
            }
        }
        return true;
    };
    

    To fill the actual word list, I use this code:

    QVector<QString> wordList(QString text, const QVector<QString> bigList) {
        QVector<QString> resultList;
        resultList.reserve(6);
        for (auto s : bigList) {
            if (contains(text, s)) {
                resultList.append(s);
            }
        }
        return resultList;
    }
    

    I've tried to pre-count the chars in the bigList, making it a

    QHash<QString, QHash<QChar, int>> bigMap;
    

    Here the count of each word's chars is stored to make "contains" faster, but unfortunately that structure seems to be too big, I get a bad_alloc exception at about 75% of filling it up.

    Any ideas where to go from here? I feel quite lost.

    1 Reply Last reply
    0
    • mrjjM Offline
      mrjjM Offline
      mrjj
      Lifetime Qt Champion
      wrote on last edited by
      #2

      Hi
      Im wondering if
      http://doc.qt.io/qt-5/qstringref.html#details
      would help but since QString is implicitly shared already im not
      sure the gain will be anything worthwhile.

      im also wondering if contains should not be

      bool contains(const QString &text, const QString &word) {

      for max speed ?

      1 Reply Last reply
      0

      • Login

      • Login or register to search.
      • First post
        Last post
      0
      • Categories
      • Recent
      • Tags
      • Popular
      • Users
      • Groups
      • Search
      • Get Qt Extensions
      • Unsolved