
on resolving bottlenecks in QSortFilterProxyModel



  • Hello,

    Recently I was writing a command-palette program with fuzzy matching/sorting in Qt.
    I noticed that with 100k rows it becomes incredibly slow (roughly 6 s for the regexp /.*a.*/), and I found some strange causes.

    To identify the root cause, I wrote a benchmark that uses QRegularExpression for filtering, and it was fast enough (about 0.09 s per regexp).

    ...
    
    // COUNT, key[] and get_random_str() are defined in the elided part above.
    int main() {
        char str[1000] = {0};
        int i, j;
        QVector<QString> vecs;
    
        // Fill the test data with COUNT random strings.
        for(i = 0; i < COUNT; i++) {
            vecs.push_back(get_random_str());
        }
    
        for(j = 0; j < 100; j++) {
            printf("%d\n", (int) clock());
            // Build a random pattern of the form ".*X.*Y.*" from key[].
            memset(str, 0, sizeof(str));
            for(i = 0; i < 2 + rand() % 8; i++) {
                str[i*3]   = '.';
                str[i*3+1] = '*';
                str[i*3+2] = key[rand() % (sizeof(key) - 1)];
            }
            str[i*3]   = '.';
            str[i*3+1] = '*';
            QRegularExpression regex(str);
            // Count how many of the strings match the pattern.
            int count = 0;
            for(i = 0; i < COUNT; i++) {
                if(vecs[i].contains(regex)) count++;
            }
            printf("! %d\n", count);
        }
    }
    

    And here is my code for the palette. I disabled sorting based on fuzzy match scores.

    // Filters with a JIT-compiled QRegularExpression instead of the proxy's
    // built-in QRegExp filter.
    class MyFilter : public QSortFilterProxyModel {
        QRegularExpression expression_;
        QString keyword_;
    public:
        bool filterAcceptsRow(int source_row, const QModelIndex &source_parent) const override {
            auto model = sourceModel();
            auto str = model->index(source_row, 0, source_parent).data().toString();
            return str.contains(expression_);
        }
    
        void setMyFilter(const QString &keyword) {
            static QRegExp emptyRegExp;
            keyword_ = keyword;
    
            // Build a fuzzy pattern: "abc" -> "a.*b.*c", skipping whitespace
            // and escaping regexp metacharacters in the typed keyword.
            QStringList regexp_before_join;
            for (const QChar &x : keyword)
                if (!x.isSpace()) regexp_before_join.push_back(QRegularExpression::escape(x));
    
            expression_ = QRegularExpression(regexp_before_join.join(".*"),
                                             QRegularExpression::CaseInsensitiveOption);
    
            // Setting an (empty) QRegExp only forces the proxy to re-run
            // filterAcceptsRow(); invalidateFilter() would do the same.
            setFilterRegExp(emptyRegExp);
        }
    };
    

    [full code]

    But when I profiled it, the overhead was mostly in a Qt 5 core library function, QSortFilterProxyModelPrivate::build_source_to_proxy_mapping.

    [profiler screenshot attached]

    I think it builds a mapping between the source and proxy models. I never query the source-to-proxy direction, so I'm probably only using half of this feature, yet it accounts for 67% of the overhead, which makes it the top-priority issue.

    Given this, I'm considering not using the proxy at all and instead building a new vector of items (or of their keys) every time a keyword is typed, but I'm not sure whether I'm doing it right. I'd greatly appreciate any ideas on this; a rough sketch of the idea follows below.
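
    // Rough sketch of the idea (all names here are invented, untested): keep
    // the full list in a plain QVector and, on every keystroke, build a new
    // vector holding only the indices (keys) of the matching items,
    // bypassing QSortFilterProxyModel entirely.
    #include <QVector>
    #include <QString>
    #include <QRegularExpression>
    
    QVector<int> matchingKeys(const QVector<QString> &items,
                              const QRegularExpression &regex) {
        QVector<int> keys;
        keys.reserve(items.size());
        for (int i = 0; i < items.size(); ++i) {
            if (items.at(i).contains(regex))
                keys.push_back(i);   // store the key/index, not a copy of the item
        }
        return keys;
    }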

    Thank you for reading!


  • Lifetime Qt Champion

    Reimplementing filterAcceptsRow() is not needed when you use setFilterRegExp() - Qt then does the filtering for you, as in the sketch below.
    An index is needed in any case. Since there are no absolute numbers here: did you see a performance impact in your real-world application? Just filling a vector isn't that expensive, but since it is the only thing your test does, it will of course show up as the hotspot.
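
    Something along these lines (a minimal sketch; the model contents and names are just placeholders):

    // Minimal sketch (placeholder data): let the proxy do the filtering via
    // setFilterRegExp(), with no filterAcceptsRow() override at all.
    #include <QApplication>
    #include <QListView>
    #include <QStringListModel>
    #include <QSortFilterProxyModel>
    
    int main(int argc, char *argv[]) {
        QApplication app(argc, argv);
    
        QStringListModel source(QStringList{"alpha", "beta", "gamma"});
    
        QSortFilterProxyModel proxy;
        proxy.setSourceModel(&source);
        proxy.setFilterCaseSensitivity(Qt::CaseInsensitive);
        proxy.setFilterRegExp(QRegExp(".*a.*")); // Qt filters the rows itself
    
        QListView view;
        view.setModel(&proxy);
        view.show();
        return app.exec();
    }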



  • Thanks for the reply! I'm using Qt 5.6 for compatibility reasons, and its QSortFilterProxyModel only supports QRegExp filtering (no JIT-compiled QRegularExpression), so I reimplemented filterAcceptsRow() to do the matching with QRegularExpression instead.

    Yes, I'm developing this as a symbol-searcher plugin for another program with more than 68k symbols, and I wrote the test to reproduce the same kind of lag. Currently I have a class inheriting QStandardItemModel that re-inserts every row on each search-query update, roughly like the sketch below.
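
    Simplified, the current approach looks roughly like this (names are made up):

    // Simplified sketch of the current approach (made-up names): a model
    // derived from QStandardItemModel that is cleared and refilled row by
    // row on every query change.
    #include <QStandardItemModel>
    #include <QRegularExpression>
    #include <QVector>
    #include <QString>
    
    class SymbolListModel : public QStandardItemModel {
    public:
        void rebuild(const QVector<QString> &symbols, const QRegularExpression &regex) {
            clear();                                  // drop all previous rows
            for (const QString &s : symbols) {
                if (s.contains(regex))
                    appendRow(new QStandardItem(s));  // one insert per matching row
            }
        }
    };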

    Indeed, the .fill() or .at() calls alone won't cause that much overhead. I'm not sure yet, but the root cause seems to be that filter_changed -> handle_filter_changed -> remove_source_items calls remove_proxy_interval -> **build_source_to_proxy_mapping** so many times, because the proxy intervals calculated internally are too numerous. Maybe this commit is related, so I'm upgrading Qt to verify that assumption...

    I've attached the code I suspect:

    QSortFilterProxyModelPrivate::remove_source_items():
    ...
        QVector<QPair<int, int> > proxy_intervals;
        proxy_intervals = proxy_intervals_for_source_items(source_to_proxy, source_items);
    
        for (int i = proxy_intervals.size()-1; i >= 0; --i) {
            QPair<int, int> interval = proxy_intervals.at(i);
            int proxy_start = interval.first;
            int proxy_end = interval.second;
            remove_proxy_interval(source_to_proxy, proxy_to_source, proxy_start, proxy_end,
                                  proxy_parent, orient, emit_signal);
        }
    
    QSortFilterProxyModelPrivate::remove_proxy_interval():
    ...
        build_source_to_proxy_mapping(proxy_to_source, source_to_proxy);
    
    

  • Lifetime Qt Champion

    If you are looking for speed, I would suggest not using QStandardItemModel but deriving from QAbstractItemModel/QAbstractTableModel, and not inserting each row separately but inserting them chunk-wise - see the sketch below. Upgrading to 5.12 may also help, since there were some improvements with regard to item views/models.
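
    A minimal sketch of that direction (placeholder names; it swaps the whole result set with a model reset - beginInsertRows()/endInsertRows() around blocks of rows would be the chunk-wise variant):

    // Minimal sketch (placeholder names): a read-only single-column model
    // derived from QAbstractTableModel whose contents are replaced in one
    // go instead of appending rows one by one.
    #include <QAbstractTableModel>
    #include <QVariant>
    #include <QVector>
    #include <QString>
    #include <utility>
    
    class SymbolModel : public QAbstractTableModel {
    public:
        int rowCount(const QModelIndex &parent = QModelIndex()) const override {
            return parent.isValid() ? 0 : rows_.size();
        }
        int columnCount(const QModelIndex &parent = QModelIndex()) const override {
            return parent.isValid() ? 0 : 1;
        }
        QVariant data(const QModelIndex &index, int role = Qt::DisplayRole) const override {
            if (!index.isValid() || role != Qt::DisplayRole)
                return QVariant();
            return rows_.at(index.row());
        }
        // Replace the whole result set at once; attached views receive a
        // single reset notification instead of one signal per row.
        void setRows(QVector<QString> rows) {
            beginResetModel();
            rows_ = std::move(rows);
            endResetModel();
        }
    private:
        QVector<QString> rows_;
    };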

