
on resolving bottlenecks in QSortFilterProxyModel



  • Hello,

    Recently I was writing a command-palette program with fuzzy matching/sorting in Qt.
    I noticed that with 100k rows it becomes incredibly slow (roughly 6 s for the regexp /.*a.*/), and I found some strange causes.

    To identify the root cause, I wrote a benchmark that uses QRegularExpression for filtering, and it was fast enough (about 0.09 s per regexp).

    ...
    
    // COUNT, key[] and get_random_str() are defined in the elided part above.
    int main() {
        char str[1000] = {0};
        int i, j;
        QVector<QString> vecs;
    
        // Fill the test data with COUNT random strings.
        for(i = 0; i < COUNT; i++) {
            vecs.push_back(get_random_str());
        }
    
        for(j = 0; j < 100; j++) {
            printf("%d\n", (int) clock());
            // Build a random pattern of the form ".*X.*Y.*" from key[].
            memset(str, 0, sizeof(str));
            for(i = 0; i < 2 + rand() % 8; i++) {
                str[i*3]   = '.';
                str[i*3+1] = '*';
                str[i*3+2] = key[rand() % (sizeof(key) - 1)];
            }
            str[i*3]   = '.';
            str[i*3+1] = '*';
            QRegularExpression regex(str);
            // Count how many of the strings match the pattern.
            int count = 0;
            for(i = 0; i < COUNT; i++) {
                if(vecs[i].contains(regex)) count++;
            }
            printf("! %d\n", count);
        }
    }
    

    And here is my code for the palette. I disabled sorting based on fuzzy match scores.

    // Filters with a JIT-compiled QRegularExpression instead of the proxy's
    // built-in QRegExp filter.
    class MyFilter : public QSortFilterProxyModel {
        QRegularExpression expression_;
        QString keyword_;
    public:
        bool filterAcceptsRow(int source_row, const QModelIndex &source_parent) const override {
            auto model = sourceModel();
            auto str = model->index(source_row, 0, source_parent).data().toString();
            return str.contains(expression_);
        }
    
        void setMyFilter(const QString &keyword) {
            static QRegExp emptyRegExp;
            keyword_ = keyword;
    
            // Build a fuzzy pattern: "abc" -> "a.*b.*c", skipping whitespace
            // and escaping regexp metacharacters in the typed keyword.
            QStringList regexp_before_join;
            for (const QChar &x : keyword)
                if (!x.isSpace()) regexp_before_join.push_back(QRegularExpression::escape(x));
    
            expression_ = QRegularExpression(regexp_before_join.join(".*"),
                                             QRegularExpression::CaseInsensitiveOption);
    
            // Setting an (empty) QRegExp only forces the proxy to re-run
            // filterAcceptsRow(); invalidateFilter() would do the same.
            setFilterRegExp(emptyRegExp);
        }
    };
    

    [full code]

    But when I profiled it, the overhead was mostly in a Qt 5 core library function, QSortFilterProxyModelPrivate::build_source_to_proxy_mapping.

    [profiler screenshot attached]

    I think it builds a mapping between the source and proxy models. I never query the source-to-proxy direction, so I'm probably only using half of this feature, yet it accounts for 67% of the overhead, which makes it the top-priority issue.

    Given this, I'm considering not using the proxy at all and instead building a new vector of items (or of their keys) every time a keyword is typed, but I'm not sure whether I'm doing it right. I'd greatly appreciate any ideas on this; a rough sketch of the idea follows below.
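
    // Rough sketch of the idea (all names here are invented, untested): keep
    // the full list in a plain QVector and, on every keystroke, build a new
    // vector holding only the indices (keys) of the matching items,
    // bypassing QSortFilterProxyModel entirely.
    #include <QVector>
    #include <QString>
    #include <QRegularExpression>
    
    QVector<int> matchingKeys(const QVector<QString> &items,
                              const QRegularExpression &regex) {
        QVector<int> keys;
        keys.reserve(items.size());
        for (int i = 0; i < items.size(); ++i) {
            if (items.at(i).contains(regex))
                keys.push_back(i);   // store the key/index, not a copy of the item
        }
        return keys;
    }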

    Thank you for reading!


  • Lifetime Qt Champion

    Reimplementing filterAcceptsRow() is not needed when you use setFilterRegExp() - Qt then does the filtering for you, as in the sketch below.
    An index is needed in any case. Since there are no absolute numbers here: did you see a performance impact in your real-world application? Just filling a vector isn't that expensive, but since it is the only thing your test does, it will of course show up as the hotspot.
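
    Something along these lines (a minimal sketch; the model contents and names are just placeholders):

    // Minimal sketch (placeholder data): let the proxy do the filtering via
    // setFilterRegExp(), with no filterAcceptsRow() override at all.
    #include <QApplication>
    #include <QListView>
    #include <QStringListModel>
    #include <QSortFilterProxyModel>
    
    int main(int argc, char *argv[]) {
        QApplication app(argc, argv);
    
        QStringListModel source(QStringList{"alpha", "beta", "gamma"});
    
        QSortFilterProxyModel proxy;
        proxy.setSourceModel(&source);
        proxy.setFilterCaseSensitivity(Qt::CaseInsensitive);
        proxy.setFilterRegExp(QRegExp(".*a.*")); // Qt filters the rows itself
    
        QListView view;
        view.setModel(&proxy);
        view.show();
        return app.exec();
    }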



  • Thanks for the reply! I'm using Qt 5.6 for compatibility reasons, and its QSortFilterProxyModel only supports QRegExp filtering (no JIT-compiled QRegularExpression), so I reimplemented filterAcceptsRow() to do the matching with QRegularExpression instead.

    Yes, I'm developing this as a symbol-searcher plugin for another program with more than 68k symbols, and I wrote the test to reproduce the same kind of lag. Currently I have a class inheriting QStandardItemModel that re-inserts every row on each search-query update, roughly like the sketch below.
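
    Simplified, the current approach looks roughly like this (names are made up):

    // Simplified sketch of the current approach (made-up names): a model
    // derived from QStandardItemModel that is cleared and refilled row by
    // row on every query change.
    #include <QStandardItemModel>
    #include <QRegularExpression>
    #include <QVector>
    #include <QString>
    
    class SymbolListModel : public QStandardItemModel {
    public:
        void rebuild(const QVector<QString> &symbols, const QRegularExpression &regex) {
            clear();                                  // drop all previous rows
            for (const QString &s : symbols) {
                if (s.contains(regex))
                    appendRow(new QStandardItem(s));  // one insert per matching row
            }
        }
    };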

    Indeed, the .fill() or .at() calls alone won't cause that much overhead. I'm not sure yet, but the root cause seems to be that filter_changed -> handle_filter_changed -> remove_source_items calls remove_proxy_interval -> **build_source_to_proxy_mapping** so many times, because the proxy intervals calculated internally are too numerous. Maybe this commit is related, so I'm upgrading Qt to verify that assumption...

    I've attached the code I suspect:

    QSortFilterProxyModelPrivate::remove_source_items():
    ...
        QVector<QPair<int, int> > proxy_intervals;
        proxy_intervals = proxy_intervals_for_source_items(source_to_proxy, source_items);
    
        for (int i = proxy_intervals.size()-1; i >= 0; --i) {
            QPair<int, int> interval = proxy_intervals.at(i);
            int proxy_start = interval.first;
            int proxy_end = interval.second;
            remove_proxy_interval(source_to_proxy, proxy_to_source, proxy_start, proxy_end,
                                  proxy_parent, orient, emit_signal);
        }
    
    QSortFilterProxyModelPrivate::remove_proxy_interval():
    ...
        build_source_to_proxy_mapping(proxy_to_source, source_to_proxy);
    
    

  • Lifetime Qt Champion

    If you are looking for speed, I would suggest not using QStandardItemModel but deriving from QAbstractItemModel/QAbstractTableModel, and not inserting each row separately but inserting them chunk-wise - see the sketch below. Upgrading to 5.12 may also help, since there were some improvements with regard to item views/models.
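
    A minimal sketch of that direction (placeholder names; it swaps the whole result set with a model reset - beginInsertRows()/endInsertRows() around blocks of rows would be the chunk-wise variant):

    // Minimal sketch (placeholder names): a read-only single-column model
    // derived from QAbstractTableModel whose contents are replaced in one
    // go instead of appending rows one by one.
    #include <QAbstractTableModel>
    #include <QVariant>
    #include <QVector>
    #include <QString>
    #include <utility>
    
    class SymbolModel : public QAbstractTableModel {
    public:
        int rowCount(const QModelIndex &parent = QModelIndex()) const override {
            return parent.isValid() ? 0 : rows_.size();
        }
        int columnCount(const QModelIndex &parent = QModelIndex()) const override {
            return parent.isValid() ? 0 : 1;
        }
        QVariant data(const QModelIndex &index, int role = Qt::DisplayRole) const override {
            if (!index.isValid() || role != Qt::DisplayRole)
                return QVariant();
            return rows_.at(index.row());
        }
        // Replace the whole result set at once; attached views receive a
        // single reset notification instead of one signal per row.
        void setRows(QVector<QString> rows) {
            beginResetModel();
            rows_ = std::move(rows);
            endResetModel();
        }
    private:
        QVector<QString> rows_;
    };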

