on resolving bottlenecks in QSortFilterProxyModel
-
Hello,
Recently I was writing a command palette program with fuzzy matching/sorting with Qt.
I noticed that with 100k rows, it becomes incredibly slow; roughly 6s for regexp/.*a.*/
, and I found some weird causes.To identify the root cause, I wrote a benchmark that uses QRegularExpression for filtering, and it was fast enough (0.09s for every regexp).
... int main() { char str[1000] = {0}; int i, j; QVector<QString> vecs; for(i = 0; i < COUNT; i++) { vecs.push_back(get_random_str()); } for(j = 0; j < 100; j++) { printf("%d\n", clock()); memset(str, 0, 100); for(i = 0; i < 2 + rand() % 8; i++) { str[i*3]='.'; str[i*3+1]='*'; str[i*3+2] = key[rand()%(sizeof(key)-1)]; } str[i*3]='.'; str[i*3+1]='*'; QRegularExpression regex(str); int count = 0; for(i = 0; i < COUNT; i++) { if(vecs[i].contains(regex)) count++; } printf("! %d\n", count); } }
And here is my code for the palette. I disabled sorting based on fuzzy match scores.
class MyFilter: QSortFilterProxyModel { QRegularExpression expression_; QString keyword_; public: bool filterAcceptsRow(int source_row, const QModelIndex &source_parent) const override { auto model = sourceModel(); auto str = (model->index(source_row, 0, source_parent).data()).toString(); bool result = str.contains(expression_); return result; } void setMyFilter(QString &keyword) { static QRegExp emptyRegExp; keyword_ = keyword; QStringList regexp_before_join = { }; for (auto &x: keyword) if(!x.isSpace()) regexp_before_join.push_back(x); expression_ = QRegularExpression(regexp_before_join.join(".*"), QRegularExpression::CaseInsensitiveOption); setFilterRegExp(emptyRegExp); } };
But when I did profiling, the overhead was mostly in a Qt 5 core library function named
QSortFilterProxyModelPrivate::build_source_to_proxy_mapping
.I think that it creates a mapping between the source and proxy model. But I don't fetch mapping from source to destination, so maybe I'm only using half of this feature. Occupying 67% of the overhead makes it a top priority issue.
Given this, I'm going to not use the proxy and make a new vector of items(or key of it) every time I type a keyword, but I'm not sure if I'm doing it right. It'll be greatly appreciated if you share some ideas on this.
Thank you for reading!
-
Reimplementing filterAcceptsRow() when using setFilterRegExp() is not needed - then Qt is doing the filtering for you.
An index is needed - since there are no absolute values, did you have performance impacts in your real world application? Just filling a vector isn't that expansive but since this is your only thing you're doing in your test you will see this as hotspot. -
Thanks for the reply! I'm using Qt 5.6 for compatibility issues, and there was no QRegularExpression-with-JIT-compilation support(only QRegExp) so I rewrote filterAcceptsRow() to avoid regexp matching.
Yes, I'm developing this as a symbol searcher plugin for another program with >68k symbols, then wrote the test to resolve the similar pattern of lags. Currently, I made a new class inheriting QStandardItemModel and inserts every row on a search query update.
Indeed the .fill() or .at() will not cause so much overhead. I'm not sure but the root cause is that
filter_changed -> handle_filter_changed -> remove_source_items
callsremove_proxy_interval -> **build_source_to_proxy_mapping**
so much because proxy intervals calculated internally is too many. Maybe this commit is related, so I'm upgrading the version for verifying the assumption...I've attached the suspecting code:
QSortFilterProxyModelPrivate::remove_source_items(): ... QVector<QPair<int, int> > proxy_intervals; proxy_intervals = proxy_intervals_for_source_items(source_to_proxy, source_items); for (int i = proxy_intervals.size()-1; i >= 0; --i) { QPair<int, int> interval = proxy_intervals.at(i); int proxy_start = interval.first; int proxy_end = interval.second; remove_proxy_interval(source_to_proxy, proxy_to_source, proxy_start, proxy_end, proxy_parent, orient, emit_signal); } QSortFilterProxyModelPrivate::remove_proxy_interval(): ... build_source_to_proxy_mapping(proxy_to_source, source_to_proxy);
-
If you are looking for speed I would suggest not using QStandardItemModel but derive from QAbstractItemModel/QAbstractTableModel and don't insert each row separately but insert it chunk-wise. And upgrading to 5.12 maybe also helps since there were some improvements wrt to itemviews/models.