QSortFilterProxy slow

dcortesi

Working in PyQt5, Qt5.4, MacOS. I subclass QTableView, QAbstractTableModel, and QSortFilterProxyModel to create a simple table sorted on column 0, which contains string values.

In a typical case the value returned by model.rowCount() is ~1300 rows. I have measured the time it takes to perform model.endResetModel() at about 5.7 seconds (this is a 3.5GHz Core i5). During this time my model.data() method is called 410,368 times for the Display role of column 0 (the sort column).

If I then force a beginResetModel()/endResetModel() sequence without any change in the model data, this takes 0.8 seconds and calls for the value of column 0 "only" 37,292 times, i.e. "only" about 28 times for each value.

If I then click on the column header to cause a descending-order sort, this takes about 2.7 seconds before the table redisplays with descending values. Forcing a begin/end reset model again produces another 400K data calls for column 0 and 5+ seconds of "spinning beach ball" time.

Is there something I am doing wrong? Is there anything I should do when initializing these classes to make the table reset and sort faster?

SGaist

Hi,

Resetting a model should be done sparingly, since it invalidates everything and each view will fully update again. Why do you need to call it that often?

dcortesi

It is not done frequently. The user clicks a button to request a refresh based on what may be new data. At that time, the example data is generated by applying a regular expression over a 50K file and this takes under 0.1 second to finish. Then the code calls endResetModel() and it is this display and sorting of the data that takes 5.8 seconds.

There is nothing in a normal UI that should take multiple seconds to display. And why is the data() function being called for the displayRole an average of 315 times for each row of column 0 of the table? Something is wrong. It's like the proxy is doing a Bubble Sort or something...

andre

beginResetModel and endResetModel should really only be used in prototype code. In production, you will want to use insertions and removals of rows or columns instead.

The proxy is not using a bubble sort as you can easily check. But note that what you are seeing is not only requests for the actual data, but requests for each of the roles instead. If you count only the accesses to the role the sort is performed on, you'll see more reasonable numbers.
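One rough way to check the bubble-sort suspicion is to count value fetches in a plain comparison sort of the same size. The sketch below is pure Python, not Qt: `CountingColumn` and its `data()` method are hypothetical stand-ins for the model's sort column, and Python's built-in sort stands in for whatever algorithm the proxy actually uses.

```python
import functools
import math
import random


class CountingColumn:
    """Hypothetical stand-in for a model's sort column that counts fetches."""

    def __init__(self, values):
        self.values = list(values)
        self.calls = 0

    def data(self, row):
        self.calls += 1
        return self.values[row]


def count_fetches(column):
    """Sort row numbers with a comparator that fetches both values each time."""

    def compare(a, b):
        left, right = column.data(a), column.data(b)
        return (left > right) - (left < right)

    rows = sorted(range(len(column.values)), key=functools.cmp_to_key(compare))
    return rows, column.calls


random.seed(0)
n = 1300
column = CountingColumn(f"word{random.randrange(10**6):06d}" for _ in range(n))
rows, fetches = count_fetches(column)
print(f"{fetches} fetches, {fetches / n:.1f} per row")
# A bubble sort needs up to n*(n-1)/2 comparisons, two fetches each.
print(f"bubble sort worst case: {n * (n - 1)} fetches, {n - 1} per row")
```

Both measured figures in this thread (about 28 and 315 fetches per row) fall well below the roughly 1,300 per row that a quadratic sort would need in the worst case.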

However, going through a QSortFilterProxyModel is never going to be the fastest solution. It will always require accessing your data through the QAbstractItemModel interface, with all the additional calls that involves. If you need quick sorting and you have access to the raw data in the model, you should just implement the sorting in your model itself. Just reimplement the sort function.
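As a minimal sketch of what the core of such a reimplemented sort might do, the function below orders the raw rows directly and computes each sort key only once. The `sort_rows` name and the `(word, count, flags)` tuple layout are assumptions for illustration, not the poster's actual structure. In a model subclass, an overridden `sort(column, order)` would typically emit `layoutAboutToBeChanged` before reordering and `layoutChanged` afterwards; that Qt wiring, and any updating of persistent indexes, is left out here.

```python
def sort_rows(rows, column, descending=False, respect_case=True):
    """Return rows ordered by one column, computing each sort key once.

    rows is a list of (word, count, flags) tuples (an assumed layout).
    """

    def key(row):
        value = row[column]
        # Fold case only for string columns when case is to be ignored.
        if isinstance(value, str) and not respect_case:
            return value.casefold()
        return value

    return sorted(rows, key=key, reverse=descending)


vocabulary = [("apple", 3, ""), ("Zebra", 1, "X"), ("banana", 7, "")]
print([r[0] for r in sort_rows(vocabulary, 0)])
# ['Zebra', 'apple', 'banana']
print([r[0] for r in sort_rows(vocabulary, 0, respect_case=False)])
# ['apple', 'banana', 'Zebra']
print([r[0] for r in sort_rows(vocabulary, 1, descending=True)])
# ['banana', 'apple', 'Zebra']
```

Because the key function runs once per row, sorting 1300 rows this way touches each value once instead of once per comparison.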

dcortesi

Here's the application; tell me how you would approach it. This is basically a document editor. The table in question shows the vocabulary of words used in the document, one word per row, with the count of times the word appears. There is a third column with some codes, for example: the word fails spellcheck, the word is all-numeric, etc.

The UI is such that (a) the user wants to sort the table on col. 0 (word) with or without respecting case, or on col. 1 (count); and (b) there is a combobox allowing the user to select "filters", e.g. all-cap words, misspelled words, numeric words.

The sorting is implemented by the SortFilterProxy, including handling clicking on column header to switch the sort order ascending/descending.

The filtering is handled by a custom filterAcceptsRow() method in the proxy. The Combobox signal goes to the proxy which sets a field tested by filterAcceptsRow. Null for "all", or a code to accept only e.g. misspellings.
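The test described above might reduce to a predicate like the following sketch. The flag names, the use of a set per row, and the `row_passes` helper are all hypothetical; in the real proxy this logic would sit inside `filterAcceptsRow(source_row, source_parent)`, and the proxy would normally call `invalidateFilter()` after the combobox changes the field so that rows are re-evaluated.

```python
ALL_ROWS = None  # hypothetical sentinel meaning "no filter selected"


def row_passes(row_flags, wanted_flag):
    """Decide whether a row with the given flag codes survives the filter."""
    if wanted_flag is ALL_ROWS:
        return True
    return wanted_flag in row_flags


# Hypothetical flag sets for four rows of the vocabulary table.
flags_by_row = [set(), {"misspelt"}, {"numeric"}, {"misspelt", "allcaps"}]
visible = [i for i, f in enumerate(flags_by_row) if row_passes(f, "misspelt")]
print(visible)  # [1, 3]
```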

Design-wise this fits perfectly with the sortFilterProxy paradigm, no?

When the user first opens a document, the table is empty, the model rowCount() returns 0. When/if the user wants to see the vocabulary, she clicks a button Refresh, which does as I said before:

beginResetModel()
sweep the document with a regex to pick off words, categorize them and count them and build a nice internal Python data structure. This takes negligible time, a small fraction of a second at most.
endResetModel()

Later in the session the user has edited the document (adding and editing text in the normal way) and clicks Refresh again. Once more we sweep the whole document tabulating word usage. An unpredictable number of rows will have different counts and flags than before, and some rows will be gone and some new ones added. But what other way is there to say, "View, please display correct values for all the possibly modified rows" than to reset the model? That's what it's for, no?
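For comparison, the insertion/removal approach suggested earlier in the thread would start from a diff of the old and new tabulations. The sketch below assumes the vocabulary is held as a dict mapping each word to a `(count, flags)` tuple, which is a guess at the internal structure; `diff_vocabulary` is a hypothetical helper. The model would then bracket each removal with `beginRemoveRows()`/`endRemoveRows()`, each insertion with `beginInsertRows()`/`endInsertRows()`, and emit `dataChanged` for rows whose count or flags changed. Whether that ends up faster than a reset for this table is not established here.

```python
def diff_vocabulary(old, new):
    """Compare two word -> (count, flags) dicts.

    Returns (removed, added, changed) as sorted lists of words.
    """
    removed = sorted(w for w in old if w not in new)
    added = sorted(w for w in new if w not in old)
    changed = sorted(w for w in new if w in old and old[w] != new[w])
    return removed, added, changed


old = {"cat": (2, ""), "dog": (1, ""), "teh": (1, "misspelt")}
new = {"cat": (3, ""), "dog": (1, ""), "the": (1, "")}
removed, added, changed = diff_vocabulary(old, new)
print(removed, added, changed)  # ['teh'] ['the'] ['cat']
```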

But this leaves the user looking at a locked-up app for 5-6 seconds. It wouldn't be so bad if there were some way to do a progress bar, but there's no way to update a QProgressBar during the endResetModel method call.