QTreeView with lots of items is really slow. Can it be optimised or is something buggy?
Yes perhaps you have found an issue in list view itself. Worth reporting to Qt bug tracker perhaps.
I guess you have tried all view optimizations like uniform row height etc.?
The benchmark isn't apples to apples. You aren't just comparing different views, you are comparing different view + model combinations. The model base class should be chosen to match the structure of the data, not the view(s) that are going to display it, and there is absolutely no problem with having a table or tree view display a list model, or a QListView display a hierarchical model.

So for a start, I'd use the same model with all three widgets, to compare the performance of just the widgets. In particular, I'd expect the tree view to be significantly faster with a table model, as QAbstractTableModel always sets the ItemNeverHasChildren flag.
I have taken a bit of time to try your code. I do not have an answer for the obviously large timing differences we see. However I have a couple of observations which I would like to type in here while they are in front of me. They do not resolve your fundamental issue but I hope they are nonetheless of some interest.

I use Ubuntu 24.04 running in a VirtualBox under a Windows host. My machine is probably a bit slower than yours. To keep the timings down I show results for the default 1 million items. The times do seem to scale correspondingly when I try more rows. I ran all my test timings several times.
time taken to display table with 1000000 items = 0.013 seconds
time taken to display list with 1000000 items = 3.554 seconds
time taken to display tree with 1000000 items = 6.189 seconds
My first test is a bit orthogonal to your issue. I wanted to check how it behaved with PyQt6 instead as a comparison. I had to make a couple of changes to source to make it acceptable but nothing significant:
time taken to display table with 1000000 items = 0.027 seconds
time taken to display list with 1000000 items = 1.530 seconds
time taken to display tree with 1000000 items = 2.983 seconds
Except for the first, fast case (slower startup time??) PyQt6 is consistently twice the speed of PySide6. This is rather depressing if one prefers the Qt LGPL Python bindings over the third-party GPL one :( It is true that for PySide6 I am forced to use a Python venv while for PyQt6 I do not and can use the system Python. However they both use the same Python 3.12.3 executable. I then continued with the PySide (not PyQt) version.
Then I had a look at the code for the list case. (I am not prepared to look into the tree case, as that is such a different kettle of fish with parentage/hierarchies.) I compared it against the table case.

- Just FYI, for the table case I did not find that commenting in/out all your lines for the horizontal/vertical headers made any (noticeable) difference, even if we thought it might.
- For the list case I agree that setUniformItemSizes(True) made it nearly 10x faster.
- I added one further setting (in addition to the uniform size). This made it 2x faster than your original code.
Then I didn't see anything else obvious to add to the list case. I do realize these timings are still way out for the difference between table and list view. For Qt, very old answers/posts are often still as relevant as recent ones, and I came across "[Qt-interest] Why QListView is so slow and QTableView is fast?" from 2009 (!):

"QTableView can scale much larger and faster then QListView can. I have an application that has 101,000,000+ rows inside of a QTableView with no scroll or startup problems, while this would not be possible in a QListView. You can always use a QTableView and just turn off the headers if you just want a classic vertical list ala QListView."

Take such an old post as you please, but it may be an indication that there has always been a fundamental speed/scaling difference between the two views. Everything else I came across basically said to use setUniformItemSizes(True), which we know is a big help but clearly does not go far enough/explain the remaining fundamental difference.

In order to progress much further I would need to rewrite your code in C++, have the Qt source code (which I don't) and profile to see what is going on. I'm afraid that is too much for me. Good luck if you wish to report it on the Qt bug tracker to see if you get an immediate answer from someone who knows the code. In view of the foregoing I think you should simply use a QTableView where you might use a QListView (which apparently is not suited to huge lists, perhaps being more aimed at presentation of small lists). QTreeView is a different matter.

I may have some further plays and post if I have anything more useful to say, but don't hold your breath. Hope the above is of some (limited) interest nonetheless.
@JonB inspired me to tackle this further, so I converted OP's benchmark to C++ (with only a table model, since this is what makes sense for the data):
```cpp
#include <QApplication>
#include <QAbstractTableModel>
#include <QDebug>
#include <QElapsedTimer>
#include <QHeaderView>
#include <QListView>
#include <QTableView>
#include <QTreeView>

#include <cstdlib>  // atoi

// Flat two-column model; every value is computed on demand from the row number.
class TestModel : public QAbstractTableModel
{
public:
    TestModel(int numRows) : numRows(numRows) {}

    int columnCount(const QModelIndex& parent) const override { return 2; }

    int rowCount(const QModelIndex& parent) const override { return numRows; }

    QVariant data(const QModelIndex& index, int role) const override
    {
        if (role != Qt::DisplayRole) {
            return QVariant();
        }
        if (index.column() == 0) {
            return QString::number(index.row() + 1);
        } else {
            return QString("info_%1").arg(index.row() + 1);
        }
    }

    QVariant headerData(int section, Qt::Orientation orientation, int role) const override
    {
        if (orientation != Qt::Horizontal || role != Qt::DisplayRole) {
            return QVariant();
        }
        if (section == 0) {
            return "Number";
        }
        return "Info";
    }

private:
    int numRows;
};

int main(int argc, char** argv)
{
    QApplication app(argc, argv);

    // Usage: <program> [list|table|tree] [numRows]
    QString type = "list";
    if (argc > 1) {
        type = argv[1];
    }
    int numRows = 1000000;
    if (argc > 2) {
        numRows = atoi(argv[2]);
    }

    TestModel model(numRows);
    QAbstractItemView* view;
    if (type == "list") {
        view = new QListView();
        qobject_cast<QListView*>(view)->setUniformItemSizes(true);
    } else if (type == "table") {
        view = new QTableView();
    } else {
        view = new QTreeView();
        qobject_cast<QTreeView*>(view)->setUniformRowHeights(true);
    }

    // Time only attaching the model and showing the view.
    QElapsedTimer timer;
    timer.start();
    view->setModel(&model);
    view->show();
    qDebug() << type << "with" << numRows << "rows took" << (timer.elapsed() / 1000.0) << "seconds";

    return app.exec();
}
```
These were my results on a 5-year-old Intel laptop, with the stock Qt 6.8 build from Arch Linux, for the full 70 million rows test case:

"list" with 70000000 rows took 2.426 seconds
"table" with 70000000 rows took 0.336 seconds
"tree" with 70000000 rows took 2.216 seconds
For the list and tree views this is about two orders of magnitude less than the Python numbers (which I can reproduce on my machine too). If I had to guess, I'd say that the problem is due to using it via the Python wrappers. My hypothesis is that the view is asking for the content of all/most of the model indices when first shown, which causes the model to allocate 100s of millions of Python string PyObject instances with their reference counts, bridge those to QStrings on the C++ side, and then destroy all of them. CPython's memory management is... not good (to say the least) with such things.

Not sure what can be done about it, other than moving to lazy loading, which you said you don't want.
This was a pending reply to the 2nd reply by @sierdzio. The forum didn't send it for me at the time due to new user post rate limiting.

The source code I provided includes the optimisations I found, which did make a humongous difference, but it is still way slower than table view.
```python
# optimisations for list view
self.view.setUniformItemSizes(True)

# optimisations for tree view
self.view.setUniformRowHeights(True)
```
@IgKh I am new to Qt so I didn't realise you could use mismatched models with different views. I will look into this as it will help simplify the example code some more.
I changed just the lines that set up the model for each of the different views so that all views used the table model, e.g.:

```python
self.model = BigTableModel(self, self.max_num_nodes)
```
This made listview take twice as long to process as it did before, so it now takes longer than treeview! This seems insane that a 1D listview can take this long. Especially when the 2D tableview is able to display it instantly.
These are the timings where all views use the BigTableModel().
As you can see, the timing for the listview and treeview scales linearly with the number of items. The timing for the table view does not scale linearly, otherwise it would be 5.6 seconds for 70m items.

1,000,000 items
time taken to display table with 1000000 items = 0.080 seconds
time taken to display list with 1000000 items = 5.051 seconds
time taken to display tree with 1000000 items = 2.572 seconds

10,000,000 items
time taken to display table with 10000000 items = 0.140 seconds
time taken to display list with 10000000 items = 49.748 seconds
time taken to display tree with 10000000 items = 25.325 seconds

70,000,000 items
time taken to display table with 70000000 items = 0.509 seconds
time taken to display list with 70000000 items = 358.775 seconds
time taken to display tree with 70000000 items = 174.774 seconds
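The linear-scaling claim can be checked directly from the figures above. The following is only a small sketch that divides each reported time by the item count; the dictionary layout and function name are made up for illustration, and the numbers are copied from the timings in this post.

```python
# Timings in seconds, copied from the results above; keys are item counts.
timings = {
    "table": {1_000_000: 0.080, 10_000_000: 0.140, 70_000_000: 0.509},
    "list": {1_000_000: 5.051, 10_000_000: 49.748, 70_000_000: 358.775},
    "tree": {1_000_000: 2.572, 10_000_000: 25.325, 70_000_000: 174.774},
}


def seconds_per_million(view):
    """Return the time per million items at each measured size."""
    return {n: t / (n / 1_000_000) for n, t in timings[view].items()}


for view in timings:
    per_million = seconds_per_million(view)
    print(view, {n: round(v, 3) for n, v in per_million.items()})

# What the table view would take at 70m items if it scaled linearly from 1m.
linear_table_70m = timings["table"][1_000_000] * 70
print("table at 70m if linear:", round(linear_table_70m, 1), "seconds")
```

A roughly constant seconds-per-million value (about 5 for the list view, about 2.5 for the tree view) is what linear scaling looks like, while the table view's value keeps shrinking as the item count grows.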
Can someone please upvote either my original post or one of my replies? I am being rate limited to 1 reply every 10 minutes due to not having a reputation of '1' :P
The last reply I typed out was also "flagged as spam by askimet.com" and so didn't get posted. I'm not sure if reputation helps with the aggressiveness of spam filtering.
I tested on Windows 11 and Ubuntu 24.04 using Python 3.12.3 on both and got similar timing figures, i.e. tableview was basically instant no matter how many items, while listview and treeview had a dramatically noticeable slowdown which scaled with the number of items.

The code where I play around with the header values in the table case didn't change the timing for me either. That was an attempt at working around a separate issue (as mentioned near the top of my code) where tableview will fail to render at all if there are more than 71,582,788 items. I refer to this as the "72m item bug". According to the bug info link saved in my source code, this is due to the 32-bit signed integer max value being divided by a row height of 30, which gives the magic value 71,582,788. A bug fix for this has apparently been submitted to the dev branch of Qt so will hopefully make it into the main branch sometime soon. They mentioned working around this bug by doing the "header value code" I included. For me the addition of this code stopped the app from freezing, but resulted in no item values actually being rendered in the table.
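The "magic value" arithmetic is easy to verify. This sketch only reproduces the division described above; it assumes the 30 pixel row height quoted in the post and does not claim to show how Qt itself computes the total height. The helper name is hypothetical.

```python
INT32_MAX = 2**31 - 1   # largest value of a signed 32-bit integer
ROW_HEIGHT = 30         # pixels per row, the figure quoted in the post

# Largest row count whose total pixel height still fits in a signed 32-bit int.
max_rows = INT32_MAX // ROW_HEIGHT
print(max_rows)  # 71582788


def total_height_fits(rows, row_height=ROW_HEIGHT):
    """True if rows * row_height does not exceed the 32-bit signed maximum."""
    return rows * row_height <= INT32_MAX


print(total_height_fits(max_rows), total_height_fits(max_rows + 1))
```

One more row than 71,582,788 pushes the total height past the 32-bit limit, which matches the threshold at which the table stopped rendering.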
For now I do like you suggested and use a tableview instead of a listview for when I am displaying a flat 2D list of data. However I also need to actually display my modelled data as a tree, so I do need to use treeview too. My example code didn't use any hierarchical data in treeview both for simplicity and as an attempt at optimising treeview/model as much as possible to narrow down where the cause of the timing stems from.
I will add some counters to the models in my python code to see how many times the data access (and other methods) are being called. I did try some python profiling on the code from my real app, which iirc mainly pointed to window.show() taking all the time. I will try profiling the sample code and share any worthwhile results here too.
With regard to the python wrappers potentially being a cause for the large timings, I do expect them to make it slower than native C++ code. However the fact that using tableview with python wrappers is able to handle the 2D model data basically instantly, suggests that such things should be possible for listview and treeview too.
Perhaps listview and treeview have more underlying calls to stuff that cross the barriers between C++ and python code that increase timing in ways that is more negligible when coding in C++ alone. I mean if I was coding this app in C++ and getting differences in timing like you got where table is 0.336 seconds and treeview was 2.216 seconds for 70m items, I'd most likely just accept it as "not being too long to wait" and use it without any further question.
I ran tests with my sample code changed to only use BigTableModel() for all 3 view types and with counters added to count the number of times the methods I provide are called. My model only has 4 methods in it:
headerData(), rowCount(), columnCount() and data().

Possibly the underlying QAbstractTableModel code has the other methods in python code? I am not sure how the underlying code is handled, whether it is handled in C++ code or python code, unless I provide my own implementations of the methods and virtual methods. I guess I could try adding methods to my model for every possible method and virtual method in order to count their usage and to see if adding them adds more slowdown.
This is the result for call counts:
1,000,000 items

time taken to display table with 1000000 items = 0.081 seconds
data() cnt = 84
headerData() cnt = 1680
rowCount() cnt = 18
columnCount() cnt = 17

time taken to display list with 1000000 items = 5.859 seconds
data() cnt = 85
headerData() cnt = 0
rowCount() cnt = 2000062
columnCount() cnt = 2000059

time taken to display tree with 1000000 items = 2.975 seconds
data() cnt = 156
headerData() cnt = 54
rowCount() cnt = 1000039
columnCount() cnt = 1000042

10,000,000 items

time taken to display table with 10000000 items = 0.138 seconds
data() cnt = 84
headerData() cnt = 1680
rowCount() cnt = 18
columnCount() cnt = 17

time taken to display list with 10000000 items = 57.361 seconds
data() cnt = 379
headerData() cnt = 0
rowCount() cnt = 20000474
columnCount() cnt = 20000471

time taken to display tree with 10000000 items = 28.985 seconds
data() cnt = 493
headerData() cnt = 107
rowCount() cnt = 10000204
columnCount() cnt = 10000207
These results show that data() is only being called a sane number of times, so it shouldn't be responsible for the huge amount of time taken. However rowCount() and columnCount() are being called an insane number of times xD
Also their call counts scale linearly with the number of items, just like the time taken. AND the call counts for listview are double the call counts for treeview, just like how listview takes double the time that treeview takes.
This makes a lot of sense. From the developer's point of view they would assume that calls to data() could be costly, and so are careful about calls to it, as is seen in the sane amount of 493 calls to data() for 10 million items in treeview. However developers would also probably assume that a call to rowCount() and columnCount() would be "cheap" computationally, so not worry about how many times it is called. And in fact it probably is fairly cheap when the code is all native C++, as seen in the tests done by @IgKh where even 70m items only took around 2 seconds. However when rowCount() and columnCount() have to cross the boundary between C++ and python millions of times, it is no longer a cheap operation.
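To put numbers on that relationship, here is a small sketch that works out calls per item and an upper bound on the cost per call from the counts above. It assumes, purely for the sake of the estimate, that the whole display time is spent in these two methods, which overstates their cost; the function and key names are invented for illustration.

```python
# (rowCount calls, columnCount calls, seconds) copied from the results above.
runs = {
    ("list", 1_000_000): (2_000_062, 2_000_059, 5.859),
    ("tree", 1_000_000): (1_000_039, 1_000_042, 2.975),
    ("list", 10_000_000): (20_000_474, 20_000_471, 57.361),
    ("tree", 10_000_000): (10_000_204, 10_000_207, 28.985),
}


def summarise(view, items):
    """Calls per item, and total time divided by call count (an upper bound)."""
    row_calls, col_calls, seconds = runs[(view, items)]
    calls = row_calls + col_calls
    return {
        "calls_per_item": round(calls / items, 2),
        "microseconds_per_call": round(seconds / calls * 1e6, 2),
    }


for view, items in runs:
    print(view, items, summarise(view, items))
```

The list view comes out at about four calls per item and the tree view at about two, and the time per call is nearly the same in all four runs, which is consistent with the call count driving the total time.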
Perhaps armed with this knowledge the underlying C++ code could be tweaked to either use "cached" values for row and column counts (at least during expensive operations such as setup) or to be mindful of the number of calls performed. I have often seen loops like:
```cpp
for (int i = 0; i < obj->columnCount(); i++) {
    do_stuff();
}
```

This could be refactored to only call columnCount() once instead of potentially millions of times:

```cpp
int count = obj->columnCount();
for (int i = 0; i < count; i++) {
    do_stuff();
}
```

I don't know if this is the case here, but there is a good chance that something like it is occurring.
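The difference between the two loop shapes can be demonstrated with a call counter. This is a sketch in plain Python with a stand-in class (not a Qt model); the while loops mimic the C++ for loops, since a Python `for i in range(...)` would evaluate the bound only once anyway.

```python
class CountingModel:
    """Stand-in for a model; counts how often columnCount() is called."""

    def __init__(self, columns):
        self._columns = columns
        self.calls = 0

    def columnCount(self):
        self.calls += 1
        return self._columns


def loop_reevaluating(model):
    # Mirrors `for (i = 0; i < obj->columnCount(); i++)`:
    # the bound is fetched again before every iteration.
    i = 0
    while i < model.columnCount():
        i += 1
    return model.calls


def loop_cached(model):
    # Mirrors the refactored loop: the bound is fetched once up front.
    count = model.columnCount()
    i = 0
    while i < count:
        i += 1
    return model.calls


print(loop_reevaluating(CountingModel(1000)))  # 1001
print(loop_cached(CountingModel(1000)))        # 1
```

The re-evaluating loop makes one call per iteration plus the final failing check, so the call count grows with the size of the model, while the cached version stays at a single call.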
When profiling the PySide6 code, it doesn't show a lot of what is going on "under the hood" since the python code is for the most part just wrappers on top of the C++ code. But here is the info I do get from it:
I ran a test using listview with 10 million items.
The test ran for 175 seconds, but 105 of those seconds was the app sitting there once loaded and me not realising it had finished loading yet. So disregard the extra 105 seconds. My debug prints told me that the processing took 70 seconds, and the profile data backs that up. Note that times are longer than normal processing of 10m items due to the profiling that is being done at the same time. This 70 seconds of processing would normally be done in 5 seconds if it weren't also profiling.

Cumulative Time

These are the functions called, ordered by "cumulative time", so pay attention to the "cumtime" column. The value in this column shows the time spent in that function and all subfunctions that it calls.

- The 105 seconds for "built in method exec" is the time spent where the app sat there once loaded.
- You can then see 70 seconds spent in the MainForm.show() method. This is where the excess processing time is occurring, inside this function and whatever subfunctions it calls.
- Next up is rowCount() which takes 16 seconds (of the 70 seconds). Internally it calls 2 of the other entries in this list: isRootIndex(), which calls isValid().
- The only other call in the list is columnCount() which takes about 5 seconds.

I truncated the list there as all entries that followed took less than a second and didn't appear important.

This means that the python code for rowCount() and columnCount() combined takes up about 20 seconds of the 70 seconds for show. I am not sure if the other 50 seconds is taken up in handling profiling or in the C++ code underneath. It does suggest that roughly 20/70 ≈ 29% of that time is taken handling calls to rowCount and columnCount on the python side.
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     42/1    0.000    0.000  175.459  175.459 {built-in method builtins.exec}
        1    0.004    0.004  175.459  175.459 ..\bak\BigItemModel2.py:1(<module>)
        1    0.020    0.020  175.327  175.327 ..\bak\BigItemModel2.py:286(main)
        1  105.265  105.265  105.266  105.266 {built-in method exec}
        1   48.765   48.765   70.037   70.037 {method 'show' of 'PySide6.QtWidgets.QWidget' objects}
 20000230   10.507    0.000   16.283    0.000 ..\bak\BigItemModel2.py:118(rowCount)
 20000230    3.338    0.000    5.776    0.000 ..\bak\BigItemModel2.py:102(isRootIndex)
 20000227    4.989    0.000    4.989    0.000 ..\bak\BigItemModel2.py:126(columnCount)
 20000230    2.438    0.000    2.438    0.000 {method 'isValid' of 'PySide6.QtCore.QModelIndex' objects}
```
I guess the next step is to look through the C++ code.
@IgKh are you able to run a profiler on your C++ code?

I have started looking through the QListView source code online. I didn't find any simple cause yet for my problem, though I did identify an issue exactly as I suggested might occur earlier in this thread.

The source code:
https://code.qt.io/cgit/qt/qtbase.git/tree/src/widgets/itemviews/qlistview.cpp?h=6.8

There is the potential for unneeded calling of rowCount() once for every item in the list when using selectAll(). If you look at line 603 inside "QListViewPrivate::selectAll()" you can see it.

```cpp
for(; row < model->rowCount(root); ++row) {
```
@Cipherstream said in QTreeView with lots of items is really slow. Can it be optimised or is something buggy?:
However when rowCount() and columnCount() have to cross the boundary between C++ and python millions of times, it is no longer a cheap operation.
Well, let's examine this instead of guessing!
As before I am just testing the list case. With my starting code from where I am now:

time taken to display list with 10000000 items = 17.531 seconds

You show rowCount() and columnCount() each being called twice as many times as the 10 million items. So I go to BigListModel.__init__() and append the following at the end of it:

```python
for i in range(self.max_num_nodes * 2):
    if self.rowCount(parent) < 0 or self.columnCount(parent) < 0:
        print("Whoops!")
```

time taken to display list with 10000000 items = 27.802 seconds

Well, that's a fair amount of the original time. 10 of the original 17 seconds are being spent just doing these calls. (It's then not hard to imagine something else taking up a lot of the remaining original 7 seconds.) You hypothesise that "crossing the boundary between C++ and Python" is particularly expensive. So we'd better test without a Qt virtual method. We just copy the definitions of your rowCount() and columnCount(), rename the copies, and call those instead in the for loop.

time taken to display list with 10000000 items = 27.831 seconds

So no virtual or C++<->Python boundaries to cross, yet identical time. The huge overhead is just Python.
Which leaves us with two (white? black?) elephants in the room, neither of which you are going to like:

- If you are going to have "millions" of items and care about performance you had better not use Python. It seems it is demonstrably just not up to the job here. Although your point about "wouldn't it be better if the original Qt C++ code cached the value of rowCount() instead of calling the method so often" may be valid, you are peeing in the wind if you expect it to be written this way to help out, say, Python, if apparently the original C++ calls are perhaps inline or in any case vastly faster. It's just not going to happen, and would require an enormous rewrite of existing Qt code.
- Don't put a model with 10 million items in a UI view. Maybe it happens to be fast enough with a QTableView, for whatever reason, but not with a QListView, but it's just "way too many" for something intended to be shown to a user. And for the record, once those 10 million records are stored in a database I think you're going to be spending a lot of time (and memory) reading them all in, on top of the display time. I really expect any application wanting to show this many records to have some sort of "paging" mechanism, and perhaps in the case of the treeview display of nodes without their children initially and creating them on parent node expansion.

Conclusion: Once we discover that something as "insignificant" as the fact that many calls to rowCount()/columnCount() are in the Qt code, and that alone is incredibly expensive for Python, it becomes unsurprising that there may be a surprisingly large difference between the timings of the 3 types of view due to what may be innocuous differences in their code.
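The pure-Python part of that experiment can be reproduced without Qt at all. The sketch below uses a made-up class with two trivial methods (much simpler than the real model's rowCount(), which also checks the parent index) and times a loop of the same shape as the one appended to `__init__()`. The absolute numbers depend entirely on the machine and Python version.

```python
import time


class PlainCounts:
    """No Qt involved: an ordinary Python object with two trivial methods."""

    def __init__(self, rows):
        self.rows = rows

    def row_count(self):
        return self.rows

    def column_count(self):
        return 1


def time_calls(n):
    """Call both methods n times each and return the elapsed seconds."""
    obj = PlainCounts(n)
    start = time.perf_counter()
    for _ in range(n):
        if obj.row_count() < 0 or obj.column_count() < 0:
            print("Whoops!")
    return time.perf_counter() - start


n = 200_000
elapsed = time_calls(n)
print(f"{elapsed:.3f} s total, {elapsed / (2 * n) * 1e6:.3f} microseconds per call")
```

Multiplying the per-call figure by the tens of millions of calls counted earlier in the thread gives a feel for how much of the display time interpreter overhead alone could account for.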
I just had a think. You want to display 70 million items. Let's assume each item costs 100 bytes. (Don't know how big your items are, there are all sorts of overheads, and on top of whatever it takes up in the model there must be a further overhead per item to put it in a view. Anyway, I'm taking 100 as my multiplier.)

70 million times 100 is 7 billion. 7 billion bytes is 7 GB. That is a lot of memory to use! Just OOI how much space is your Python app using for these rows? If you want 70 million items now, perhaps you'll want 140 million tomorrow, or 700 million...?
Christian Ehrlicher (Lifetime Qt Champion) replied to Cipherstream on 30 Oct 2024, 11:46:
@Cipherstream said in QTreeView with lots of items is really slow. Can it be optimised or is something buggy?:
for(; row < model->rowCount(root); ++row) {
Nice finding even though it's not that expensive here since root is most likely the invalid root index and therefore the calculation is cheap. But nonetheless it's useless. You might provide a patch - I can approve it.
I am new to Qt so I am sure you know more than me. When I looked at the code for selectAll() it looked like it was iterating over all rows on the root. Wouldn't all rows in a listview be on the root? I.e. if you had a listview with 10 million items, wouldn't all 10 million items be rows "on the root"? Which would mean that rowCount() would get called 10 million times when calling selectAll()?
I'll answer your shorter post first, since it is quicker :)

Yes, a display with 70 million items could indeed take up a lot of RAM. However due to the model/view paradigm you can potentially look up the data as needed and not need to keep any items in memory.
You might then say "that sounds like a great case for fetch more", however I did initially try using fetch more and as far as I could tell it was an iterative fetcher from index 0 upwards. So to get the last item in 70m items it would still have to fetch all 70m items.
If instead I have a fileformat on disk with a header that says "this file has 70m items", I can just read in the header and know how many items my view needs to handle. The user then scrolls the scroll bar to the location they want to look at and only that data will then be accessed. So they could do ctrl+end to go to the end of the list and the model will only access the items displayed on the last "page" of the view.
This does indeed work well like this. If you let this initial slow setup part finish and then attempt to move through the data, you can see that it only fetches the minimal amount of data needed.
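As an illustration of that header-plus-random-access idea, here is a minimal sketch without any Qt. The file layout (a 64-bit count followed by fixed-size 64-bit records), the class name and the "info_" formatting are all assumptions made up for this example; a real model would wrap something like this in its rowCount() and data() methods.

```python
import io
import struct

HEADER = struct.Struct("<Q")   # hypothetical header: item count as uint64
RECORD = struct.Struct("<Q")   # hypothetical fixed-size record: one uint64


def build_file(count):
    """Create an in-memory 'file' holding a header and `count` records."""
    buf = io.BytesIO()
    buf.write(HEADER.pack(count))
    for i in range(count):
        buf.write(RECORD.pack(i + 1))
    buf.seek(0)
    return buf


class LazyRecords:
    """Reads the count from the header and fetches records only on demand."""

    def __init__(self, f):
        self._f = f
        self._f.seek(0)
        (self.count,) = HEADER.unpack(self._f.read(HEADER.size))
        self.reads = 0

    def row_count(self):
        return self.count  # known from the header alone, no records touched

    def data(self, row):
        # Seek straight to the record; cost does not depend on the row number.
        self._f.seek(HEADER.size + row * RECORD.size)
        (value,) = RECORD.unpack(self._f.read(RECORD.size))
        self.reads += 1
        return f"info_{value}"


records = LazyRecords(build_file(10_000))
# Jump to the end, as Ctrl+End would: only the last "page" is read.
last_page = [records.data(r) for r in range(records.row_count() - 20, records.row_count())]
print(last_page[-1], records.reads)
```

Unlike an incremental fetch from index 0 upwards, jumping to the last page here touches only the 20 records on that page.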
This is for an app similar to Wireshark where you may indeed need to have millions of items/packets in the one view. Funnily enough, while looking into this issue I came across the developers of Wireshark also talking about this problem (as they also use Qt it seems) and arguing that yes, they do indeed need to handle that much data in a view, however they are not doing it in python hehe.
Your tests are a good idea to look at getting real world timing for these calls. However the point is that I can't think of any reason why rowCount() or columnCount() needs to be called 20m times each if I have 10m items. Whatever code is doing this is imo the code that needs fixing.

Your timing test results suggest that calling them so many times is taking at least 10 of the 17 seconds. Therefore optimising whatever is calling them so many times to only call once and cache the result value would immediately cut the time taken by 10 seconds, roughly a 59% reduction.
For the elephants in the room :)
Yes python is slower. However the simple fact that tableview is able to handle the processing instantly suggests that python speed in general is not the issue. I agree that expecting Qt code to be changed just for the sake of python wrapper speed may be a bit of an ask. However I also feel that a view implementation that makes a linear number of these calls based on the number of items is faulty. The C++ implementation by @IgKh above shows listview and treeview take 7 times longer than tableview, which also suggests something is not right. However since the "7 times longer" value is only 2 seconds it can be ignored.
It seems I answered this in my previous reply to you. I don't agree with the blanket statement that "10 million items in a UI view is too much". Yes it is a lot and most of the time it is not needed. But like a lot of things there are times where it is needed. The "paging" mechanism is exactly what the model/view paradigm is for! And indeed only "pages" of the data are actually accessed at a time when these UI views are in use. It is only during startup that some of these views are (incorrectly imo) doing some kind of processing for every single item.
Yes, many calls to rowCount() are expensive. I don't see a reason why that many are done at startup. I hope to track down the cause to see if there is a valid reason :)
Since it appeared that the cause of the issue was in the C++ Qt code I installed Qt C++ dev env and was able to build and debug the code that @IgKh provided.
I tracked down the cause of the many calls to rowCount and columnCount for the list view. They were coming from line 2600 onwards in:
https://code.qt.io/cgit/qt/qtbase.git/tree/src/widgets/itemviews/qlistview.cpp?h=6.8

The method there iterates over every row in the model to precalculate offsets for items in the list. This code does an optimisation where if gridSize has been set it will use the grid size for all items instead of calculating it for every single item. I confirmed that this optimisation had an effect by adding the line

```python
self.view.setGridSize(QtCore.QSize(18, 18))
```

to my python code.

The list view already has another existing optimisation for item sizes, enabled by calling setUniformItemSizes(True). This "fixed item size" flag should also be used in this setup code in the same way the fixed size from a grid setting is used.

However if you remember, list view was calling rowCount and columnCount twice for every item in the model. The 2nd calls are done in this same method when checking if the row is hidden. The same loop that iterates over all items in the model also does an "is hidden" check on every item. The "is hidden" check looks to see if each item is present in the list of "hidden rows". An optimisation that can be done for this is to check if that list of hidden rows is empty, in which case no items are hidden and so it doesn't need to individually check whether every single item is hidden.
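The effect of the "no hidden rows" shortcut can be simulated in plain Python. This is not the Qt code: the class, the `has_index()` bounds check and the `layout()` function are stand-ins, written under the assumption that each per-row hidden check has to build a model index and that doing so consults both rowCount() and columnCount().

```python
class CallCountingModel:
    """Stand-in model that records how often its count methods are called."""

    def __init__(self, rows):
        self.rows = rows
        self.row_count_calls = 0
        self.column_count_calls = 0

    def rowCount(self):
        self.row_count_calls += 1
        return self.rows

    def columnCount(self):
        self.column_count_calls += 1
        return 1

    def has_index(self, row, column):
        # Assumed bounds check performed whenever an index is created.
        return 0 <= row < self.rowCount() and 0 <= column < self.columnCount()


def layout(model, hidden_rows, skip_when_nothing_hidden):
    """Walk every row, as a layout pass would, and count the visible ones."""
    rows = model.rowCount()
    visible = 0
    for row in range(rows):
        if skip_when_nothing_hidden and not hidden_rows:
            hidden = False  # shortcut: nothing is hidden, no lookup needed
        else:
            hidden = model.has_index(row, 0) and row in hidden_rows
        if not hidden:
            visible += 1
    return visible


naive = CallCountingModel(1000)
layout(naive, set(), skip_when_nothing_hidden=False)
print(naive.row_count_calls, naive.column_count_calls)   # 1001 1000

fast = CallCountingModel(1000)
layout(fast, set(), skip_when_nothing_hidden=True)
print(fast.row_count_calls, fast.column_count_calls)     # 1 0
```

When some rows really are hidden the shortcut does not apply and the per-row check still runs, so the result stays correct; only the common "nothing hidden" case becomes cheap.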
The first optimisation makes sense since if a "uniform item size" optimisation flag is set, then the code shouldn't check every single item in the list for its size!
The second optimisation might be up for debate as to whether it should be included or not. It does make a drastic change in processing time for python list views with lots of items however.

It turns out that Qt is very easy to build from source on Windows, so I was able to make a custom Qt build and test my optimisations with my original python code.

Without either optimisation:
time taken to display list with 70000000 items = 358.775 seconds

With "size" only optimisation:
time taken to display list with 70000000 items = 198.576 seconds

With "size" and "hidden" optimisations:
time taken to display list with 70000000 items = 0.992 seconds
So the python code went from taking 6 minutes preprocessing before showing the view to just under 1 second.
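For a sense of scale, the speed-up factors follow directly from the three timings above. This is just the arithmetic; the variable names are invented for the example.

```python
# Seconds to display the list view with 70,000,000 items, from the results above.
baseline = 358.775          # without either optimisation
size_only = 198.576         # with the "size" optimisation only
size_and_hidden = 0.992     # with both optimisations

speedup_size_only = baseline / size_only
speedup_both = baseline / size_and_hidden

print(f"size only: {speedup_size_only:.1f}x faster")
print(f"size and hidden: {speedup_both:.0f}x faster")
```

Skipping the per-item size calculation roughly halves the time on its own, while the bulk of the remaining cost disappears only once the per-item hidden check is skipped as well.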
Wrt the rowCount calls: https://codereview.qt-project.org/c/qt/qtbase/+/601341