QSortFilterProxyModel vs SQLite (with FTS3)? in term of performance and memory usage

mbruel

Hi,
In my app, I'm getting a flat SQLite database, that I want to present in a View. I didn't decide yet if it will be a TreeView or ListView.
Well it will be easier with the example. The app is ClementineRemote, the DB contains songs with a title, an artist, an album, a filename...

I wish to have a text search feature that would search on all the fields I've given above.
I've no clue, which tech would be faster between a QSortFilterProxyModel with its filterAcceptsRow(int sourceRow, const QModelIndex &sourceParent) or using SQL, especially with FTS.

The Android-Remote dev in Java made the choice of SQL with a tweak of the database to create a virtual FTS3 table and adding indexes. (cf here)

I'm hesitating to load everything in a tree Model but I'm wondering what will be the perf. The app will run on both desktop and mobiles.
In term of volume, I imagine, it could be between 1k and 100k entries. (I've 40k)
Let's say we got 100 chars per entry, it mean 100kB*200 = 20MB of memory which is fine.
I'm more concerned on the search part using a regexp on different roles for every entries...
What do you think?

I've never looked into FTS algorithm and performance but the numbers I've seen are quite impressive compare to the regular search using a LIKE. The problem is that it is looking for complete words and I'd rather not have this limitation.
But still, would QSortFilterProxyModel be faster than SQLite (or another DB) for text search?
Is there some benchmarks about this?
How many entries can support QSortFilterProxyModel? After which number you would stop using it and turn to SQL DB?

JonB

@mbruel
I don't use SQLite. But I used Full Text Search on MS SQL Server. And, as you'd expect, it was about a squillion times faster than a non-FTS LIKE predicate. Of course, it had to maintain FTS tables for every word in the data. T-SQL CONTAINS() can search for left-hand side match (CONTAINS (column, '"text*"')), I don't know about SQLite.

mbruel

@JonB and after what point (number of items, size of the text) would you recommand SQL rather QSortFilterProxyModel?
Are there benchmarks on the subject?
Something to have in mind, is that if I use SQL, I need to repopulate the model each time I do a search. Does it worth it?

JonB

@mbruel
Nobody can answer what the performance would be like of SQLite, with or without FTS, compared to Qt code without trying in desired situation. It's like asking how long is a piece of string.

I don't actually think your issue is necessarily FTS vs LIKE. More to me is: I wouldn't read 40,000 records into memory and then filter them. Admittedly I use remote SQL servers where this is very significant, I don't know how that compares to SQLite with a local file, but still I'd rather give the database the opportunity to do the filtering....

So, if I understand your situation correctly, I'd probably want to compose the SQL query with (multiple column, if that's what you want) LIKE statements, and see how it performs. Then if I were interested I'd try adding FTS (might be a bit of work) versus filterAcceptsRow(), and compare those to the LIKE to see how they compared.

mbruel

@JonB
well it seems to me we could have some general figures on the performance of QSortFilterProxyModel filtering 10k, 100k, 1M entries using a regexp text search and compare that to a database (with ot without FTS). Or at least some feedback from people having experienced both on a given situation.

Having 40k or even 100k records in memory doesn't really scares me. It would only be loaded once and could then be faster for filtering but that is my question basically...
I've done it on professional projects and it was reacting quite fast but it was on decent stations... my concern is that I may target mobile phones or RPI.

It's either I go using SQL and the local SQLite db I've got (with or without FTS) OR I use a QSortFilterProxyModel. The design will be completely different. Using SQL, I'll just a need a regular QAbstractItemModel that I will clear and repopulate each time there is a search. Could this process be less smooth in term of user experience?

JonB

@mbruel said in QSortFilterProxyModel vs SQLite (with FTS3)? in term of performance and memory usage:

well it seems to me we could have some general figures on the performance of QSortFilterProxyModel filtering 10k, 100k, 1M entries using a regexp text search and compare that to a database (with ot without FTS). Or at least some feedback from people having experienced both on a given situation.

Best of luck for the feedback.

It's either I go using SQL and the local SQLite db I've got (with or without FTS) OR I use a QSortFilterProxyModel. The design will be completely different. Using SQL, I'll just a need a regular QAbstractItemModel that I will clear and repopulate each time there is a search

I don't understand. In both cases you'll want a SQL model, like QSqlQuery/TableModel. Then you may or may not want to place a QSortFilterProxyModel on top of it. In neither case will you have (just) "a regular QAbstractItemModel".

mbruel

@JonB

Best of luck for the feedback.

haha thanks. I would have imagined that Qt could provide such things to kind of "benchmark" their Proxy mechanism... or that some people would have done it to see how it reacts under heavy load.

I don't understand. In both cases you'll want a SQL model, like QSqlQuery/TableModel. Then you may or may not want to place a QSortFilterProxyModel on top of it. In neither case will you have (just) "a regular QAbstractItemModel".

In the case I use a Proxy, I was thinking populating by hand ONCE a QAbstractItemModel from the whole table by iterating a single QSqlQuery.

In the case I go SQL, I imagine to have ONE QAbstractItemModel that is used by my QML View. And I would repopulate it each time I do search query. (same way, iterating through the QSqlQuery).
Do you think it is best to use QSqlQueryModel and change the source of the View? I'm not sure how to do that, from now I'm just accessing C++ from QML, I've not tried yet to modify QML from C++. I suppose it is possible.

mbruel

I forgot to say, in case I go with a Proxy and loading by hand a QAbstractItemModel, it's because I want to build a tree instead of keeping a flat model. (having my songs under their albums under their artists)
I've read we could use a Proxy to map a flat model to a tree one by implementing mapToSource and mapFromSource but it sounds hell more complicated than building the tree. Maybe I'm wrong.. don't know, I'd need to see some examples...

In the case of SQL, I might just keep a flat model using sequential views (first Artists, then Albums, then tracks) not sure...

Christian Ehrlicher

@mbruel said in QSortFilterProxyModel vs SQLite (with FTS3)? in term of performance and memory usage:

haha thanks. I would have imagined that Qt could provide such things to kind of "benchmark" their Proxy mechanism... or that some people would have done it to see how it reacts under heavy load.

It all depends on the data - every benchmark will be wrong for any other usecase.

mbruel

@Christian-Ehrlicher said in QSortFilterProxyModel vs SQLite (with FTS3)? in term of performance and memory usage:

It all depends on the data - every benchmark will be wrong for any other usecase.

yes that's true, but it could give a general idea and maybe warn about bottlenecks... like 5% match (text search) on X records (10k, 100k, 1M) of text length 1000 chars for example.

Well, as I'm learning QML, I've chosen the Proxy way using a self populated tree Model.
It's working great, even on an old iPhone SE :)

I lost most of the time to tweak the deprecated TreeView from Controls 1.4... what a shame it is not in the v2 oO I hope it will be back with Qt6... It is definitely needed even if it is an hassle to understand how to render it (especially due to the necessary TableViewColumn....)

For those interested, I filter up and down the tree ProxyModel for each row in order to make the filtering decision. It works smoothly with 40k records.
Here is the filtering method.

@JonB and here is the way I populate the model from SQL. with the request being:

const QString ClementineRemote::sLibrarySQL =
        "select artist, album, title, track, filename from songs order by artist, album, track, title";

fcarney

Have you played with QSqlTableModel? It is based upon the abstract table model. I am going to be using this in a project and am curious if you compared its speed.

mbruel

@fcarney no I've not tried QSqlTableModel. As explained I wanted a Tree not a table.
alt text

Christian Ehrlicher

@mbruel said in QSortFilterProxyModel vs SQLite (with FTS3)? in term of performance and memory usage:

ike 5% match (text search) on X records (10k, 100k, 1M) of text length 1000 chars for example.

How should this be possible? It depends on your implementation and your data...

mbruel

@Christian-Ehrlicher it should only depend on the data. the idea would be to produce a data set of N records with length around a determined number of chars l.
Then you provide texts to search that would match an exact percentage p.
you can do that using whatever DB. Normally if you export the data set in another one or in whatever program, the output should be the same. You can thus compare the execution time. Does it make sense?

Christian Ehrlicher

@mbruel said in QSortFilterProxyModel vs SQLite (with FTS3)? in term of performance and memory usage:

Does it make sense?

Not really - how to you compare texts is very important, also if you don't compare text but numbers and so on.
You can write a testcase on your own to check it.

mbruel

@Christian-Ehrlicher the idea would have been to test comparison implementation on text only, for example a LIKE "%<text>%"
but I see what you mean.
I guess this kind of testing is done by companies to decide which tool/implementation to use on products where performance is critical. Myself, I don't have this need...

JonB

@mbruel said in QSortFilterProxyModel vs SQLite (with FTS3)? in term of performance and memory usage:

LIKE "%<text>%"

Nothing to do with your proposed timing tests. Just want to make sure you are aware: there will (should) be an enormous time difference between LIKE "%<text>%" versus LIKE "<text>%".

mbruel

@JonB

Nothing to do with your proposed timing tests

maybe I wasn't clear but that's what I'm asking since the beginning. my initial goal was to use a QSortFilterProxyModel and set its regexp (cf here), so I was planing to do a LIKE "%<text>%". In fact even worst for the perf as in Qt I can do it case insensitive.

My concern was to know how QSortFilterProxyModel handle a big/huge number of records. I'm testing with 40k, it is loading and filtering in few hundreds milliseconds. What about with 100k or 1M records?
That is why I was considering using SQL instead. Why not FTS even, as I said since the beginning, if it would limit the search to full word or like you told me a "search for left-hand side match".

So basically I was interested in any kind of benchmarks for searching either a regexp, a full word or a left-hand side match. Benchmark/Feedback on QSortFilterProxyModel and why not a comparison with SQLite or another DB.

Edit: as you could imagine from the picture above, my textfield trigger a search each time the text changes (for each char :))
Edit2: and yes I'm aware of the difference, I was just willing to accept limiting the search if it was lagging too much