Best strategy to process large Qsqlquery

JonB

@Burrito
I'm working off the Basic PIVOT Example on https://docs.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-ver15. I don't have MS SQL to hand, no testing, so this is brainstorm. You'll have to play with it. (Obviously, what you want to do first is practice the query in MS SQL Server Management Studio, assuming you have access.)

I'm thinking something like:

SELECT [Date],   
[Data1], [Data2], [Data3], [Data4]  
FROM  
(SELECT [Date], [DataType], [DataValue]
    FROM [table]) AS SourceTable  
PIVOT  
(  
ANY([Date]) , ANY([DataValue]) 
FOR [DataType] IN ([1], [2], [3], [4])  
) AS PivotTable
ORDER BY PivotTable.[Date];

The bit I'm not sure about is where I've written

ANY([Date]) , ANY([DataValue])

That might be right. It might be without the ANY() around around the column names.

The point is that example/PIVOT usually aggregates values where there are multiple ones, e.g. AVG() in the example. In our case we know the values are unique for Date + DataValue, but it does not require that. I am thinking ANY() will keep it happy, but I'm not sure.

There is doubtless an example for this case somewhere out there, e.g. on stackoverflow, if you get stuck.

Obviously, please let me know, I'm interested! If & when you get it working it's going to go like a rocket, hopefully more like your your original 40 seconds than 40 minutes :)

Otherwise, thinking aloud, if we fail miserably to get this PIVOT working we can always still write your query with a CASE and a JOIN and a GROUP BY, which must be what the PIVOT is doing....

kshegunov

@Burrito said in Best strategy to process large Qsqlquery:

So I want to create a row for each Date value (as one measurement point) and sort all available values into specific columns according to their DataType

I have a question. Why do you want to do this to begin with? Put an index spanning Date and DataType (i.e primary key) and an index on DataType and just operate directly would be my go to. I'm missing the point of this row aggregation, I guess, could you explain?

Burrito

@kshegunov The raw data tables are automatically created by the measurement software (which I can´t modify) and there is no unique key or primary key in that tables. Therefore I need to perform something like "Select * from table where devicechannel = X and DateTime between x and y" to receive the wanted data. I need to specify the time range I want to select as Devicechannel is not a unique value. I´m not sure if I understand you right, but as I understand it what you suggested is not possible with this table?

@JonB Thank you for the code to start with! I had to slightly modify it as the SSMS gave me syntax errors.

This one works fine in SSMS:

SELECT * 
FROM  
(SELECT [DateTime], [DataType], [DataValue]
    FROM Database.Table) AS SourceTable  
PIVOT  
(  
	MAX(DataValue)
	FOR DataType 
	IN ([1], [2], [3], [4])  
) AS PivotTable
ORDER BY PivotTable.[DateTime];

For anyone interested and looking for an explaination:
What I had to do is use MAX (or MIN, doesnt matter) as aggregate function. As you pointed out, every combination of DateTime and DataType is unique -> there is always only one MAX value of DataValue per DateTime and DataType.

The DateTime argument is not needed in the PIVOT statement, it just needs to be selected in the select statement of the initial query and is automatically assigned to a column of the PIVOT table.

What I still wonder about is the performance difference between the query in SSMS and my QT application:
SSMS: 28 secs
Query with QT (without iteration over all record elements): 257 secs
Query with QT and pushing all data into a vector: 260 secs

It seems the vector.push_back is not the problem as it does not significantly increase the required time, however, there is a factor of ~10 between SSMS and the QT query.

Anyone an idea how to overcome this?

JonB

@Burrito
Thanks for posting, and adjusting as required from mine. Glad you have this working!

Timings. Let's be clear, you are running both your Qt program & SSMS from the same machine? And the SQL Server is local to that machine, or remote? Just how many rows does your pivoted query return, approximately? And when you say "Query with QT (without iteration over all record elements)" what exactly are you doing in your Qt code to execute the query? Is it exactly as shown originally, with what removed, or do you not even do the query.next(), or... ?

Burrito

@JonB

The QT program and SSMS are running on the same machine, the SQLEXPRESS server app is running on a linux machine on the network. The query and the pivoted result is approx 240k instead of 2.5 mio. inital rows.

I just realised that my "query without interation" still iterated over all QSqlRecord records - just didn´t do anything with it. Running just the empty while(query.next() loop148 secs. So I already "halfed" my query time by removing QSqlRecord, however, using the following code without QSqlRecord:

while(query.next())
{
     rawResult singleresult;

     singleresult.Date= query.value(0).toLongLong();
     singleresult.Data1= query.value(1).toDouble();
     singleresult.Data2 = query.value(2).toDouble();
     singleresult.Data3 = query.value(3).toDouble();
     singleresult.Data4 = query.value(4).toDouble();
     singleresult.Data5= query.value(5).toDouble();
     singleresult.Data6 = query.value(6).toDouble();
     singleresult.Data7 = query.value(7).toDouble();
     singleresult.Data8 = query.value(8).toDouble();

     rawresults.push_back(singleresult);
}

The time is 270 secs again. Just executing the query.exex(qry3) needs 23 secs just like SSMS.

JonB

@Burrito
Hmm. Where to guess....

The driver used by SSMS is, I think, some kind of "SQL native query" driver, I think the one used by Qt may be different. You specify QODBC https://doc.qt.io/qt-5/sql-driver.html#qodbc to QSqlDatabase::addDatabase, right?

https://doc.qt.io/qt-5/sql-driver.html#qodbc-for-open-database-connectivity-odbc

Note: You should use the native driver, if it is available, instead of the ODBC driver. ODBC support can be used as a fallback for compliant databases if no native driver is available.

Unlike for, say, MySQL you have no choice from Qt for MS SQL. There may be an overhead there.

I need you test/confirm what happens when you do just:

while(query.next())
{
}

Is that, with no query.value() calls, your "needs 23 secs just like SSMS`.

In which case, pulling the values out of the rows is costly. You are doing it via a non-native ODBC driver, SSMS is doing it via a native SQL client one. There are are 240k * 9 extract/converts to do. Comment out half the query.value() lines, does the time halve?

You may as well also set https://doc.qt.io/qt-5/qsqlquery.html#setForwardOnly. From what you say that does not seem to be the issue, but put it in anyway.

Burrito

@JonB Thank you !!

Your tip with setForwardOnly reduced the required time from 234secs to 24 secs and I am quite happy with that!

JonB

@Burrito
Wow, that is interesting to hear!

I didn't expect that much difference. You can afford to setForwardOnly() for nearly every query you do, it's rare circumstances that you need to navigate back through the result set. I wonder how many people are paying that penalty with the default being false...

Christian Ehrlicher

@JonB said in Best strategy to process large Qsqlquery:

with the default being false...

It's not the case with e.g. postgres - so yes it's db dependent and therefore every default value you choose can be wrong for your specific use case.

kshegunov

@JonB said in Best strategy to process large Qsqlquery:

I didn't expect that much difference. You can afford to setForwardOnly() for nearly every query you do, it's rare circumstances that you need to navigate back through the result set. I wonder how many people are paying that penalty with the default being false...

Also for relatively small to medium datasets it isn't a penalty, which is why many database drivers prefer buffered (i.e. random-access) queries to the unbuffered (i.e. forward only).