Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • Users
  • Groups
  • Search
  • Get Qt Extensions
  • Unsolved
Collapse
Brand Logo
  1. Home
  2. Qt Development
  3. General and Desktop
  4. Best strategy to process large Qsqlquery
Forum Updated to NodeBB v4.3 + New Features

Best strategy to process large Qsqlquery

Scheduled Pinned Locked Moved Solved General and Desktop
24 Posts 6 Posters 3.9k Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • B Offline
    B Offline
    Burrito
    wrote on last edited by
    #1

    Hey,

    I am trying to develop a tool to process raw data which I receive from a MS SQL database and store the processed (sorted) data, to avoid repeated query of the same data as the results can contain up to a few million rows.

    The raw data received consists of the fields [Date] as ticks (BIG INT), [Data type] as INT and [Data value] as REAL. Thereby, it is possible to receive multiple rows of [Data type] and [Data value] for the same [Date].

    What i basically want is to receive the data, sort it into a vector of the struct for further processing:

    struct raw
    {
       long long int t_datetime;
       int t_type;
       double t_value;
    };
    
    

    I wrote the following code to query the rawdata:

    QSqlDatabase serverdb;                  
    
     //Connect database server via serverid;
    connectServerByID(serverdb, serverid); // Looks up existing connection or creates new one and performs db.open();
    
    //Create rawresult vector for 
    std::vector<raw> rawresults;
    
    // Create qry3 string from query-generator getConnectionStringRaw
    QString qry3 = getConnectionStringRaw(databases,starttime,endtime);
    qDebug() << "Generated query: " << qry3 << endl;
    QSqlQuery query(serverdb);
    qDebug() << "starting query" << endl;
    
    query.exec(qry3);
       while(query.next())
       {
               QSqlRecord rec = query.record();
    
               for(int i=0; i<rec.count(); ++i)
               {
                      raw datapoint;
    
                      datapoint.t_datetime = query.value(0).toLongLong();
                      datapoint.t_type = query.value(1).toInt();
                      datapoint.t_value = query.value(2).toDouble();
                      rawresults.push_back(datapoint);
               }
       }
       qDebug() << "result count: " << rawresults.size() << endl; 
    

    In general the code works. The complete loop over a query with 2.4 mio result rows without sorting takes around 40 secs, which is a perfectly acceptable time for me.
    Unfortunately, the moment i try to add the received data into the vector via push_back or sort them via a switch statement directly into the final structure, the process takes around 45 minutes.

    It seems the way I try to process the date is very inefficient, so I would be glad to receive advice for a more elegant solution.

    Thanks you :)

    jsulmJ 1 Reply Last reply
    0
    • B Burrito

      Hey,

      I am trying to develop a tool to process raw data which I receive from a MS SQL database and store the processed (sorted) data, to avoid repeated query of the same data as the results can contain up to a few million rows.

      The raw data received consists of the fields [Date] as ticks (BIG INT), [Data type] as INT and [Data value] as REAL. Thereby, it is possible to receive multiple rows of [Data type] and [Data value] for the same [Date].

      What i basically want is to receive the data, sort it into a vector of the struct for further processing:

      struct raw
      {
         long long int t_datetime;
         int t_type;
         double t_value;
      };
      
      

      I wrote the following code to query the rawdata:

      QSqlDatabase serverdb;                  
      
       //Connect database server via serverid;
      connectServerByID(serverdb, serverid); // Looks up existing connection or creates new one and performs db.open();
      
      //Create rawresult vector for 
      std::vector<raw> rawresults;
      
      // Create qry3 string from query-generator getConnectionStringRaw
      QString qry3 = getConnectionStringRaw(databases,starttime,endtime);
      qDebug() << "Generated query: " << qry3 << endl;
      QSqlQuery query(serverdb);
      qDebug() << "starting query" << endl;
      
      query.exec(qry3);
         while(query.next())
         {
                 QSqlRecord rec = query.record();
      
                 for(int i=0; i<rec.count(); ++i)
                 {
                        raw datapoint;
      
                        datapoint.t_datetime = query.value(0).toLongLong();
                        datapoint.t_type = query.value(1).toInt();
                        datapoint.t_value = query.value(2).toDouble();
                        rawresults.push_back(datapoint);
                 }
         }
         qDebug() << "result count: " << rawresults.size() << endl; 
      

      In general the code works. The complete loop over a query with 2.4 mio result rows without sorting takes around 40 secs, which is a perfectly acceptable time for me.
      Unfortunately, the moment i try to add the received data into the vector via push_back or sort them via a switch statement directly into the final structure, the process takes around 45 minutes.

      It seems the way I try to process the date is very inefficient, so I would be glad to receive advice for a more elegant solution.

      Thanks you :)

      jsulmJ Offline
      jsulmJ Offline
      jsulm
      Lifetime Qt Champion
      wrote on last edited by
      #2

      @Burrito said in Best strategy to process large Qsqlquery:

      The complete loop over a query with 2.4 mio result rows without sorting

      If you really have to fetch such amount of data and put it into a vector (not sure what the use case is) you should do it in a thread.
      And sorting is better done on SQL server side (ORDER BY).

      https://forum.qt.io/topic/113070/qt-code-of-conduct

      Pablo J. RoginaP B 2 Replies Last reply
      4
      • jsulmJ jsulm

        @Burrito said in Best strategy to process large Qsqlquery:

        The complete loop over a query with 2.4 mio result rows without sorting

        If you really have to fetch such amount of data and put it into a vector (not sure what the use case is) you should do it in a thread.
        And sorting is better done on SQL server side (ORDER BY).

        Pablo J. RoginaP Offline
        Pablo J. RoginaP Offline
        Pablo J. Rogina
        wrote on last edited by
        #3

        @jsulm said in Best strategy to process large Qsqlquery:

        And sorting is better done on SQL server side (ORDER BY).

        Yes, and you can also "avoid repeated query of the same data" by using SQL DISTINCT statement.

        Upvote the answer(s) that helped you solve the issue
        Use "Topic Tools" button to mark your post as Solved
        Add screenshots via postimage.org
        Don't ask support requests via chat/PM. Please use the forum so others can benefit from the solution in the future

        1 Reply Last reply
        4
        • jsulmJ jsulm

          @Burrito said in Best strategy to process large Qsqlquery:

          The complete loop over a query with 2.4 mio result rows without sorting

          If you really have to fetch such amount of data and put it into a vector (not sure what the use case is) you should do it in a thread.
          And sorting is better done on SQL server side (ORDER BY).

          B Offline
          B Offline
          Burrito
          wrote on last edited by
          #4

          @jsulm I already sort the the data using ORDER BY Date ASC;

          What I what as the end result is to put all received data into a local sqlite table of form:

          Date | ValueOfDataType1|ValueOfDataType2|ValueOfDataType3|ValueOfDataType4|....|
          

          as I want to perform different calculations and plottings of the columns later on.

          If I understand you right you suggest to split the query into x seperate queries, where I define e.g.
          WHERE DataType=1 and later on combine all of the queries?

          jsulmJ JonBJ 2 Replies Last reply
          0
          • B Burrito

            @jsulm I already sort the the data using ORDER BY Date ASC;

            What I what as the end result is to put all received data into a local sqlite table of form:

            Date | ValueOfDataType1|ValueOfDataType2|ValueOfDataType3|ValueOfDataType4|....|
            

            as I want to perform different calculations and plottings of the columns later on.

            If I understand you right you suggest to split the query into x seperate queries, where I define e.g.
            WHERE DataType=1 and later on combine all of the queries?

            jsulmJ Offline
            jsulmJ Offline
            jsulm
            Lifetime Qt Champion
            wrote on last edited by
            #5

            @Burrito said in Best strategy to process large Qsqlquery:

            you suggest to split the query into x seperate queries

            No
            I only suggested to do this long lasting stuff in a thread to not to block your main thread.

            https://forum.qt.io/topic/113070/qt-code-of-conduct

            1 Reply Last reply
            1
            • B Burrito

              @jsulm I already sort the the data using ORDER BY Date ASC;

              What I what as the end result is to put all received data into a local sqlite table of form:

              Date | ValueOfDataType1|ValueOfDataType2|ValueOfDataType3|ValueOfDataType4|....|
              

              as I want to perform different calculations and plottings of the columns later on.

              If I understand you right you suggest to split the query into x seperate queries, where I define e.g.
              WHERE DataType=1 and later on combine all of the queries?

              JonBJ Offline
              JonBJ Offline
              JonB
              wrote on last edited by JonB
              #6

              @Burrito
              If you are saying you want to eliminate repeated occurences of rows with same date you want to do that at SQL Server side. Hopefully with @Pablo-J-Rogina 's SELECT DISTINCT. You really do not want to be sorting that many rows in your client Qt code, as you have discovered.

              But I'm not sure that is what you're saying.

              Date | ValueOfDataType1|ValueOfDataType2|ValueOfDataType3|ValueOfDataType4|....|

              Explain what you mean here. Are all these values being stored as (a fixed number of) columns in your SQLite? Or is it a string?? Or what?

              It looks like you want to do some coalescing/processing of your data rows? If it were me, and I cared about speed, I would do all of that in a SQL query up at the SQL server, wildly more efficient both in speed and output volume. I don't know how much work you want to put into the SQL Server side code vs in your Qt client.

              B 1 Reply Last reply
              1
              • JonBJ JonB

                @Burrito
                If you are saying you want to eliminate repeated occurences of rows with same date you want to do that at SQL Server side. Hopefully with @Pablo-J-Rogina 's SELECT DISTINCT. You really do not want to be sorting that many rows in your client Qt code, as you have discovered.

                But I'm not sure that is what you're saying.

                Date | ValueOfDataType1|ValueOfDataType2|ValueOfDataType3|ValueOfDataType4|....|

                Explain what you mean here. Are all these values being stored as (a fixed number of) columns in your SQLite? Or is it a string?? Or what?

                It looks like you want to do some coalescing/processing of your data rows? If it were me, and I cared about speed, I would do all of that in a SQL query up at the SQL server, wildly more efficient both in speed and output volume. I don't know how much work you want to put into the SQL Server side code vs in your Qt client.

                B Offline
                B Offline
                Burrito
                wrote on last edited by
                #7

                @JonB Maybe I didn´t make that clear enough!

                The data I want to process is coming from a scientific instrument, so I can obtain multiple parameters per date point, but only data_value type 1 is always present.

                The tables typically look like this:

                Date |  DataType |  DataValue
                ----------------------------
                15535268742988000| 1 | 0.136423
                15535268742988000| 2 | 12.151334
                15535268742988000| 3 | 0.423564
                15535268742988000| 4 | 0.001244
                15535268743538000| 1 | 0.215352
                15535268744538000| 1 | 1.134652
                15535268744538000| 2 | 3.123412
                15535268842988000| 1 | 0.113451
                15535268842988000| 2 | 18.151334
                15535268842988000| 3 | 1.456564
                15535268842988000| 4 | 1.001244
                

                So I want to create a row for each Date value (as one measurement point) and sort all available values into specific columns according to their DataType.

                The result for the example should look like this:

                Date | Data1 | Data2 | Data3 | Data4
                -----------------------------------
                15535268742988000 | 0.136423 | 12.151334 | 0.423564 | 0.001244
                15535268743538000 | 0.215352 |                      |             
                15535268744538000 | 1.134652 | 3.123412  |          |
                15535268842988000 | 0.113451 | 18.151334 | 1.456564 | 1.001244
                
                jsulmJ JonBJ kshegunovK 3 Replies Last reply
                0
                • B Burrito

                  @JonB Maybe I didn´t make that clear enough!

                  The data I want to process is coming from a scientific instrument, so I can obtain multiple parameters per date point, but only data_value type 1 is always present.

                  The tables typically look like this:

                  Date |  DataType |  DataValue
                  ----------------------------
                  15535268742988000| 1 | 0.136423
                  15535268742988000| 2 | 12.151334
                  15535268742988000| 3 | 0.423564
                  15535268742988000| 4 | 0.001244
                  15535268743538000| 1 | 0.215352
                  15535268744538000| 1 | 1.134652
                  15535268744538000| 2 | 3.123412
                  15535268842988000| 1 | 0.113451
                  15535268842988000| 2 | 18.151334
                  15535268842988000| 3 | 1.456564
                  15535268842988000| 4 | 1.001244
                  

                  So I want to create a row for each Date value (as one measurement point) and sort all available values into specific columns according to their DataType.

                  The result for the example should look like this:

                  Date | Data1 | Data2 | Data3 | Data4
                  -----------------------------------
                  15535268742988000 | 0.136423 | 12.151334 | 0.423564 | 0.001244
                  15535268743538000 | 0.215352 |                      |             
                  15535268744538000 | 1.134652 | 3.123412  |          |
                  15535268842988000 | 0.113451 | 18.151334 | 1.456564 | 1.001244
                  
                  jsulmJ Offline
                  jsulmJ Offline
                  jsulm
                  Lifetime Qt Champion
                  wrote on last edited by
                  #8

                  @Burrito Why do you want to do this actually? Wouldn't this mean a variable number of columns?

                  https://forum.qt.io/topic/113070/qt-code-of-conduct

                  1 Reply Last reply
                  1
                  • B Burrito

                    @JonB Maybe I didn´t make that clear enough!

                    The data I want to process is coming from a scientific instrument, so I can obtain multiple parameters per date point, but only data_value type 1 is always present.

                    The tables typically look like this:

                    Date |  DataType |  DataValue
                    ----------------------------
                    15535268742988000| 1 | 0.136423
                    15535268742988000| 2 | 12.151334
                    15535268742988000| 3 | 0.423564
                    15535268742988000| 4 | 0.001244
                    15535268743538000| 1 | 0.215352
                    15535268744538000| 1 | 1.134652
                    15535268744538000| 2 | 3.123412
                    15535268842988000| 1 | 0.113451
                    15535268842988000| 2 | 18.151334
                    15535268842988000| 3 | 1.456564
                    15535268842988000| 4 | 1.001244
                    

                    So I want to create a row for each Date value (as one measurement point) and sort all available values into specific columns according to their DataType.

                    The result for the example should look like this:

                    Date | Data1 | Data2 | Data3 | Data4
                    -----------------------------------
                    15535268742988000 | 0.136423 | 12.151334 | 0.423564 | 0.001244
                    15535268743538000 | 0.215352 |                      |             
                    15535268744538000 | 1.134652 | 3.123412  |          |
                    15535268842988000 | 0.113451 | 18.151334 | 1.456564 | 1.001244
                    
                    JonBJ Offline
                    JonBJ Offline
                    JonB
                    wrote on last edited by JonB
                    #9

                    @Burrito
                    Exactly as @jsulm just asked you? That is why you'll see I wrote

                    Are all these values being stored as (a fixed number of) columns in your SQLite?

                    It's not going to work if you don't know how many columns you want. Get this decided before we proceed!

                    I'm getting confused as to whether you're saying there are only every 4 columns?

                    And are you saying the combination Date + DataType is unique, i.e. only ever one (or zero) rows for that?

                    1 Reply Last reply
                    0
                    • B Offline
                      B Offline
                      Burrito
                      wrote on last edited by
                      #10

                      @JonB said in Best strategy to process large Qsqlquery:

                      . Are all these values being stored as (a fixed number of) columns in your SQLite? Or is it a string?? Or what?

                      @JonB & @jsulm Sorry, I didn´t see that :)
                      The number of columns is fixed to 9, there are only 8 different DataTypes plus the Date value.

                      I need all that data, because I want to create different plots from that e.g. Data1 vs. Data2 or Date vs. Date3, but they need to be processed (e.g. division by a chosen factor) for that, so I can´t use the raw data as received from the SQL server directly.

                      JonBJ 1 Reply Last reply
                      0
                      • B Burrito

                        @JonB said in Best strategy to process large Qsqlquery:

                        . Are all these values being stored as (a fixed number of) columns in your SQLite? Or is it a string?? Or what?

                        @JonB & @jsulm Sorry, I didn´t see that :)
                        The number of columns is fixed to 9, there are only 8 different DataTypes plus the Date value.

                        I need all that data, because I want to create different plots from that e.g. Data1 vs. Data2 or Date vs. Date3, but they need to be processed (e.g. division by a chosen factor) for that, so I can´t use the raw data as received from the SQL server directly.

                        JonBJ Offline
                        JonBJ Offline
                        JonB
                        wrote on last edited by JonB
                        #11

                        @Burrito
                        Your post has crossed with mine. I need to know:

                        And are you saying the combination Date + DataType is unique, i.e. only ever one (or zero) rows for that?

                        I also need to know if you are prepared to do some smart SQL Server code, or whether you only want to write stuff in your Qt client?

                        1 Reply Last reply
                        0
                        • B Offline
                          B Offline
                          Burrito
                          wrote on last edited by
                          #12

                          Sorry for the long response time, I can only answer every 10 min.

                          @JonB: Yes, the combination of Date + DateType is always unique.

                          I´m not very experienced with SQL apart from simple queries, but of course I´m interested if SQL Server Code is the optimum solution for such a process!

                          If you could give me a few key words or an example I would be very grateful to dive into the matter!

                          JonBJ 2 Replies Last reply
                          0
                          • B Burrito

                            Sorry for the long response time, I can only answer every 10 min.

                            @JonB: Yes, the combination of Date + DateType is always unique.

                            I´m not very experienced with SQL apart from simple queries, but of course I´m interested if SQL Server Code is the optimum solution for such a process!

                            If you could give me a few key words or an example I would be very grateful to dive into the matter!

                            JonBJ Offline
                            JonBJ Offline
                            JonB
                            wrote on last edited by JonB
                            #13

                            @Burrito
                            OK, at least I understand now :)

                            Then I believe/hope you are in luck! T-SQL has a PIVOT operator. If I understand right, it does exactly what you want, all up at the server, efficiently!

                            You have a rowset. It has a unique key of columns Date + DataType. What you want is to turn those rows into columns, where the columns are dictated by the value of DataType, with separate columns for each value in DataType.

                            https://docs.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-ver15

                            You can use the PIVOT and UNPIVOT relational operators to change a table-valued expression into another table. PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output. And PIVOT runs aggregations where they're required on any remaining column values that are wanted in the final output. UNPIVOT carries out the opposite operation to PIVOT by rotating columns of a table-valued expression into column values.

                            https://www.sqlservertutorial.net/sql-server-basics/sql-server-pivot/ claims to explain it simply, but I'm not sure I find it much clearer... :)

                            I hope this corresponds to what you are trying to do. I have used it before. I'm trying to read up on it now... :)

                            B 1 Reply Last reply
                            1
                            • JonBJ JonB

                              @Burrito
                              OK, at least I understand now :)

                              Then I believe/hope you are in luck! T-SQL has a PIVOT operator. If I understand right, it does exactly what you want, all up at the server, efficiently!

                              You have a rowset. It has a unique key of columns Date + DataType. What you want is to turn those rows into columns, where the columns are dictated by the value of DataType, with separate columns for each value in DataType.

                              https://docs.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-ver15

                              You can use the PIVOT and UNPIVOT relational operators to change a table-valued expression into another table. PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output. And PIVOT runs aggregations where they're required on any remaining column values that are wanted in the final output. UNPIVOT carries out the opposite operation to PIVOT by rotating columns of a table-valued expression into column values.

                              https://www.sqlservertutorial.net/sql-server-basics/sql-server-pivot/ claims to explain it simply, but I'm not sure I find it much clearer... :)

                              I hope this corresponds to what you are trying to do. I have used it before. I'm trying to read up on it now... :)

                              B Offline
                              B Offline
                              Burrito
                              wrote on last edited by
                              #14

                              @JonB Thank you very much, the two links are exactly what I was searching for. I will try around with it now =)

                              1 Reply Last reply
                              0
                              • B Burrito

                                Sorry for the long response time, I can only answer every 10 min.

                                @JonB: Yes, the combination of Date + DateType is always unique.

                                I´m not very experienced with SQL apart from simple queries, but of course I´m interested if SQL Server Code is the optimum solution for such a process!

                                If you could give me a few key words or an example I would be very grateful to dive into the matter!

                                JonBJ Offline
                                JonBJ Offline
                                JonB
                                wrote on last edited by JonB
                                #15

                                @Burrito
                                I'm working off the Basic PIVOT Example on https://docs.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-ver15. I don't have MS SQL to hand, no testing, so this is brainstorm. You'll have to play with it. (Obviously, what you want to do first is practice the query in MS SQL Server Management Studio, assuming you have access.)

                                I'm thinking something like:

                                SELECT [Date],   
                                [Data1], [Data2], [Data3], [Data4]  
                                FROM  
                                (SELECT [Date], [DataType], [DataValue]
                                    FROM [table]) AS SourceTable  
                                PIVOT  
                                (  
                                ANY([Date]) , ANY([DataValue]) 
                                FOR [DataType] IN ([1], [2], [3], [4])  
                                ) AS PivotTable
                                ORDER BY PivotTable.[Date];
                                

                                The bit I'm not sure about is where I've written

                                ANY([Date]) , ANY([DataValue]) 
                                

                                That might be right. It might be without the ANY() around around the column names.

                                The point is that example/PIVOT usually aggregates values where there are multiple ones, e.g. AVG() in the example. In our case we know the values are unique for Date + DataValue, but it does not require that. I am thinking ANY() will keep it happy, but I'm not sure.

                                There is doubtless an example for this case somewhere out there, e.g. on stackoverflow, if you get stuck.

                                Obviously, please let me know, I'm interested! If & when you get it working it's going to go like a rocket, hopefully more like your your original 40 seconds than 40 minutes :)

                                Otherwise, thinking aloud, if we fail miserably to get this PIVOT working we can always still write your query with a CASE and a JOIN and a GROUP BY, which must be what the PIVOT is doing....

                                1 Reply Last reply
                                1
                                • B Burrito

                                  @JonB Maybe I didn´t make that clear enough!

                                  The data I want to process is coming from a scientific instrument, so I can obtain multiple parameters per date point, but only data_value type 1 is always present.

                                  The tables typically look like this:

                                  Date |  DataType |  DataValue
                                  ----------------------------
                                  15535268742988000| 1 | 0.136423
                                  15535268742988000| 2 | 12.151334
                                  15535268742988000| 3 | 0.423564
                                  15535268742988000| 4 | 0.001244
                                  15535268743538000| 1 | 0.215352
                                  15535268744538000| 1 | 1.134652
                                  15535268744538000| 2 | 3.123412
                                  15535268842988000| 1 | 0.113451
                                  15535268842988000| 2 | 18.151334
                                  15535268842988000| 3 | 1.456564
                                  15535268842988000| 4 | 1.001244
                                  

                                  So I want to create a row for each Date value (as one measurement point) and sort all available values into specific columns according to their DataType.

                                  The result for the example should look like this:

                                  Date | Data1 | Data2 | Data3 | Data4
                                  -----------------------------------
                                  15535268742988000 | 0.136423 | 12.151334 | 0.423564 | 0.001244
                                  15535268743538000 | 0.215352 |                      |             
                                  15535268744538000 | 1.134652 | 3.123412  |          |
                                  15535268842988000 | 0.113451 | 18.151334 | 1.456564 | 1.001244
                                  
                                  kshegunovK Offline
                                  kshegunovK Offline
                                  kshegunov
                                  Moderators
                                  wrote on last edited by
                                  #16

                                  @Burrito said in Best strategy to process large Qsqlquery:

                                  So I want to create a row for each Date value (as one measurement point) and sort all available values into specific columns according to their DataType

                                  I have a question. Why do you want to do this to begin with? Put an index spanning Date and DataType (i.e primary key) and an index on DataType and just operate directly would be my go to. I'm missing the point of this row aggregation, I guess, could you explain?

                                  Read and abide by the Qt Code of Conduct

                                  1 Reply Last reply
                                  1
                                  • B Offline
                                    B Offline
                                    Burrito
                                    wrote on last edited by Burrito
                                    #17

                                    @kshegunov The raw data tables are automatically created by the measurement software (which I can´t modify) and there is no unique key or primary key in that tables. Therefore I need to perform something like "Select * from table where devicechannel = X and DateTime between x and y" to receive the wanted data. I need to specify the time range I want to select as Devicechannel is not a unique value. I´m not sure if I understand you right, but as I understand it what you suggested is not possible with this table?

                                    @JonB Thank you for the code to start with! I had to slightly modify it as the SSMS gave me syntax errors.

                                    This one works fine in SSMS:

                                    SELECT * 
                                    FROM  
                                    (SELECT [DateTime], [DataType], [DataValue]
                                        FROM Database.Table) AS SourceTable  
                                    PIVOT  
                                    (  
                                    	MAX(DataValue)
                                    	FOR DataType 
                                    	IN ([1], [2], [3], [4])  
                                    ) AS PivotTable
                                    ORDER BY PivotTable.[DateTime];
                                    

                                    For anyone interested and looking for an explaination:
                                    What I had to do is use MAX (or MIN, doesnt matter) as aggregate function. As you pointed out, every combination of DateTime and DataType is unique -> there is always only one MAX value of DataValue per DateTime and DataType.

                                    The DateTime argument is not needed in the PIVOT statement, it just needs to be selected in the select statement of the initial query and is automatically assigned to a column of the PIVOT table.

                                    What I still wonder about is the performance difference between the query in SSMS and my QT application:
                                    SSMS: 28 secs
                                    Query with QT (without iteration over all record elements): 257 secs
                                    Query with QT and pushing all data into a vector: 260 secs

                                    It seems the vector.push_back is not the problem as it does not significantly increase the required time, however, there is a factor of ~10 between SSMS and the QT query.

                                    Anyone an idea how to overcome this?

                                    JonBJ 1 Reply Last reply
                                    1
                                    • B Burrito

                                      @kshegunov The raw data tables are automatically created by the measurement software (which I can´t modify) and there is no unique key or primary key in that tables. Therefore I need to perform something like "Select * from table where devicechannel = X and DateTime between x and y" to receive the wanted data. I need to specify the time range I want to select as Devicechannel is not a unique value. I´m not sure if I understand you right, but as I understand it what you suggested is not possible with this table?

                                      @JonB Thank you for the code to start with! I had to slightly modify it as the SSMS gave me syntax errors.

                                      This one works fine in SSMS:

                                      SELECT * 
                                      FROM  
                                      (SELECT [DateTime], [DataType], [DataValue]
                                          FROM Database.Table) AS SourceTable  
                                      PIVOT  
                                      (  
                                      	MAX(DataValue)
                                      	FOR DataType 
                                      	IN ([1], [2], [3], [4])  
                                      ) AS PivotTable
                                      ORDER BY PivotTable.[DateTime];
                                      

                                      For anyone interested and looking for an explaination:
                                      What I had to do is use MAX (or MIN, doesnt matter) as aggregate function. As you pointed out, every combination of DateTime and DataType is unique -> there is always only one MAX value of DataValue per DateTime and DataType.

                                      The DateTime argument is not needed in the PIVOT statement, it just needs to be selected in the select statement of the initial query and is automatically assigned to a column of the PIVOT table.

                                      What I still wonder about is the performance difference between the query in SSMS and my QT application:
                                      SSMS: 28 secs
                                      Query with QT (without iteration over all record elements): 257 secs
                                      Query with QT and pushing all data into a vector: 260 secs

                                      It seems the vector.push_back is not the problem as it does not significantly increase the required time, however, there is a factor of ~10 between SSMS and the QT query.

                                      Anyone an idea how to overcome this?

                                      JonBJ Offline
                                      JonBJ Offline
                                      JonB
                                      wrote on last edited by JonB
                                      #18

                                      @Burrito
                                      Thanks for posting, and adjusting as required from mine. Glad you have this working!

                                      Timings. Let's be clear, you are running both your Qt program & SSMS from the same machine? And the SQL Server is local to that machine, or remote? Just how many rows does your pivoted query return, approximately? And when you say "Query with QT (without iteration over all record elements)" what exactly are you doing in your Qt code to execute the query? Is it exactly as shown originally, with what removed, or do you not even do the query.next(), or... ?

                                      1 Reply Last reply
                                      0
                                      • B Offline
                                        B Offline
                                        Burrito
                                        wrote on last edited by
                                        #19

                                        @JonB

                                        The QT program and SSMS are running on the same machine, the SQLEXPRESS server app is running on a linux machine on the network. The query and the pivoted result is approx 240k instead of 2.5 mio. inital rows.

                                        I just realised that my "query without interation" still iterated over all QSqlRecord records - just didn´t do anything with it. Running just the empty while(query.next() loop148 secs. So I already "halfed" my query time by removing QSqlRecord, however, using the following code without QSqlRecord:

                                        while(query.next())
                                        {
                                             rawResult singleresult;
                                        
                                             singleresult.Date= query.value(0).toLongLong();
                                             singleresult.Data1= query.value(1).toDouble();
                                             singleresult.Data2 = query.value(2).toDouble();
                                             singleresult.Data3 = query.value(3).toDouble();
                                             singleresult.Data4 = query.value(4).toDouble();
                                             singleresult.Data5= query.value(5).toDouble();
                                             singleresult.Data6 = query.value(6).toDouble();
                                             singleresult.Data7 = query.value(7).toDouble();
                                             singleresult.Data8 = query.value(8).toDouble();
                                        
                                             rawresults.push_back(singleresult);
                                        }
                                        

                                        The time is 270 secs again. Just executing the query.exex(qry3) needs 23 secs just like SSMS.

                                        JonBJ 1 Reply Last reply
                                        0
                                        • B Burrito

                                          @JonB

                                          The QT program and SSMS are running on the same machine, the SQLEXPRESS server app is running on a linux machine on the network. The query and the pivoted result is approx 240k instead of 2.5 mio. inital rows.

                                          I just realised that my "query without interation" still iterated over all QSqlRecord records - just didn´t do anything with it. Running just the empty while(query.next() loop148 secs. So I already "halfed" my query time by removing QSqlRecord, however, using the following code without QSqlRecord:

                                          while(query.next())
                                          {
                                               rawResult singleresult;
                                          
                                               singleresult.Date= query.value(0).toLongLong();
                                               singleresult.Data1= query.value(1).toDouble();
                                               singleresult.Data2 = query.value(2).toDouble();
                                               singleresult.Data3 = query.value(3).toDouble();
                                               singleresult.Data4 = query.value(4).toDouble();
                                               singleresult.Data5= query.value(5).toDouble();
                                               singleresult.Data6 = query.value(6).toDouble();
                                               singleresult.Data7 = query.value(7).toDouble();
                                               singleresult.Data8 = query.value(8).toDouble();
                                          
                                               rawresults.push_back(singleresult);
                                          }
                                          

                                          The time is 270 secs again. Just executing the query.exex(qry3) needs 23 secs just like SSMS.

                                          JonBJ Offline
                                          JonBJ Offline
                                          JonB
                                          wrote on last edited by JonB
                                          #20

                                          @Burrito
                                          Hmm. Where to guess....

                                          The driver used by SSMS is, I think, some kind of "SQL native query" driver, I think the one used by Qt may be different. You specify QODBC https://doc.qt.io/qt-5/sql-driver.html#qodbc to QSqlDatabase::addDatabase, right?

                                          https://doc.qt.io/qt-5/sql-driver.html#qodbc-for-open-database-connectivity-odbc

                                          Note: You should use the native driver, if it is available, instead of the ODBC driver. ODBC support can be used as a fallback for compliant databases if no native driver is available.

                                          Unlike for, say, MySQL you have no choice from Qt for MS SQL. There may be an overhead there.

                                          I need you test/confirm what happens when you do just:

                                          while(query.next())
                                          {
                                          }
                                          

                                          Is that, with no query.value() calls, your "needs 23 secs just like SSMS`.

                                          In which case, pulling the values out of the rows is costly. You are doing it via a non-native ODBC driver, SSMS is doing it via a native SQL client one. There are are 240k * 9 extract/converts to do. Comment out half the query.value() lines, does the time halve?

                                          You may as well also set https://doc.qt.io/qt-5/qsqlquery.html#setForwardOnly. From what you say that does not seem to be the issue, but put it in anyway.

                                          1 Reply Last reply
                                          2

                                          • Login

                                          • Login or register to search.
                                          • First post
                                            Last post
                                          0
                                          • Categories
                                          • Recent
                                          • Tags
                                          • Popular
                                          • Users
                                          • Groups
                                          • Search
                                          • Get Qt Extensions
                                          • Unsolved