Data acquisition storage: QDataStream vs. ofstream vs. DB vs. csv

Mobile and Embedded | 23 Posts | 6 Posters | 4.9k Views
jars121 wrote on 27 May 2018, 11:18 | #1

    My Qt application is built around a data acquisition device, which can capture up to ~1GB per hour of data. My current 'prototype' code saves the data to a QVector in memory, which works nicely until my device runs out of memory (around 1 hour with 1GB of RAM). Given that the data rate is relatively low, writing the data directly to a file on the device's storage medium (eMMC, SSD, SD, etc.) is the obvious next step.

    In my research, I've come across a number of potential options, and I'm hoping for some input/discussion regarding the most appropriate solution for this particular application. The data is currently captured and stored in QVector<QVector<double>> arrays, to represent the following data structure:

    [[V1d1,V1d2,V1d3,V1d4],[V2d1,V2d2],[V3d1,V3d2,V3d3,V3d4,V3d5]]
    

where Vx is the QVector (within the outer QVector) number and dy is the data point within QVector Vx. This structure is pretty straightforward, but it will need to change depending on the solution (it won't map directly to a .csv file, for example).
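A minimal sketch of that in-memory layout (the sizes and values here are purely illustrative):

    #include <QVector>

    int main()
    {
        QVector<QVector<double>> channels(3);       // one inner QVector per input
        channels[0] = {0.01, 15.26, 4.67, 7.80};    // V1d1..V1d4
        channels[1] = {32.54, 0.45};                // V2d1..V2d2
        channels[2] = {1.0, 2.0, 3.0, 4.0, 5.0};    // V3d1..V3d5
        channels[1].append(3.44);                   // new samples are simply appended
        return 0;
    }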

    QDataStream looks like a good means for packaging and writing the data, but I'm unsure of which file type to use. I've used CSV extensively in the past, but think I should use something a little more sophisticated in this case. The data will eventually be sent to a MySQL DB, so I'm considering saving it to a DB directly on the device, but also think this might be overkill for this application. Are there options I haven't considered, such as XML or JSON?
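For illustration, QDataStream can serialise the nested QVector directly to a binary file; a minimal sketch (the file name and stream version are just examples):

    #include <QDataStream>
    #include <QFile>
    #include <QString>
    #include <QVector>

    bool saveBinary(const QString &path, const QVector<QVector<double>> &data)
    {
        QFile file(path);
        if (!file.open(QIODevice::WriteOnly))
            return false;

        QDataStream out(&file);
        out.setVersion(QDataStream::Qt_5_10);   // pin the format so it can be read back later
        out << data;                            // QVector of QVector<double> streams directly
        return out.status() == QDataStream::Ok;
    }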

I've mentioned one consideration already (eventual transmission of the saved data to a MySQL DB); the second is fast read speed, as the data will also be presented in a range of QtCharts (read back from file, not directly from memory).

    Any input, references or past experience would be greatly appreciated as always :)

mvuori wrote on 27 May 2018, 17:01 | #2

      People seem to be moving away from XML to JSON and, actually, your data looks like the kind that JSON is very suitable for.

jars121 wrote on 27 May 2018, 21:32 | #3
        Thanks @mvuori, JSON certainly does look like a suitable option. I'll have a play with JSON for the time being and see how it goes.

SGaist (Lifetime Qt Champion) wrote on 27 May 2018, 21:38 | #4

          Hi,

Qt already supports databases, so depending on your complete system you could use SQLite and then sync the table contents. If a network is available, you could even send the data to a remote service using REST and store it directly where it's needed.

jars121 wrote on 27 May 2018, 22:26 (last edited by jars121) | #5

            Thanks @SGaist, much appreciated as always. Network isn't an option (in the normal use case, there are exceptions), so data storage will need to be local. I'm giving the JSON approach some more thought, and am starting to think that an SQLite-based (or similar) approach might be more suitable.

The number of QVectors is fixed: there are 64 of them. The number of data points within each QVector is dynamic; QVector1 might accumulate 100 points/second, QVector2 might accumulate 1 point/second and QVector3 might accumulate 1000 points/second. The issue I foresee with the JSON approach is appending each new data point to the respective key in the JSON file.

            For example, consider the following JSON format:

            {
               "vector1": {
                  "vectorName": "vector1Name",
                  "data1": "0.01",
                  "data2": "15.26",
                  "data3": "4.67"
               },
               "vector2": {
                  "vectorName": "vector2Name",
                  "data":["32.54","0.45","3.44"]
               }
            }
            

This demonstrates what I perceive to be the two JSON options for storing the values. Adding a new key/value pair for each data point (the vector1 approach) doesn't seem right, whereas an array of values under the "data" key (the vector2 approach) seems like it could work. The question I'm facing, the answer to which will determine whether I investigate an SQLite-based approach more seriously, is whether I can append to the "data" array as simply as to a QVector, QList, etc., i.e. dataArray.append(value).

            If so, does it require first reading the file into memory, or can the file be opened as WriteOnly and appended directly? This is quite important, as I can't afford (in both time and memory) to read a growing JSON file each time I need to append a new value.
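As far as I can tell, appending with QJsonDocument would mean a full read-modify-rewrite each time, roughly like this (keys as per the example above):

    #include <QFile>
    #include <QJsonArray>
    #include <QJsonDocument>
    #include <QJsonObject>

    bool appendSample(const QString &path, double value)
    {
        QFile file(path);
        if (!file.open(QIODevice::ReadOnly))
            return false;
        QJsonObject root = QJsonDocument::fromJson(file.readAll()).object();
        file.close();

        QJsonObject vec = root["vector2"].toObject();
        QJsonArray data = vec["data"].toArray();
        data.append(value);                          // the only cheap part
        vec["data"] = data;
        root["vector2"] = vec;

        if (!file.open(QIODevice::WriteOnly | QIODevice::Truncate))
            return false;
        file.write(QJsonDocument(root).toJson());    // rewrites the entire file
        return true;
    }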

SGaist (Lifetime Qt Champion) wrote on 28 May 2018, 20:50 | #6

              One other possibility could be to use a NoSQL database.

JKSH (Moderators) wrote on 29 May 2018, 14:14 | #7 | in reply to jars121's post #5

                @jars121 said in Data acquisition storage: QDataStream vs. ofstream vs. DB vs. csv:

                The number of QVectors is fixed, there are 64 of them. The number of data points within each QVector is dynamic; QVector1 might accumulate 100 points/second, QVector2 might accumulate 1 point/second and QVector3 might accumulate 1000 points/second. The issue I foresee with the JSON approach is appending each new data point to the respective key in the JSON file.

                You have indeed foreseen the critical issue: With data coming in so quickly and with different sampling rates, text storage (CSV, XML, JSON) is impractical. You should use some kind of binary storage instead.

                SQLite works fine, it's quite simple to use, and it comes bundled with Qt. This is the simplest way forward.
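Getting started is only a few lines with the bundled QSQLITE driver; a minimal sketch (the table layout is just an example):

    #include <QSqlDatabase>
    #include <QSqlQuery>

    bool openLog()
    {
        // The QSQLITE driver ships with Qt; the file is created on first open.
        QSqlDatabase db = QSqlDatabase::addDatabase("QSQLITE");
        db.setDatabaseName("acquisition.db");
        if (!db.open())
            return false;

        QSqlQuery query;
        return query.exec("CREATE TABLE IF NOT EXISTS samples "
                          "(ts INTEGER, channel INTEGER, value REAL)");
    }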

If this is a long-term project though, I recommend exploring other data formats that are designed for storing time-series data like what you have. A common one in science and engineering is HDF5, which comes with an official C++ API. Note: this is heavier than SQLite, and does come with potential risks.

jars121 wrote on 29 May 2018, 22:20 | #8

Thank you very much @JKSH, your post has introduced me to a number of new concepts to consider. HDF5 looks like an interesting proposition, but the post you linked to certainly highlights the shortcomings and issues with that format.

                  I think I'll give the SQLite approach a go for now, and investigate other science/engineering-based formats down the line if need be.

jars121 wrote on 5 Jun 2018, 11:08 | #9

It's been a week, and I've played around with the SQLite option. At first I was running a new query for each insertion of data, which was obviously quite slow, as each query execution has considerable overhead (particularly in normal synchronous mode).

                    As the code currently stands, for each iteration of data acquisition (each of the 64 inputs is sampled once), the readings are collected and inserted into the database in a single transaction, with each input represented as a column in the database.

The issue I have is that even with synchronous mode turned off, journal mode turned off and a sizeable cache (PRAGMA cache_size), I'm still unable to write the data to the database fast enough. I have a timer which measures the time between calls to the database write function, and it's currently showing times of up to 6 seconds (but usually around 1.5 seconds) in normal synchronous mode, and around 70ms with synchronous off. Bear in mind this test is only on ~10 inputs at a desired sample rate of 1kHz, so the write frequency seems really slow.
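For reference, those PRAGMAs can be applied as plain queries on the open connection; a minimal sketch (the values are just examples):

    #include <QSqlDatabase>
    #include <QSqlQuery>

    void tuneForSpeed(QSqlDatabase &db)
    {
        QSqlQuery query(db);
        // Trades durability for write speed: a crash or power loss can corrupt the file.
        query.exec("PRAGMA synchronous = OFF");
        query.exec("PRAGMA journal_mode = OFF");
        query.exec("PRAGMA cache_size = 10000");   // in pages (~40 MB with 4 KB pages)
    }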

                    This has me thinking; could I perhaps revert to my initial method of saving the data in a QVector, and when the amount of memory used reaches a set point, dump the data into the database? I could see this happening in one of two ways:

1. Once the set amount of memory utilisation is reached, stop logging, inform the user that the logging capacity has been reached, and save the QVector contents to the SQLite database.
                    2. Once a set number of readings or amount of memory has been reached, save the data to the SQLite database, but continue saving data to memory, such that logging can continue.

                    I'm somewhat tempted to give the binary data file option a go as well.

                    If anyone has any input or advice I'm all ears :D

jars121 wrote on 5 Jun 2018, 11:29 | #10

Just realised that I had an accidental nested loop in there, which probably accounts for the perceived performance issues. I'm going to continue testing and see whether this SQLite approach is going to be fast enough; I'd still like to hear from more seasoned database users in the meantime though!

JKSH (Moderators) wrote on 5 Jun 2018, 11:45 | #11 | in reply to jars121's post #9
                        This post is deleted!
JonB wrote on 5 Jun 2018, 12:36 | #12 | in reply to jars121's post #10

                          @jars121
                          If you're suggesting that every database write is taking 6 seconds, or even 1.5 seconds, there is indeed something very fishy going on! :)

JKSH (Moderators) wrote on 5 Jun 2018, 14:26 | #13 | in reply to jars121's post #9

                            @jars121 said in Data acquisition storage: QDataStream vs. ofstream vs. DB vs. csv:

                            As the code currently stands, for each iteration of data acquisition (each of the 64 inputs is sampled once), the readings are collected and inserted into the database in a single transaction, with each input represented as a column in the database.

                            This still requires your code to do all of this within 1 ms:

                            1. Transfer 1 sample per channel from your device into RAM
                            2. Begin a transaction
                            3. Insert 64 items
                            4. Commit the transaction

                            This is not possible, because each of the steps above is quite expensive and takes a long time. I haven't measured, but I have a feeling step #3 is actually the cheapest!

                            What data acquisition hardware are you using? If its API is well-designed, it should have some kind of "buffered acquisition" or "continuous sampling" mode. Instead of reading one sample per channel per iteration, you should allow the data to accumulate in the acquisition device. Then, read >=100ms worth of data each iteration. For example, each iteration could look like this:

                            1. Transfer 250 samples per channel from your device into RAM (using only 1 function call, not 250 or 16000 separate function calls!)
                            2. Begin a transaction
                            3. Insert 64*250 = 16000 items
                            4. Commit the transaction

                            You should find this far, far more efficient.
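A minimal sketch of that batched pattern with a Qt SQLite connection and a prepared statement (the table layout and names are just examples):

    #include <QSqlDatabase>
    #include <QSqlQuery>
    #include <QVector>

    // Writes one batch (e.g. 250 samples x 64 channels) inside a single transaction.
    bool writeBatch(QSqlDatabase &db, const QVector<QVector<double>> &batch)
    {
        if (!db.transaction())
            return false;

        QSqlQuery query(db);
        query.prepare("INSERT INTO samples (channel, value) VALUES (?, ?)");

        for (int channel = 0; channel < batch.size(); ++channel) {
            for (double value : batch[channel]) {
                query.bindValue(0, channel);
                query.bindValue(1, value);
                if (!query.exec()) {
                    db.rollback();
                    return false;
                }
            }
        }
        return db.commit();
    }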

                            P.S. You might have received an email notification about a post I made earlier about Transactions. Please ignore that -- I didn't read your post properly!

JonB wrote on 5 Jun 2018, 14:37 | #14 | in reply to jars121's post #10

                              @jars121
                              In addition to @JKSH, even though he says

                              I have a feeling step #3 is actually the cheapest!

depending on your database engine and your connection, 64 distinct SQL statements sent separately may incur an overhead. If you can, you could first build a single 64-INSERT-statements string and send it as a single exec() instead of each one on its own. Or even, maybe, send a single string with, say, comma-separated values to a stored procedure and have that do all the work to split & execute. (As I say, I don't know how much difference that would make to SQLite.)
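As a sketch of that idea (column names are just examples; note that SQLite also accepts multiple VALUES rows in a single INSERT):

    #include <QSqlQuery>
    #include <QString>
    #include <QStringList>
    #include <QVector>

    // Builds one INSERT with a VALUES row per channel and sends it in a single exec().
    bool insertIteration(QSqlQuery &query, const QVector<double> &readings)
    {
        QStringList rows;
        for (int channel = 0; channel < readings.size(); ++channel)
            rows << QString("(%1, %2)").arg(channel).arg(readings[channel]);

        const QString sql = "INSERT INTO samples (channel, value) VALUES "
                            + rows.join(", ");
        return query.exec(sql);
    }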

jars121 wrote on 5 Jun 2018, 22:40 | #15 | in reply to JKSH's post #13

@JKSH said in Data acquisition storage: QDataStream vs. ofstream vs. DB vs. csv:

What data acquisition hardware are you using? If its API is well-designed, it should have some kind of "buffered acquisition" or "continuous sampling" mode. [...] You should find this far, far more efficient.

                                Thanks @JKSH. That approach might work for some of the readings which are coming through an IC, but many of the samples are taken directly by the device through sysfs, in which case there is no external buffer.

                                This has certainly given me a direction to try though, as I've been pursuing database write optimisation when it's the acquisition and storage function which really needs some attention.

@JonB said in Data acquisition storage: QDataStream vs. ofstream vs. DB vs. csv:

If you can, you could first build a single 64-INSERT-statements string and send it as a single exec() instead of each one on its own. [...]

Thanks @JonB, that's the exact method I've now got :) My original approach was a single insert for each piece of data, but I've now built a comprehensive QString and insert each piece of data into it with .arg(). It appears to be about as efficient as I can make the insertion process at this point.

JKSH (Moderators) wrote on 6 Jun 2018, 00:23 | #16 | in reply to jars121's post #15

                                  @jars121 said in Data acquisition storage: QDataStream vs. ofstream vs. DB vs. csv:

                                  many of the samples are taken directly by the device through sysfs, in which case there is no external buffer.

                                  A CPU isn't ideal for acquiring data at that speed. Earlier, you said that "the data rate is relatively low". However, 64000 samples per second is considered fast for a CPU thread when there is no hardware-buffered acquisition available.

                                  Some questions:

                                  1. Back when you only acquired into QVectors without writing to disk, what was your CPU load?
                                  2. Back when you only acquired into QVectors without writing to disk, how much jitter was there? In other words, how close do you actually get to sampling exactly every 1ms, as opposed to 1.2ms sometimes and 0.8ms other times?
                                    • What level of jitter is acceptable in your data?
                                  3. What kinds of signals are you acquiring? (Temperature? Raw voltages? etc.)
                                  4. What is your data acquisition device? (Raspberry Pi?)
                                  5. What's the purpose of this device? (Hobby? Research? Commercial?)

                                  @jars121 said in Data acquisition storage: QDataStream vs. ofstream vs. DB vs. csv:

                                  This has me thinking; could I perhaps revert to my initial method of saving the data in a QVector, and when the amount of memory used reaches a set point, dump the data into the database? I could see this happening in one of two ways:

                                  …

2. Once a set number of readings or amount of memory has been reached, save the data to the SQLite database, but continue saving data to memory, such that logging can continue.

Since you can't do buffered acquisition, option #2 is the way to go. However, have a thread dedicated to data acquisition and nothing else. Then, every 100ms (or 250ms or 500ms), your main thread can write the data to disk. Acquiring and writing should be done in different threads because it takes many milliseconds to write to disk.
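A rough sketch of the shared buffer between the acquisition thread(s) and the writer (class and member names are just illustrative):

    #include <QMutex>
    #include <QMutexLocker>
    #include <QPair>
    #include <QVector>

    // Acquisition threads call append(); the writer thread calls take() on a timer
    // (e.g. every 250 ms) and commits the whole batch inside one transaction.
    class SampleBuffer
    {
    public:
        void append(int channel, double value)
        {
            QMutexLocker locker(&m_mutex);
            m_pending.append(qMakePair(channel, value));
        }

        QVector<QPair<int, double>> take()
        {
            QMutexLocker locker(&m_mutex);
            QVector<QPair<int, double>> batch;
            batch.swap(m_pending);        // hand everything collected so far to the writer
            return batch;
        }

    private:
        QMutex m_mutex;
        QVector<QPair<int, double>> m_pending;
    };

That keeps the disk latency out of the acquisition path entirely: the acquisition threads only ever touch memory, and the writer drains and commits whole batches at its own pace.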

jars121 wrote on 6 Jun 2018, 00:48 (last edited by jars121, 6 Jun 2018, 00:50) | #17 | in reply to JKSH's post #16

@JKSH said in Data acquisition storage: QDataStream vs. ofstream vs. DB vs. csv:

Some questions: [...]

                                    In answer to your 5 questions above:

                                    1. I'm not sure, I plan on re-instating the QVector approach to measure CPU load today.
                                    2. As per answer 1 above, I'll review the frequency accuracy today. In terms of target jitter, I don't have a hard requirement at this stage, as I'm still in the prototyping phase. When it comes to testing, I might find that 2ms resolution is adequate, or maybe even 4ms. I'm targeting 1ms for now as the most extreme case.
                                    3. I'm acquiring all sorts. I have 32 analog channels coming in via SPI ADCs (temperature, voltage, pressure, etc.), and 32 digital channels interfacing directly with the embedded device (GPIO inputs via sysfs and/or memory mapping), which can either be simple ON/OFF measurements (state of a relay for example), or pulse frequency/width measurements (PWM signals for example).
                                    4. At the moment it's a Raspberry Pi 2, but this will change in the very near future to a more commercial embedded platform (with heavily modified OS, PCB design, etc.).
                                    5. This will eventually become a commercial product, using the commercial version of Qt.

@JKSH said in Data acquisition storage: QDataStream vs. ofstream vs. DB vs. csv:

However, have a thread dedicated to data acquisition and nothing else. Then, every 100ms (or 250ms or 500ms), your main thread can write the data to disk. [...]

                                    I'm using threading quite extensively throughout the application, and am currently running the SQLite function on a separate thread. Similarly, the data acquisition functions are run on separate threads as well. Each of the 32 digital inputs has its own thread, as the pulse frequency and width measurements rely on sysfs interrupts, which are a glorified poll of the gpio's corresponding sysfs file status (1 or 0), so they have to be on separate threads unfortunately. All 32 analog signals are on a single thread.
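For context, the sysfs "interrupt" boils down to this pattern: poll() blocks on the value file until an edge fires, then the file is re-read (the pin number and edge setting are just examples):

    #include <fcntl.h>
    #include <poll.h>
    #include <unistd.h>
    #include <cstdio>

    int main()
    {
        // Assumes the pin is already exported and configured, e.g.
        //   echo both > /sys/class/gpio/gpio17/edge
        int fd = open("/sys/class/gpio/gpio17/value", O_RDONLY);
        if (fd < 0)
            return 1;

        char buf[2] = {0};
        read(fd, buf, 1);                        // clear the initial state

        pollfd pfd;
        pfd.fd = fd;
        pfd.events = POLLPRI | POLLERR;

        while (poll(&pfd, 1, -1) > 0) {          // blocks until the edge fires
            lseek(fd, 0, SEEK_SET);
            read(fd, buf, 1);
            std::printf("edge, value=%c\n", buf[0]);
        }
        close(fd);
        return 0;
    }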

jars121 wrote on 6 Jun 2018, 01:19 | #18
                                      This post is deleted!
JKSH (Moderators) wrote on 6 Jun 2018, 05:24 (last edited by JKSH, 6 Jun 2018, 05:45) | #19 | in reply to jars121's post #17

                                        @jars121 said in Data acquisition storage: QDataStream vs. ofstream vs. DB vs. csv:

                                        1. I'm not sure, I plan on re-instating the QVector approach to measure CPU load today.
                                        2. As per answer 1 above, I'll review the frequency accuracy today.

                                        Yes, do your benchmarks first. The data will help you see what your next steps should be.

3. I'm acquiring all sorts. I have 32 analog channels coming in via SPI ADCs (temperature, voltage, pressure, etc.), and 32 digital channels interfacing directly with the embedded device (GPIO inputs via sysfs and/or memory mapping), which can either be simple ON/OFF measurements (state of a relay for example), or pulse frequency/width measurements (PWM signals for example).

                                        What pulse widths and frequencies are you expecting to see? This will determine your sampling rate for the digital inputs.
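For instance (illustrative numbers only): catching a 100 µs pulse by polling needs a sample at least every 100 µs, i.e. 10 kHz per pin, and measuring that pulse's width to roughly 10 µs resolution needs polling at around 100 kHz.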

4. At the moment it's a Raspberry Pi 2, but this will change in the very near future to a more commercial embedded platform (with heavily modified OS, PCB design, etc.).
5. This will eventually become a commercial product, using the commercial version of Qt.

                                        Customizing the OS and PCB could cost you a lot of time in development, testing, troubleshooting, and refining. Have you considered off-the-shelf components?

If you still want to take the custom PCB route for a commercial product, remember to include protection circuitry. End-users are notorious for plugging things in the wrong way (for example, switching the + and - terminals, or applying an input voltage that's too high).

J.Hilk (Moderators) wrote on 6 Jun 2018, 06:23 | #20 | in reply to jars121's post #17

                                          @jars121 said in Data acquisition storage: QDataStream vs. ofstream vs. DB vs. csv:

                                          I'm using threading quite extensively throughout the application, and am currently running the SQLite function on a separate thread. Similarly, the data acquisition functions are run on separate threads as well. Each of the 32 digital inputs has its own thread, as the pulse frequency and width measurements rely on sysfs interrupts, which are a glorified poll of the gpio's corresponding sysfs file status (1 or 0), so they have to be on separate threads unfortunately. All 32 analog signals are on a single thread.

I would highly recommend that you reconsider this approach.

A Pi 2 has a decent CPU, IIRC a 900MHz quad-core, but ~40 threads from your program alone, with a required timing resolution of around 1ms, will be hard. The OS will struggle with juggling the threads back and forth between the cores, and you can easily get delays in the millisecond range before a given thread is scheduled to run again.

