SQLite vs. QFile Performance Issue
-
Hi all,
My application receives data from a remote server via a QTcpSocket. I'm implementing a logging function that records this data to a file on the device's drive (flash-based, not a spinning platter). Logging is enabled/disabled via a simple button in the QML UI.
I've implemented an SQLite-based process for recording the data, as I'm able to encrypt the data (which is a requirement) using the SQLITECIPHER database plugin, and the data is contained within a single file (another requirement). I don't have any need for typical 'database' functionality (i.e. I'm using a single table, no indexing, etc.); I'm essentially using SQLite as a flat file.
The issue I'm having with this setup is insertion/write performance. The process works as intended: I'm able to write the data, and the resulting SQLite file is encrypted by the SQLITECIPHER plugin. However, the insertion/write performance makes the current solution unusable (I'll outline the performance issues below). As a point of comparison, I've also implemented a QFile process, which simply streams all the data to a .txt file with comma separation. I've included simplified code below that illustrates each of the two processes.
The SQLite process:
void saveData::setup()
{
    db = QSqlDatabase::addDatabase("SQLITECIPHER", "data");
    db.setDatabaseName("myData.sqlite");
    db.setPassword("myPassword");
    db.setConnectOptions("QSQLITE_USE_CIPHER=sqlcipher; SQLCIPHER_LEGACY=1");

    if (!db.open()) {
        qDebug() << "Error opening db.";
    } else {
        query = new QSqlQuery(db);
        query->prepare(CREATE_TABLE_STRING);
        if (!query->exec()) {
            qDebug() << "Failed to create table.";
        }
        query->prepare(INSERT_DATA_STRING);
    }
}

void saveData::receiveData(QList<QList<QVariant>> data)
{
    QSqlDatabase::database().transaction();

    for (int i = 0; i < data.size(); i++) {
        for (int j = 0; j < data[i].size(); j++) {
            query->addBindValue(data[i].at(j));
        }
        if (!query->exec()) {
            qDebug() << "Failed to insert data.";
        }
    }

    QSqlDatabase::database().commit();
}
A few points about the above code:
- db (a QSqlDatabase) and query (a QSqlQuery*) are members declared in the class header file.
- CREATE_TABLE_STRING is a standard 'CREATE TABLE myTableName' SQL query.
- INSERT_DATA_STRING is a standard 'INSERT INTO myTableName' SQL query, using positional '?' placeholder bindings.
- Data is sent to saveData::receiveData() from another class via a signal/slot, using a Qt::QueuedConnection (the saveData class and the sending class are in different QThreads).
- I've implemented a double-buffer ('linked list'-type) QList setup in the sending class: I have a QList<QList<QList<QVariant>>>, the top level of which holds 2 QLists. I fill one QList up with data and, when it's full, send its QList<QList<QVariant>> for recording to the database, then start filling the second QList (a rough sketch of this is shown below). The intent here is to cache X number of records in memory before writing to the disk.
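For reference, here's a rough sketch of how that buffering looks in the sending class. The class and member names (DataSender, recordsReady, RECORDS_PER_FLUSH, etc.) are illustrative only, not my actual code, and RECORDS_PER_FLUSH stands in for the 'X' cache size mentioned above:

// Rough sketch only - names are illustrative. Two top-level buffers are
// alternated: one fills with incoming records while the other has just
// been handed off to the saveData worker via a queued signal.
class DataSender : public QObject
{
    Q_OBJECT
public:
    explicit DataSender(QObject *parent = nullptr) : QObject(parent) {}

signals:
    // Connected to saveData::receiveData() with Qt::QueuedConnection.
    // Note: QList<QList<QVariant>> must be registered with qRegisterMetaType()
    // for queued (cross-thread) connections.
    void recordsReady(QList<QList<QVariant>> records);

public slots:
    void addRecord(const QList<QVariant> &record)
    {
        m_buffers[m_active].append(record);
        if (m_buffers[m_active].size() >= RECORDS_PER_FLUSH) {
            emit recordsReady(m_buffers[m_active]); // queued connection copies the list
            m_buffers[m_active].clear();            // this buffer can be reused later
            m_active = 1 - m_active;                // start filling the other buffer
        }
    }

private:
    static constexpr int RECORDS_PER_FLUSH = 1000;        // the 'X' cache size, e.g. 20 s at 50 records/s
    QList<QList<QList<QVariant>>> m_buffers { {}, {} };   // top level holds the 2 buffers
    int m_active = 0;
};

Strictly speaking the alternation isn't required, since the queued connection copies the list at emit time, but it matches the two-list arrangement described above.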
The QFile-based process:
void saveData::receiveData(QList<QList<QVariant>> data)
{
    QFile dataFile("data.txt");
    if (!dataFile.open(QIODevice::WriteOnly | QIODevice::Append)) {
        qDebug() << "Error opening QFile.";
        return;
    }

    QTextStream dataStream(&dataFile);
    for (int i = 0; i < data.size(); i++) {
        for (int j = 0; j < data[i].size(); j++) {
            if (j == data[i].size() - 1) {
                dataStream << data[i].at(j).toString() << "\n";
            } else {
                dataStream << data[i].at(j).toString() << ",";
            }
        }
    }
}
Additional information:
- Each record has ~170 fields/columns, covering all data types (TEXT, INTEGER, REAL, etc.).
- I don't think my nested QList approach is optimal; any guidance on the best way to store X lists of variant data would be welcomed.
- 50 records are added to the nested QList per second. I've played with different cache sizes (i.e. the 'X' number of QLists referred to above), from 50 (i.e. writing data to file once per second), to 1,000 (writing to file every 20 seconds) and 2,000 (writing to file every 40 seconds).
The results:
- The QFile process records the data in real time. When I stop recording data (via the QML button) and immediately check the size of the file (via ls -l data.txt in the Linux terminal), subsequent ls -l data.txt commands show the same file size, so no further data is being written. I run the test for ~5 minutes, which results in a file of roughly 50 MB.
- The SQLite-based process successfully writes all the data to file (the creation of the database and table succeeds, as does the binding of the QList<QList<QVariant>> values to the prepared query), but it takes a significant amount of time. When I stop recording data after ~5 minutes and monitor the resulting myData.sqlite file (again using ls -l myData.sqlite in the Linux terminal), it takes upwards of 10 minutes for the file size to stop increasing, indicating the final data has been written. NOTE: I've implemented the exact same SQLite-based process using the default SQLite plugin (instead of the cipher-based plugin), without encryption, as I wanted to ensure the encryption process wasn't the bottleneck. The resulting time to finish writing the data was the same, so I don't believe the issue is with the cipher plugin.
My questions:
- Are my expectations regarding the SQLite-based process flawed? My understanding is that SQLite is highly performant for this use case, which seems to be reflected in this article (I understand that this article is looking specifically at blobs, so maybe that's an important distinction).
- Am I approaching the SQLite code in the correct way? I.e. looping through each QList<QVariant> within a transaction? If so, or if not, how can this be improved?
- As mentioned earlier, my use of the nested QList structure doesn't seem optimal, and although the QFile-based process doesn't appear to be bottlenecked by it, I'd be open to changing this approach.
Any thoughts, clarifications or guidance would be greatly appreciated as always.
-
Performing the mass insert inside a transaction is definitely correct. SQLite can process many thousands of inserts per second, but only a few transactions per second (i.e. commits, each requiring a complete flush to persistent storage). See https://www.sqlite.org/faq.html#q19
I notice that you are starting the transaction on the default database connection, not the database you gave the connectionName "data", and that you are not checking the return values of the transaction() and commit() calls (which are probably failing). This quite probably means you are performing 50000 transactions rather than one transaction with 50000 inserts on the "data" connection.
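Something along these lines (an untested sketch, reusing the db and query members from your setup() code) keeps everything on the "data" connection and makes any failure visible:

void saveData::receiveData(QList<QList<QVariant>> data)
{
    // Start the transaction on the SAME connection the prepared query uses.
    // db is the member you initialised in setup(); equivalently,
    // QSqlDatabase::database("data") would return the same connection.
    if (!db.transaction()) {
        qDebug() << "Failed to start transaction:" << db.lastError().text();
    }

    for (int i = 0; i < data.size(); i++) {
        for (int j = 0; j < data[i].size(); j++) {
            query->addBindValue(data[i].at(j));
        }
        if (!query->exec()) {
            qDebug() << "Failed to insert data:" << query->lastError().text();
        }
    }

    // If transaction() failed, every INSERT above ran in its own implicit
    // transaction, which is exactly what destroys performance.
    if (!db.commit()) {
        qDebug() << "Failed to commit transaction:" << db.lastError().text();
    }
}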
-
@ChrisW67 said in SQLite vs. QFile Performance Issue:
Performing the mass insert inside a transaction is definitely correct. SQLite can process many thousands of inserts per second, but only a few transactions per second (i.e. commits, each requiring a complete flush to persistent storage). See https://www.sqlite.org/faq.html#q19
I notice that you are starting the transaction on the default database connection, not the database you gave the connectionName "data", and that you are not checking the return values of the transaction() and commit() calls (which are probably failing). This quite probably means you are performing 50000 transactions rather than one transaction with 50000 inserts on the "data" connection.
Ah of course, this is exactly the problem. Thanks @ChrisW67, I really appreciate your input.
For anyone stumbling across this post in the future, the correct transaction setup should be as follows:
if (!QSqlDatabase::database("data").transaction()) {
    qDebug() << "Error starting transaction.";
}

...

if (!QSqlDatabase::database("data").commit()) {
    qDebug() << "Error committing transaction.";
}
I've verified the above code change, and am now seeing essentially the same real-time insertion performance with SQLite as I am with the QFile process.