[Solved] How to skip chunks using QDataStream?
-
[Forked from https://forum.qt.io/topic/117372/best-way-to-compress-data-to-append-to-a-file/ --JKSH]
Thanks, guys! I have a follow-up question: is there any way to skip a QByteArray chunk when reading and parsing the QFile containing the compressed QByteArrays with a QDataStream? I know how many chunks to skip, and I only need to read one chunk at a time into memory.
-
https://doc.qt.io/qt-5/qdatastream.html#skipRawData
First, read 4 bytes of the file as int32_t. This value is the number of bytes occupied by the next byte array (and therefore it tells you how many bytes to skip). Repeat as many times as necessary. -
How do I read 4 bytes as int32_t? QDataStream readRawData?
-
@Crag_Hack said in How to skip chunks using QDataStream?:
How do I read 4 bytes as int32_t?
int32_t n; stream >> n;
QDataStream readRawData?
readRawData() won't convert the data for you. -
Well that was easy :) Thx!
-
Is this the only way to skip data when reading from a QFile using a QDataStream? Each compressed QByteArray chunk is preceded by several QStrings. I'd like to skip those as well until arriving at the desired QStrings & QByteArray chunk grouped together.
-
@Crag_Hack said in How to skip chunks using QDataStream?:
Is this the only way to skip data when reading from a QFile using a QDataStream?
Yes.
Each compressed QByteArray chunk is preceded by several QStrings. I'd like to skip those as well until arriving at the desired QStrings & QByteArray chunk grouped together.
Then you must skip the QStrings too, using the same technique.
However, it sounds like your requirements are:
- A file to store heterogeneous binary and text data
- The ability to append new data to the file
- The ability to read out specific data chunks on demand (non-sequential reading)
Have you considered using a different file format? For example, an SQLite file can let you specify exactly which chunks to extract without manually skipping.
-
@Crag_Hack said in How to skip chunks using QDataStream?:
Is this the only way to skip data when reading from a QFile using a QDataStream?
No, QDataStream has the skipRawData() function, but that requires you to actually know how many bytes you have to skip.
In the case of a QString, you would have to analyze the QString header, and at that point you can simply stream it out into a local (and soon-to-be-discarded) variable.
-
@JKSH said in How to skip chunks using QDataStream?:
@Crag_Hack said in How to skip chunks using QDataStream?:
Is this the only way to skip data when reading from a QFile using a QDataStream?
Yes.
Each compressed QByteArray chunk is preceded by several QStrings. I'd like to skip those as well until arriving at the desired QStrings & QByteArray chunk grouped together.
Then you must skip the QStrings too, using the same technique.
However, it sounds like your requirements are:
- A file to store heterogeneous binary and text data
- The ability to append new data to the file
- The ability to read out specific data chunks on demand (non-sequential reading)
Have you considered using a different file format? For example, an SQLite file can let you specify exactly which chunks to extract without manually skipping.
Or maybe something like HDF5, which allows you to store various types of data in a structured way.
-
Thanks guys.
@JKSH Can I compress the data chunks in an SQLite file? If so, can you point me in a direction to find how to do what you proposed, perhaps a tutorial somewhere? I am already familiar with SQL and read an SQLite tutorial today. -
@Crag_Hack
You can save a compressed QByteArray as the BLOB type. -
@Crag_Hack
Hi
There is
https://wiki.qt.io/How_to_Store_and_Retrieve_Image_on_SQLite
that stores an image. It could also be any binary blob. -
Will the performance of an SQLite file be comparable to using the QDataStream QByteArray data chunk skip method? Does the SQLite setup do sequential processing behind the scenes?
With SQLite indexing, will I see better performance? Do I need to recreate the index every time I update the file? Can I store the index in the file, or do I have to create it on the fly every time?
The size of the file storing the data in either method will probably be comparable right?
Anything else that might be relevant?
-
Hi
Will the performance of an SQLite file be comparable to using the QDataStream QByteArray data chunk skip method?
Depending on how you actually need to access the data, it should be as fast as or even faster than QDataStream if we're talking about GB-size files. But that depends on being able to select the data you want with an SQL statement and then read only that subset, compared to having to open this GB file and do lots of skipping.
Does the SQLite setup do sequential processing behind the scenes?
I'm not sure what you mean here. What type of sequential processing?
With SQLite indexing will I see better performance?
Only if you can create an index on a column that helps you look up a row faster. For the blobs themselves, I don't think it will help at all.
Do I need to recreate the index every time I update the file? Can I store the index in the file or do I have to create it on the fly every time?
You basically just tell it to create the index once, and from then on it's stored in the database file and maintained automatically.
The size of the file storing the data in either method will probably be comparable right?
Yes, the overhead of the DB is not that much.
Anything else that might be relevant?
-
Use transactions when you update data. It will save your file
if the app crashes or the power is cut while writing.
It's one of the benefits of using a DB system versus a file directly. -
Make sure you check the path of the db file when opening the db. If you point to a nonexistent file, it will create a new one, which can be mighty confusing.
This tool is really helpful for managing/testing the db:
https://sqlitebrowser.org/ -
@Crag_Hack said in How to skip chunks using QDataStream?:
Does the SQLite setup do sequential processing behind the scenes?
No, with a database there will be no "sequential processing", it will be "random access" instead, meaning it will "jump to" the data you want directly, without reading through other data.
You could create a table consisting of two columns: an "id" column (an integer to identify each row uniquely) and a "blob" column to hold a piece of associated data of any size. SQLite will handle the "id" column for you if you make it an "auto-increment", it will just assign numbers 1, 2, 3... for you. You then have to know, somehow, that what you want is "row number #2", and you can ask it to give you that row's "blob" column data directly. It will be able to hand you back that value without having to read through any other rows/columns.
-
Thanks guys.
For this data set, I will first be looking up records based on a QString with a name identifying a subset of the data, then further with a qint64 (representing a QDateTime) identifying a further subset of that data. Sounds to me like the aforementioned rules still apply.
Also, I might need to delete some of the records at the beginning of the database if it gets too large. Will SQLite reconstruct the autoincrement id column with up-to-date values?
And I need to iterate over the entire database pulling out names and qint64 datetime values, but I guess this won't complicate anything, will it?
Anything else that might be relevant? :)
-
@Crag_Hack said in How to skip chunks using QDataStream?:
I will be first looking up records based on a QString with a name identifying a subset of the data, then further with a qint64 representing a QDateTime identifying a further subset of that data....
If the name and timestamp are guaranteed to be unique, then you can use those as the "key". Your SQL query code will look like this:
SELECT compressed_bytes FROM main_table WHERE name='XYZ' AND timestamp=12345678;
Also I might need to delete some of the records in the beginning of the database if it gets too large. Will SQLite reconstruct the autoincrement id column with up to date values?
No, SQLite won't auto-update the IDs when you delete some data.
What's the purpose of updating the IDs? Why not just keep the original values?
And I need to iterate over the entire database pulling out names and qint64 datetime values but I guess this won't complicate anything will it?
You don't "iterate over the database". You ask the database to give you all the data that you're interested in, using only a single query.
After that, you can iterate over the returned dataset :)
-
Awesome, JKSH. Time to get to work.
So after some more reading, it looks like indexing is only needed when searching on something other than the AUTOINCREMENT column or the default rowid.
And for locating a database record by its rowid or autoincrement column, a binary search is used, which is far better than sequentially parsing the whole file with QDataStream to find your data.
-