How to write entire QVector to a binary file?
-
@J-Hilk I am not sure what you mean by
if this program is used more than once, you're going to destroy your HD/SSD very quickly!
Given that 1 million doubles are 8 million bytes, I think modern processors and disk drives can handle such speed easily.
-
@CJha
You may (well) know more than I, but can Matlab read and process 8MB of new data per second, at the same time as something else is producing it? And, separately, do you really generate 1 million new data points per second? Also, as @J-Hilk said, wouldn't sending a pipe stream (e.g. a socket?) be better than writing to file and reading back? Does Matlab accept incoming data elsewhere than in a file?
-
@jsulm It is highly compatible. Here are the links for fopen, fread, fseek. In all of these I can specify the format, ByteOrder, size of data (such as int, double, etc.), and quite a few other things.
I don't think Matlab is the restrictive thing here; I can read any type of binary file in Matlab as long as I know how it was written.
-
@CJha
The code to write a QVector to file in the way you want, as fast as possible in one blob not one-by-one, is given in e.g. https://www.qtcentre.org/threads/65713-Output-a-QVector-of-doubles-to-a-binary-file-(without-using-QDatastream)?p=289540#post289540 :
qint64 bytesWritten = file.write(reinterpret_cast<const char*>(vec.constData()), sizeof(double) * vec.size());
EDIT I think you will want reinterpret_cast<> rather than static_cast<> here as shown in that post, so I have altered the code line to use that.
-
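A minimal sketch of the one-blob write, using only the standard library so it compiles without Qt. The names writeRaw and readRaw are made up for illustration; with Qt the same idea is the file.write(reinterpret_cast<const char*>(vec.constData()), ...) line quoted above. The sketch assumes the file holds nothing but doubles in the machine's native byte order, so whatever reads it later has to be told the type and the byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Write all samples with a single write() call.
// Returns the number of bytes written, or -1 on failure.
// (Hypothetical helper, not part of Qt.)
long long writeRaw(const std::vector<double>& samples, const std::string& path)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        return -1;
    const std::streamsize bytes =
        static_cast<std::streamsize>(samples.size() * sizeof(double));
    out.write(reinterpret_cast<const char*>(samples.data()), bytes);
    return out ? static_cast<long long>(bytes) : -1;
}

// Read the file back, assuming it contains only native-endian doubles.
std::vector<double> readRaw(const std::string& path)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        return {};
    const std::streamsize bytes = in.tellg();   // opened at the end, so this is the size
    // The file size must be a whole number of doubles.
    assert(bytes >= 0 && bytes % static_cast<std::streamsize>(sizeof(double)) == 0);
    in.seekg(0);
    std::vector<double> samples(static_cast<std::size_t>(bytes) / sizeof(double));
    in.read(reinterpret_cast<char*>(samples.data()), bytes);
    return samples;
}
```

Writing three doubles this way produces a 24-byte file on the usual platforms where sizeof(double) is 8, with no header and no per-element overhead.
-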
@JonB No, Matlab is going to read it at a later time. When data is being generated it is just stored in a binary file for later use by Matlab. And yes, Matlab is slower but it doesn't matter if it takes 1 second or 1 day to read the file as the researchers can just start loading the file in the night and come back in the morning to work on it (many researchers wait for times like 24 to 36 hours for files to get processed).
And yes, I am generating data at 1 million doubles per second. I am using National Instruments and Measurement Computing DAQ boards, controlling both through Qt and C++ and these boards are capable of generating 1 million doubles per second.
-
@CJha said in How to write entire QVector to a binary file?:
@J-Hilk I am not sure what you mean by
if this program is used more than once, you're going to destroy your HD/SSD very quickly!
Given that 1 million doubles are 8 million bytes, I think modern processors and disk drives can handle such speed easily.
It's not about the speed, it's about the number of times each cell is written. Samsung, for example, says their SSDs are "built to handle 150 terabytes written"; with, let's say, 1 million doubles (8 bytes each) per second, your SSD would be done for in roughly 200 days, instead of the estimated 10 years.
Also, you have to coordinate read and write access to the file, so that Matlab and your Qt program do not try to access the file at the same time, with potential data loss etc.
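For reference, the 200-day figure can be checked with a few lines. This is only a back-of-the-envelope sketch: it assumes the drive is written continuously, 24 hours a day, that "150 terabytes" means 150e12 bytes, and it ignores write amplification. The function name daysUntilWornOut is made up for illustration.

```cpp
#include <cassert>

// Days until a drive's rated write endurance is used up at a constant write rate.
// ratedBytes:     endurance rating in bytes (150e12 for "150 terabytes written")
// bytesPerSecond: sustained write rate (1e6 doubles/s * 8 bytes = 8e6)
double daysUntilWornOut(double ratedBytes, double bytesPerSecond)
{
    assert(bytesPerSecond > 0.0);
    const double secondsPerDay = 24.0 * 60.0 * 60.0;
    return ratedBytes / bytesPerSecond / secondsPerDay;
}
```

daysUntilWornOut(150e12, 8e6) works out to about 217 days (150e12 / 8e6 = 18,750,000 seconds), which matches the "roughly 200 days" estimate.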
-
@J-Hilk That's a good point, but it's not the case for me as the data write and data read happen at different times. Also, SSD lifetime doesn't matter as these researchers have lots of funding and an SSD is a cheap item for them. My job is to give them what they ask for, and if they ruin their SSD in 200 days that is up to them (of course I will tell them that it can ruin their SSD fast, but that's all I can do).
-
@CJha
BTW. When you have gotten it working with that file.write(), which is going to be as good as it gets: since speed seems to be such an issue, and you're going to be doing ~1,000,000 points, and your goal is going to be to access the data array and write it out raw, my thought would be: why use a Qt QVector<> at all? For best efficiency/memory usage, would this be a case where simply creating a C++ array of doubles of sufficient size and storing into that directly/writing out to file would be simpler than wrapping it in QVector<> overheads: even if that is small, what's the point?
And P.S.
If you stick with QVector<>, do make sure you use QVector::resize/reserve(int size) appropriately early (once if possible), I think. What you do not want is to have the QVector keep reallocating/moving existing data as your million points keep arriving....
-
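To illustrate the reserve() point, here is a small sketch with std::vector, which behaves like QVector in this respect (QVector::reserve exists too). The function countReallocations is a made-up helper: it appends n samples and counts how often the capacity changes, which is when the container has to allocate new storage and move the existing data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Append n samples and count how often the vector's capacity changed while
// appending, i.e. how often it had to grow and move the existing data.
std::size_t countReallocations(std::size_t n, bool reserveFirst)
{
    std::vector<double> samples;
    if (reserveFirst) {
        samples.reserve(n);                 // one allocation up front
        assert(samples.capacity() >= n);
    }
    std::size_t reallocations = 0;
    std::size_t lastCapacity = samples.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        samples.push_back(static_cast<double>(i));
        if (samples.capacity() != lastCapacity) {
            ++reallocations;
            lastCapacity = samples.capacity();
        }
    }
    return reallocations;
}
```

With reserveFirst set, the count is zero for any n. Without it, the vector grows several times on the way to a million elements; the exact number depends on the library's growth policy.
-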
@JonB I agree that a simple C++ array would be faster and easier as that is the format in which data is generated in the buffer from the acquisition device.
However, if I write the data to a file in the same thread (in the same callback function where the data is deposited in the buffer from the acquisition device, or in a different function), then, since writing takes a long time, it blocks the entire thread. Once in a while this blocks the callback function, which is called each time the required number of data samples has been generated by the acquisition device, and that results in an error.
To solve this problem, I write data to a binary file in a different thread. Now, if I pass the address of the same buffer in which data is deposited, it defeats the purpose of having multiple threads, as I am accessing the same buffer in which data is deposited from the acquisition device, just from a different thread instead of the main one. To overcome this I write the incoming data from the acquisition device's buffer to a QVector<double>, then send this vector over a Qt::QueuedConnection to my "Writer" thread, and I write it there. I am not so good with C++ arrays and so I am not quite confident about how to achieve this without involving QVector in the process. If you have any idea on how I can simplify this process I will be very grateful :)
-
@CJha said in How to write entire QVector to a binary file?:
then since writing takes a long time it blocks the entire thread
You could do double-buffering with two arrays :-)
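A minimal sketch of what double-buffering with two arrays could look like. The class name PingPongBuffer and its members are invented for this example. It is deliberately single-threaded: in a real producer/writer setup the swap would need synchronization (a mutex, or a handoff that guarantees the writer has finished with the old buffer), and that part is left out here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Two buffers: the producer fills one while the consumer works on the other.
// No locking here; this only shows the role swap.
class PingPongBuffer
{
public:
    explicit PingPongBuffer(std::size_t capacity)
    {
        assert(capacity > 0);
        buffers_[0].reserve(capacity);
        buffers_[1].reserve(capacity);
    }

    // Buffer currently being filled by the producer.
    std::vector<double>& fillSide() { return buffers_[fill_]; }

    // Swap roles: returns the buffer that was just filled and empties the
    // other one so the producer can start filling it.
    std::vector<double>& swapSides()
    {
        const std::size_t filled = fill_;
        fill_ = 1 - fill_;
        buffers_[fill_].clear();            // clear() keeps the allocated capacity
        return buffers_[filled];
    }

private:
    std::array<std::vector<double>, 2> buffers_;
    std::size_t fill_ = 0;
};
```

The producer appends to fillSide(); when a block is complete, swapSides() hands the full block to the writer and the producer carries on with the other buffer without allocating anything new.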
-
@CJha
My simple answer would be: mutexes. How that compares to queued signals I do not know; I am not suggesting mutexes, only answering the question.
QVector<double> then send this vector over a Qt::QueuedConnection
Wouldn't mind just seeing how you send it, do you use const QVector<> &?
-
@jsulm I already use double buffering, i.e. I allocate twice as much memory for the buffer as needed. But I cannot use two separate buffers as these buffers are controlled by the C-based specialized functions which are specific to the acquisition device. All I can do is set the size of the memory for the buffer. The program flow is as follows:
- Assign buffer size
- Start acquisition
- C based function puts data in the buffer and alerts my application through a callback function
- I retrieve data from the buffer to a C array (I cannot retrieve directly to a vector as this step is also controlled by device-specific C function which only accepts a pointer to a C array)
- Now I can do whatever I want with acquired data
So, there is not much choice in terms of the buffer.
Regarding the use of the C array that I use to get data out of the buffer, it is generated on the heap and deleted at the end of the callback function in which I get the data from the buffer.
I could use two different vectors to store data and achieve so-called 'double buffering' from my application's point of view, and I have tried that. But in this case as well the thread is blocked for the time period of writing data to a file.
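One possibility, offered as an assumption rather than something confirmed in this thread: a C function that takes a double* can usually be pointed at a vector's own storage, because std::vector::data() (and likewise QVector::data()) returns a pointer to contiguous elements. In the sketch below fakeDeviceRead is a stand-in for the device-specific call, whose real name and signature are not given here, and readChunk is a made-up helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the device-specific C function (hypothetical name and signature):
// it copies `count` samples into the caller-supplied array.
void fakeDeviceRead(double* dst, std::size_t count)
{
    assert(dst != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = 0.5 * static_cast<double>(i);
}

// Let the C function fill the vector's storage directly, so no temporary
// heap array and no second copy are needed.
std::vector<double> readChunk(std::size_t count)
{
    std::vector<double> chunk(count);       // sized (not just reserved) before the call
    fakeDeviceRead(chunk.data(), count);
    return chunk;                           // returned by move, not copied
}
```

The vector has to be resized to the full sample count before the call; reserve() alone would leave its size at zero.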
-
@JonB Yeah I could use mutexes, but I prefer Queued Connection as there is just one QVector to be sent to a different thread. I do not use const QVector<double>& because I want to depend on Qt's implicit sharing, i.e. if the QVector<double> is changed while I am still using the previous QVector to write data in my binary file then it would not affect my "Writer" thread. If I used const QVector<double>& then it would refer to the original QVector in the main thread and I would have to use QMutex to protect read and write operations. This is my function in the Writer class, which inherits QObject and is run in a different thread:
void Writer::writeData(QVector<double> vec)
{
    ++sweepCount_; // Increment the count to keep track of the number of times the data vector is written
    if (isBin_) { // If the user selects file type as .bin
        for (int ii = 0; ii < vec.length(); ++ii)
            binOut_ << vec[ii]; // binOut_ is a QDataStream, assigned to a file when the user clicks on the Start button
    } else { // If the user selects file type as .csv
        if (vec.length() > 1) {
            outStream_ << vec[0];
            for (int ii = 1; ii < vec.length(); ++ii)
                outStream_ << seperator_ << vec[ii]; // seperator_ = ',' or ';' depending on QLocale
            outStream_ << '\n';
        }
    }
}
-
Hi,
In what format do you get the data in the callback?
-
@CJha said in How to write entire QVector to a binary file?:
I do not use const QVector<double>& because I want to depend on Qt's implicit sharing i.e. if the QVector<double> is changed while I am still using the previous QVector to write data in my binary file then it would not affect my "Writer" thread
So that you know, when passing your QVector through Qt::QueuedConnection - which is the default and correct one across threads - your QVector will be copied automatically, there will be no share until write. There will be a copy inside your thread.
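A toy illustration of the idea that a queued call carries its own copy of the argument, so later changes to the sender's vector do not reach the receiver. This is not how Qt implements queued connections; CallQueue, post and runNext are invented names, and std::vector makes a deep copy here whereas QVector is implicitly shared.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// A toy "event queue": every queued call owns its own copy of the argument.
class CallQueue
{
public:
    // `samples` is taken by value, so the caller's vector is copied here
    // and that copy is stored together with the call.
    void post(std::vector<double> samples,
              std::function<void(const std::vector<double>&)> slot)
    {
        calls_.push([data = std::move(samples), slot = std::move(slot)] { slot(data); });
    }

    // Run the oldest pending call (the "receiver" side).
    void runNext()
    {
        assert(!calls_.empty());
        std::function<void()> call = std::move(calls_.front());
        calls_.pop();
        call();
    }

private:
    std::queue<std::function<void()>> calls_;
};
```

If the sender posts a vector and then overwrites its own elements before runNext() is called, the receiver still sees the values as they were at the time of post().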
-
@J-Hilk Thanks, one thing which I am not sure about: what would happen if I use const QVector<double>& instead of QVector<double>? Will it still copy the data if the connection type is Qt::QueuedConnection, or will it just copy the reference to the vector?