General question about arrays and I/o
-
Hi
Here is a more C++/Qt version:
    #include <QDataStream>
    #include <QDebug>
    #include <QFile>
    #include <QString>
    #include <QVector>

    int main()
    {
        QVector<int> data = {1, 2, 3, 4, 5}; // using a container and not a C array
        QString filename{"e:/test.bin"};     // CHANGE ME! :)
        QFile file(filename);
        if (file.open(QFile::WriteOnly)) {
            QDataStream out(&file);
            out << data; // << is an operator you can define for your own types; QVector has one already
            file.close();
        }
        data.clear(); // clear the array so we can see it works
        if (file.open(QFile::ReadOnly)) {
            QDataStream in(&file);
            in >> data; // >> is an operator you can define for your own types; QVector has one already
        }
        qDebug() << data; // qDebug understands QVector and can just print it
        return 0;
    }
As you can hopefully see, it's far less low-level than the FILE interface.
Moreover, the << and >> operators we use to save the data work with all normal types
(int, float, etc.) and also with Qt classes, so you can just stream them with no extra code.
The << and >> operators handle the byte sizes for you; all you then need to take care of is doing the save and load in the same
order, or it fails. It also allows saving and loading objects in exactly the same way if you give them the
QDataStream operators.
Like:
    class budget
    {
        float transportation, grocery, food, stationery;
        QString key;
    public:
        budget() {}
        friend QDataStream &operator<<(QDataStream &stream, const budget &myclass)
        {
            stream << myclass.food;
            stream << myclass.grocery;
            stream << myclass.key;
            stream << myclass.stationery;
            stream << myclass.transportation;
            return stream;
        }
        friend QDataStream &operator>>(QDataStream &stream, budget &myclass)
        {
            // must read the members in the same order they were written
            stream >> myclass.food;
            stream >> myclass.grocery;
            stream >> myclass.key;
            stream >> myclass.stationery;
            stream >> myclass.transportation;
            return stream;
        }
    };
This allows you to stream your own class 100% like a vector:
    budget myBuget;
    out << myBuget;
Overall this makes the program easier to read and also far less error-prone than handling it at byte level with stuff like
    fwrite(&data, sizeof(int), sizeof(data), pFile);
I hope I sold it to you :)
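To illustrate why the byte-level route is easy to get wrong: the third argument of fwrite is an element count, not a byte size. If data were a plain int[5], sizeof(data) would be the size in bytes (20 where int is 4 bytes), so the call quoted above would ask for 20 elements. A minimal sketch of the careful version, in standard C++ rather than Qt; write_ints and read_ints are hypothetical helper names, not part of any library:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: write every element of `data` to `f`.
// Returns true only if all elements were written.
bool write_ints(std::FILE *f, const std::vector<int> &data)
{
    // The third argument is the number of ELEMENTS, not a size in bytes.
    return std::fwrite(data.data(), sizeof(int), data.size(), f) == data.size();
}

// Hypothetical helper: read up to `count` ints from the current position of `f`.
// The returned vector is shrunk to the number of elements actually read.
std::vector<int> read_ints(std::FILE *f, std::size_t count)
{
    std::vector<int> data(count);
    const std::size_t got = std::fread(data.data(), sizeof(int), count, f);
    data.resize(got);
    return data;
}
```

Unlike QDataStream, this writes raw bytes in the machine's own byte order and int size, so the file is only portable between compatible platforms.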
-
@mrjj ahaha not sold yet, but many thanks for such a detailed answer :)
Tomorrow I will try your code (it is already deep night in Saint-Petersburg).
The main questions that affect my decision are the speed of reading/writing binary files weighing about N*10 Gigabytes and the possibility of using threads... What do you think: should I use the Qt interface or low-level functions (such as fwrite/fread)?
-
I/O is always slow, so using Qt classes should not hurt much. However, holding such large datasets in memory efficiently can be complicated.
Can you elaborate a bit more about your problem?
Regards
-
@Please_Help_me_D
I'll throw in a couple of (very slightly) controversial suggestions, just for your consideration, since you are talking about such large volumes of I/O.
- The Qt, C++ and even C stdio I/O functions like fread()/fwrite() use an extra level of buffering between the disk data and your code. If performance is critical, and if what you are doing is very simple (e.g. just sequential access), there are lower-level functions named read() & write() (or, depending on OS/compiler, _read()/_write()). Further OS-specific calls are also available for asynchronous I/O, which again might improve your particular situation. You would have to time these on your platforms to see how much difference they make.
- I must admit I have never used this, though I have often wanted to: there is the mmap() (#include <sys/mman.h>) family of calls. This "maps" (areas of) disk files directly into memory, so you do not issue any explicit I/O calls; the data appears and is accessed just like an array of bytes in memory. Again, you would have to time.
I don't know what the experts here think about my two points.
Also, be aware that reading from disk is a lot faster than writing to it, especially (as I understand it) if you have an SSD, though I guess that would not apply to your large data, but again you could check timings.
First thing is to address @aha_1980's request for more information on what you are trying to achieve.
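For the simple sequential case mentioned above, the usual pattern is to process the file one fixed-size block at a time, so that memory use stays constant however large the file is. The following is only a sketch under assumptions: it uses std::ifstream (which is still buffered, so it is not the read()/_read() route), and for_each_chunk is a hypothetical helper name. It shows the chunking pattern only and makes no claim about speed.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: read `path` sequentially in blocks of at most
// `chunk_bytes` bytes and call handle(pointer, byte_count) for each block.
// Returns the total number of bytes read (0 if the file cannot be opened).
template <typename Handler>
std::size_t for_each_chunk(const std::string &path, std::size_t chunk_bytes, Handler handle)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buffer(chunk_bytes);
    std::size_t total = 0;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0)
            break;                    // nothing more to read
        handle(buffer.data(), got);   // the last block may be shorter than chunk_bytes
        total += got;
    }
    return total;
}
```

A call such as for_each_chunk("proba.bin", 1 << 20, handler) would visit the file in 1 MiB blocks; the right block size is something to measure rather than guess.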
-
-
@Please_Help_me_D
Hi
The overhead from QDataStream etc. will be very minor if you are just streaming a big 10 GB memory block the same way you would with
the FILE interface.
So it's hard to suggest what the best solution will be before we know how you have the data structured etc.
Also, what is the data? 10 GB is massive :)
-
@mrjj I tried the example you gave with QVector and QFile and I liked it.
@aha_1980 @JonB So I try to describe a little more the problem.
I know Matlab pretty well and I started to learn C++ and Qt to write a program that performs mathematical operations on the data. So I'm going to read a raw binary data file and store it in the scientific HDF5 format (primarily as a 2-dimensional array). Those files may weigh from N*100 Megabytes up to N*100 Gigabytes. Once I have read the data and stored it in HDF5 format, I only need access to portions of that data. I never need to load all the data into RAM at the same time.
In Matlab I worked with the memory mapping technique (memmap function), but now I want to use the HDF5 format, which can replace the need for memory mapping. I'm afraid that on Windows there are some difficulties with mmap.
Is there a way in Qt to generate a sequence of numbers without loops? For example, if I want an array with the numbers {1, 2, 3, 4, 5} I write:
    int data[] = {1, 2, 3, 4, 5};
But if I want to generate integer numbers from n to N with step dn I would get:
    int data[] = {n, n+dn, n+2*dn, ... , N};
And also, is there a way to get access to several elements of an array? For example:
    int data[] = {1, 2, 3, 4, 5};
How can I get the 2nd, 3rd and 4th elements of data without a loop?
-
Hi
Good to hear. :)
OK, so it's the HDF5 format.
Do note there exist libraries to use that format from C++.
But whether they provide benefits for your use case or not is hard to say.
But since you want to read in the data, you will need to use memory-mapped files or similar,
and you might be able to get that out of the box with a library.
"But if I want to generate integer numbers from n to N with step dn I would get:
int data[] = {n, n+dn, n+2*dn, ... , N};"
Hmm. Nothing really springs to mind. Why are you against a loop?
"And also is there a way to get access to several elements of an array. For example:
int data[] = {1, 2, 3, 4, 5};
How can I get the 2nd, 3rd and 4th elements of data without a loop?"
data[index] gives access. If you need to modify the value then
    int &val = data[index];
    val = 100; // will change the table value to 100
Do note that QFile also supports memory-mapped files (QFile::map).
-
@mrjj Hi))
I installed the official libraries from HDFGroup with the Cpp libraries option checked when running CMake. There is also the HDF5 cpp project, but I don't understand what this project does. Does it just provide a simple interface for using HDF5?
"But since you want to read in the data, you will need to use mem mapped files or similar
and you might be able to get that out of the box with a library."
Didn't understand that... Do you mean that the HDF5 libraries use memory mapping, or do I need to read big data with memory mapping? I read this stuff (HDF5 or memory mapping), and since HDF5 is well known and actively used I decided to use HDF5 instead of memory mapping.
"int data = {n, n+dn, n+2*dn, ... , N};
Hmm. Nothing really springs to mind. Why are you against a loop ?"
Well, I'm from Matlab and it taught me to avoid loops (because they are slow in Matlab), so I'm slightly uncomfortable now when I use a loop in a case where I could avoid it :)
I need to get to know memory mapping in Qt a little better. Do you know whether Qt memory mapping works on Windows? A week ago I was trying to install MPICH (for cluster computation, just to try) and I could not because of some error connected with the lack of mmap on Windows or something...
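On generating n, n+dn, ..., N: standard C++ has std::iota for the step-1 case, but for an arbitrary step the loop has to live somewhere. One option is to hide it in a small helper so the call site reads much like Matlab's n:dn:N. A minimal sketch using std::vector; make_range is a hypothetical name, and a positive step is assumed:

```cpp
#include <vector>

// Hypothetical helper: returns n, n+dn, n+2*dn, ... for all values <= N.
// Assumes dn > 0; returns an empty vector otherwise or when n > N.
std::vector<int> make_range(int n, int N, int dn)
{
    std::vector<int> out;
    if (dn <= 0)
        return out;
    for (int v = n; v <= N; v += dn)
        out.push_back(v);
    return out;
}
```

Unlike in Matlab, a plain loop like this is compiled to machine code, so there is no speed reason to avoid it.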
-
"Didn't understand that.."
I meant that whatever HDF5 uses to allow reading those large files might just work out of the box, so there may be no need for your own memory-mapped file or similar. I was just saying you need something extra to handle such large files, and it seems HDF5 does give that via its chunked file design.
(loops)
Ahh, that way. Well, there is a thing with loops in C++/Qt.
If you fill a very large array in the main thread, it will lag your program's interface.
But besides that, loops are fast in C++ (generally speaking).
(QFile map)
The QFile map function should also work on Windows as far as I know.
Windows does support it natively and I think QFile::map uses that.
The https://github.com/ess-dmsc/h5cpp
project provides a C++ wrapper for a C library.
This is often done to allow for object-oriented programming with the
C library and maybe hide details behind classes that are easier to use than raw C code.
You don't need to use the wrapper if you feel good with C code.
However, I don't have experience with the HDF5 format, but looking over the docs, it really seems the way to go, as it should provide you with anything you need to make a C++ program that can consume and produce such giga files,
and the C API doesn't really look that bad:
https://support.hdfgroup.org/ftp/HDF5/examples/examples-by-api/hdf5-examples/1_10/C/H5D/h5ex_d_chunk.c
-
@mrjj one more question. I get an error: array subscript is not an integer:
    int data[5] = {1, 2, 3, 4, 5};
    int ind[3] = {0, 1, 2};
    int data2 = data[ind]; // here is that error; ind is highlighted in red
I declare ind as integer but still can't get access to those elements of an array...
And here is a similar problem: "expression is not determined by a constant. Failure caused by reading a variable beyond its lifetime":
    std::string str_file = "C:\\Users\\Tasik\\Documents\\Qt_prj\\proba.bin";
    int n = str_file.length();
    char char_file[n]; // here is that error. It appears only when I launch the application
-
Hi,
ind is not an integer, it's an array of 3 integers.
Depending on your compiler you will have to allocate your char_file array on the heap using new and then delete when done with that array.
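As an alternative to a raw new/delete pair (a swapped-in technique, not what was suggested above), std::vector<char> also allocates on the heap, accepts a size known only at run time, and frees itself automatically. A minimal sketch; to_char_buffer is a hypothetical helper name:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copy a std::string into a heap-backed,
// null-terminated char buffer whose size is decided at run time.
std::vector<char> to_char_buffer(const std::string &s)
{
    std::vector<char> buf(s.begin(), s.end());
    buf.push_back('\0'); // terminator, so buf.data() can be used as a C string
    return buf;
}
```

If only a read-only C string is needed, str_file.c_str() already provides one without any copy.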
-
@SGaist Hello
Thank you for the answer.
So is there a way to extract a few elements from an array at the same time without a loop?
"Depending on your compiler you will have to allocate your char_file array on the heap using new and then delete when done with that array."
My compiler is MSVC 2017. Could you write an example of this?
-
@Please_Help_me_D said in General question about arrays and I/o:
So is there a way to extract few elements from an array at the same time without loop?
Not with plain C arrays.
But you can do this with QVector: https://doc.qt.io/qt-5/qvector.html#mid
There is something you can do without copying anything: an array decays to a pointer to its first element, so:
    int data[5] = {1, 2, 3, 4, 5};
    int *data2 = &data[2]; // data2 is now [3, 4, 5].
Do you really need to copy to data2? You can simply keep a variable "length" containing the length of the sub-array in data.
"My compiler is MSVC 2017. Could you write an example of this?":
    char *char_file = new char[n]; // Allocate on the heap
    ...
    delete[] char_file; // Delete when not needed anymore
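The pointer trick only covers a contiguous run of elements. For an arbitrary index list, as in Matlab's data(ind), neither plain arrays nor the standard containers offer a built-in operator, so the loop has to be written once somewhere. A minimal sketch with std::vector; gather is a hypothetical helper name and the indices are 0-based:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns {data[ind[0]], data[ind[1]], ...}.
// Indices are 0-based; at() throws std::out_of_range for a bad index.
std::vector<int> gather(const std::vector<int> &data, const std::vector<std::size_t> &ind)
{
    std::vector<int> out;
    out.reserve(ind.size());
    for (std::size_t i : ind)
        out.push_back(data.at(i));
    return out;
}
```

For example, gather({1, 2, 3, 4, 5}, {4, 2, 3}) yields {5, 3, 4}.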
-
@jsulm thank you for the answer
The problem is that I usually know the indexes in advance, and they may be like:
    int data[5] = {1, 2, 3, 4, 5};
    int ind[3] = {4, 2, 3};
and then I need to get access to those elements like:
    int data2[3] = data[ind];
Now I am reading about QVector; I hope it is able to do that.
You know, both examples that you wrote don't seem to work properly to me.
    int data[5] = {1, 2, 3, 4, 5};
    int *data2 = &data[2]; // data2 is now [3].
And:
    int n = 5;
    char char_file = new char[n]; // error: cannot initialize a variable of type 'char' with an rvalue of type 'char *'
    delete[] char_file; // Delete when not needed anymore
Where can I read about the '*' and '&' signs when they are used in such ways? What do they do?
-
@Please_Help_me_D said in General question about arrays and I/o:
seems to me don't work properly
In what way? &data[2] points to 3 in data, so data2[0] == 3, data2[1] == 4 and data2[2] == 5
Please read about pointers in C/C++:
    // It must be *char_file, not just char_file
    char *char_file = new char[n];
I edited my previous post as I forgot *
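On the '*' and '&' question, a minimal sketch of what the two signs do here: &data[2] takes the address of the third element, and int * declares a variable that holds such an address. Indexing through the pointer then reaches the following elements without copying anything. sum_range is a hypothetical helper showing how a pointer plus a length describes a sub-array:

```cpp
#include <cstddef>

// Hypothetical helper: sum `count` ints starting at `first`.
// A pointer plus a length is enough to describe a sub-array; nothing is copied.
int sum_range(const int *first, std::size_t count)
{
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i)
        sum += first[i]; // first[i] means the same as *(first + i)
    return sum;
}
```

With int data[5] = {1, 2, 3, 4, 5}; and const int *data2 = &data[2];, data2[0] is 3, data2[1] is 4, and sum_range(data2, 3) is 12. The pointer itself carries no length, so the caller must keep track of how many elements are valid.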
-
In what way? &data[2] points to 3 in data, so data2[0] == 3, data2[1] == 4 and data2[2] == 5
I attach the picture below. data2 is now equal to 3 and that is it. Is that correct?
After I added the * the program works, but it seems to me that the length of char_file doesn't depend on n. If n=4 then the length of char_file is 32; if n=5 it is also 32. Is that OK?
-
@Please_Help_me_D said in General question about arrays and I/o:
data2 is now is equal to 3 and that is it. Is it correct?
Yes it is, you can treat a pointer as an array (in C/C++ an array decays to a pointer to its first element in most expressions). So, data2[0] == 3, data2[1] == 4...
Just do
    qDebug() << data2[1];
and see.
Regarding the second question: this is just the debugger view. Your array is for sure 4 chars in size. To verify, do
    char_file[4] = 1;
and your app may crash (writing past the end of the array is undefined behavior, so a crash is possible but not guaranteed).
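A caveat on that test: writing past the end of a new[] array is undefined behavior, so the program may crash, may silently corrupt memory, or may appear to work. When a guaranteed, detectable failure is wanted, a bounds-checked container is one option. A minimal sketch using std::vector::at(), which throws std::out_of_range for a bad index; write_checked is a hypothetical helper name:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: store `value` at index `i` only if `i` is inside the buffer.
// Returns false (instead of invoking undefined behavior) for an out-of-range index.
bool write_checked(std::vector<char> &buf, std::size_t i, char value)
{
    try {
        buf.at(i) = value; // at() checks the index, unlike operator[]
        return true;
    } catch (const std::out_of_range &) {
        return false;
    }
}
```

For a 4-element buffer, index 3 is accepted and index 4 is rejected every time, regardless of compiler or build mode.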
-
@jsulm Yes that works:
    int data[5] = {1, 2, 3, 4, 5};
    int *data2 = &data[2]; // data2 is now [3, 4, 5].
    qDebug() << data2[2];
But what's the magic behind that? :) When I debug, I can see that data2 holds only a single number.
But this doesn't crash, and I can see some output in the terminal (it is not 100 but some letter or sign, since I think it is a char), even if I launch the program outside debug mode:
    int n = 2;
    char *char_file = new char[n];
    delete[] char_file; // Delete when not needed anymore
    char_file[5] = 100;
    std::cout << char_file[5];
Is it possible in Qt to use a command line while the program is stopped in debug mode? For example, if it's stopped and I want to do something in real time (while the program is stopped)? Like the Matlab command line in debug mode.
-
@Please_Help_me_D said in General question about arrays and I/o:
Is it possible in Qt to use a command line while the program is stopped in debug mode? For example, if it's stopped and I want to do something in real time (while the program is stopped)? Like the Matlab command line in debug mode.
No, this is a C++ compiled program (nothing to do with Qt), not Matlab/an interpreted language! You can print out values, and even at a pinch poke a value into a variable, but you can't start "telling" the debugger/program to go perform actions :)