Problem reading data into a struct object
-
I am trying to read raw binary data from a file into a struct object and am getting some garbage but I can't work out why.
I am developing a utility in Qt 6.03 to work with shapefiles. I have been working with shapefiles and other GIS formats for more than 20 years but this problem has me stumped.
As I was getting errors running my Qt widget application, I wrote a simple console app just to test. Basically, it just opens the shapefile and reads the header info, or rather it doesn't. I didn't get Qt errors on reading the file but I had some garbage in the object.
Interestingly, I got exactly the same errors when I created the same console app using VS2019 and Embarcadero 10.3.3.
However, if I compiled with the Borland 'classic' compiler, no errors! I have thousands of shapefiles on my system and every one I try fails, no matter whether I created it or a 3rd party did.
Here is the Qt console app.

    #include <QCoreApplication>
    #include <iostream>
    #include <fstream>
    #include <QFile>
    #include <QDataStream>
    #include "structs.h"

    using namespace std;

    int main(int argc, char *argv[])
    {
        QCoreApplication a(argc, argv);

        //************** using a *.vec file created by me ****************************
        VFC_vector_file_header *vfh = new VFC_vector_file_header;
        VFC_polygon_header *vph = new VFC_polygon_header;

        ifstream invec("Dinangat.vec", ios::binary);
        if(!invec){
            cout << "Could not open vector file" << endl;
            return 1;
        }
        invec.read((char *)vfh, sizeof(VFC_vector_file_header));
        invec.read((char *)vph, sizeof(VFC_polygon_header));
        invec.close();

        cout << "File Header" << endl << endl;
        cout << vfh->tvf_magic << endl;
        cout << vfh->numrecs << endl << endl;
        cout << "Polygon header" << endl << endl;
        cout << vph->numpts << endl;
        cout << vph->nbound << endl;
        cout << vph->sbound << endl;
        cout << vph->wbound << endl;
        cout << vph->ebound << endl;

        QString nbound = QString::number(vph->nbound);
        QString sbound = QString::number(vph->sbound);
        QString wbound = QString::number(vph->wbound);
        QString ebound = QString::number(vph->ebound);
        qInfo() << "North boundary" << nbound;
        qInfo() << "South boundary" << sbound;
        qInfo() << "West boundary" << wbound;
        qInfo() << "East boundary" << ebound;

        //********** using a shapefile created by a 3rd party *********************************
        shp_header *sh = new shp_header;

        ifstream inshp("Dinangat.shp", ios::binary);
        if(!inshp){
            cout << "Could not open shapefile" << endl;
            return 1;
        }
        int x = sizeof(shp_header);
        cout << "shp_header is : " << x << " bytes" << endl;
        inshp.read((char *)sh, sizeof(shp_header));
        inshp.close();

        cout << "Shapefile data" << endl << endl;
        cout << "File code : " << sh->fil_code << endl;
        cout << "File length : " << sh->fil_len << endl;
        cout << "File version : " << sh->fil_ver << endl;
        cout << "xmax : " << sh->xmax << endl;
        cout << "xmin : " << sh->xmin << endl;
        cout << "ymax : " << sh->ymax << endl;
        cout << "ymin : " << sh->ymin << endl;

        delete vfh;
        delete vph;
        delete sh;
        return a.exec();
    }

The .vec format was created by me for another use, but it still uses a mixture of doubles, ints and char strings, so it was a useful comparison. The .vec file worked perfectly.
The header file:

    #ifndef STRUCTS_H
    #define STRUCTS_H

    struct shp_header{
        int fil_code;
        int unused1;
        int unused2;
        int unused3;
        int unused4;
        int unused5;
        int fil_len;
        int fil_ver;    //1000 little endian (03e8)
        int shp_typ;    //little endian
        double xmin;    //Little endian doubles
        double ymin;    //Little endian doubles
        double xmax;    //Little endian doubles
        double ymax;    //Little endian doubles
        double zmin;    //Little endian doubles - unused
        double zmax;    //Little endian doubles - unused
        double mmin;    //Little endian doubles - unused
        double mmax;    //Little endian doubles - unused
    };

    //struct tile_vector_header
    struct VFC_vector_file_header{
        char tvf_magic[16];
        int numrecs;
        char pad[12];
    }; //32 bytes

    struct VFC_polygon_header{
        int recno;
        int numpts;
        //**********************************************
        //bounding box for use in creating shapefile grid, or mapinfo tab?
        double nbound;
        double sbound;
        double wbound;
        double ebound;
        double v_rwidth;
        //**********************************************
        int s_type;             //because shapefiles have it - maybe for the future?
        char v_icao[8];
        char v_apt_type[6];
        char v_feature[9];
        char v_name_ID[24];     // maybe need a name or other ID?
        char v_tile[8];
        char pad[85];           //future proofing :-)
    }; //now 192 bytes - still

    #endif // STRUCTS_H

The results I get are that the file is read correctly up to the int shp_typ member and I get garbage thereafter. This is the same on all compilers except Borland classic, as mentioned.
If I split the struct into 2 parts, i.e. up to the int shp_typ member and then another struct for the doubles and then read them sequentially, everything is correct.
I mostly create shapefiles programmatically from other data and have rarely had call to read them into any app that I have written. These all work and can be viewed in any GIS.
One final point that I don't understand: when I print the size of shp_header, x gives 104 bytes when it should be 100. This happens on all compilers except Borland classic, which gives 100.
I was originally using QDataStream, QFile etc. - strictly all Qt components when I first found the problem.
I hope someone can help.
-
@Colins2
I have not looked at your code in detail. But am I right that you are making assumptions about `struct` member alignment across different compilers/OSes/architectures? (Some compilers also have `struct`-packing/alignment options when compiling.)
You seem to be looking for a 4-byte discrepancy. I note you have 9 `int`s, an odd number. Plus maybe you are x64? I wonder whether some compilers/architectures choose to align the next `double` on an 8-byte boundary? I would print out `&shp_header.shp_typ` versus `&shp_header.xmin` on the different compilers/architectures. If you insert an extra `int unused6` between these two, does that make the sizes the same?
-
@JonB
That's an interesting thought. I will try inserting an extra, non-existent int and see what happens.
I haven't tried other OSs; all of the above was on Win10, and yes, it is x64, but I believe the Borland Classic compiler is 32-bit, so maybe that's why it works?
-
@JonB
Inserting a dummy int had no effect, I still got garbage.
Interestingly, the size of shp_header was still reported as 104.
In VS2019, I found a setting to align structs. It was set to 'default', with options for 1, 2, 4 and 8. Setting it to 1 byte solved the problem and, with the dummy removed, produced the correct result.
Now I have to find the same setting in Qt and Embarcadero. I used 1 byte instead of 4 because some files have single char values in the headers or data structures. I guess 4 bytes would have worked for this case though.
Many thanks for the useful pointer!
-
@Colins2 said in Problem reading data into a struct object:

> Inserting a dummy int had no effect, I still got garbage.
> Interestingly, the size of shp_header was still reported as 104.

I would have expected both/all to be reported as 104 rather than 100, as 104 is divisible by 8....

> Now I have to find the same setting in Qt

This is not a Qt issue. It depends on which compiler you are using. Search for this. For example, `gcc` uses `__attribute__((packed))`, some use `#pragma`, so it is not necessarily a compiler command-line option. I would be wary/hesitant about using any global option which changes struct padding/alignment, as it may affect other libraries you use; try to set it only on your own `struct`s.

> I used 1 byte instead of 4

Up to you, but if possible use an even number at least. Some architectures are slower at accessing `int`s etc. if they are on an odd-numbered boundary. At least be aware of this.
-
@JonB
OK, thanks again. I understand what you mean about the boundaries and will bear that in mind when designing my own file structures. However, I read a lot of files designed by other people over which I have no control. Many files in the GIS world are sometimes decades old with structures to match! I try to just use the gcc compilers in Qt so will look up the attribute setting.