Qt MS Access umlauts
-
Hi
I have an MS Access database with two tables: test, Test_utf8 (see images below)
The data in the column example of table test was entered directly in MS Access. The data in Test_utf8 was imported from a text file with utf-8 encoding.

The code is as follows:
QSqlDatabase db = QSqlDatabase::addDatabase("QODBC3"); QString connString = "DBQ=umlaute.mdb;Driver={Microsoft Access Driver (*.mdb,*.accdb)};FIL={MS Access};" QSqlQuery query; QString sql_statement = "SELECT example FROM test;" //or "SELECT example_utf8 FROM Test_utf8" db.setDatabaseName(connString); if (db.open()) { QFile file("output.txt"); if(file.open(QFile::WriteOnly)) { QTextStream out(&file); out.setCodec("windows-1252") // or "UTF-8" if Test_utf8 query.prepare(sql_statement); query.exec(); while (query.next()) { qDebug() << query.value(0).toByteArray(); out << query.value(0).toByteArray() << " | " << QString::fromUtf8(query.value(0).toByteArray()) << " | " ; out << QString::fromLatin1(query.value(0).toByteArray()) << " | "; out << query.value(0).toByteArray().toHex(' ') << Qt::endl; } } } else { qDebug() << "could not open database"; qDebug() << db.lastError(); } db.close();When I try to query the data I get different results depending on which table i am querying.
Example for the word "hellö"
When querying table test I get:
which is a replacement character and my guess is it indicates an encoding problem.With table Test_utf8 I get:
which corresponds to an "ö"I am however at a loss as to how to get the "ö" when querying table test. Any help, suggestions or ideas would by much appreciated.
Cheers
PS: I am using Qt 5.15.0 MinGW 64-bit, the 64-bit Microsoft Access Driver (*mdb,*accdb) and the database was created with 64-bit MS Access
-
@hskoglund using odbc2 did not make a difference. technically switching the unicode settings on my PC worked but a whole lot of other problems with other software arose. So I switched it back.
I did find this in the documentation (https://doc.qt.io/qt-6/qsqlquery.html):

So I moved the query to after if(db.open()), as shown below
QSqlDatabase db = QSqlDatabase::addDatabase("QODBC3"); QString connString = "DBQ=umlaute.mdb;Driver={Microsoft Access Driver (*.mdb,*.accdb)};FIL={MS Access};" QString sql_statement = "SELECT example FROM test;" //or "SELECT example_utf8 FROM Test_utf8" db.setDatabaseName(connString); if (db.open()) { QSqlQuery query; //moved query to here QFile file("output.txt"); if(file.open(QFile::WriteOnly)) { QTextStream out(&file); out.setCodec("windows-1252") // or "UTF-8" if Test_utf8 query.prepare(sql_statement); query.exec(); while (query.next()) { qDebug() << query.value(0).toByteArray(); out << query.value(0).toByteArray() << " | " << QString::fromUtf8(query.value(0).toByteArray()) << " | " ; out << QString::fromLatin1(query.value(0).toByteArray()) << " | "; out << query.value(0).toByteArray().toHex(' ') << Qt::endl; } } } else { qDebug() << "could not open database"; qDebug() << db.lastError(); } db.close();Now i get

when querying table test, which is correct.Thank you for your time and help!
-
@mki2 said in Qt MS Access umlauts:
When querying table test I get: 'hell\xEF\xBF\xBD" which is a replacement character and my guess is it indicates an encoding problem.
Those three bytes are the UTF-8 encoding of the Unicode U+FFFD used to replace an incoming character whose value is unknown or unrepresentable in Unicode. My guess is that the data in the column is encoded in Windows-1252, that is a bare 0xF6 byte to represent the 'ö' character.
A single 0xF6 byte does, indeed, make no sense if you try to interpret it as UTF-8.
With table Test_utf8 I get: "hell\xC3\xB6" which corresponds to an "ö"
This is the UTF-8 encoding on the Unicode code point U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS).
In your shot from Access of the UTF-8 column you see these two bytes as "Ã" and "¶" as that is what they represent in the Windows code page.
-
Thanks for the replies!
So it is an encoding issue.
I tried @hskoglund suggestion, but it doesn't change the output result. I still get:
adding the following into the while loop:
qDebug() << "output variant: " << query.value(0);already yields:

so i would need to specify the encoding before the query is executed? How could I do this?
As far as I am aware I can't specify an encoding, e.g. charset="Windows-1252" in the connection string for odbc and ms access.The actual ms access database I will have to work with will unfortunately be like table test and not like table Test_utf8. I just wanted to see what result I get when I am sure the data in the field is utf-8 encoded.
-
Hi, you could also try converting from -1252:
qDebug() << QString::fromLatin1(query.value(0).toByteArray())@hskoglund I tried that. But it doesn't seem to me that it solves the problem. I modified my code above to output a txt. here is what I get when querying table test

and this is what I get when I query table Test_utf8:

I think regarding Test_utf8 the output is as expected.
However, regarding table test the "information" is "lost" upon being read from the database? -
If you try a hex dump, say like:
qDebug() << query.value(0).toByteArray().toHex(' ');
can you see the Windows-1252 characters (i.e. the ones bigger than 0x7f)? -
Aha, now i recognize that � (ef bf bd). I remember for example this post which had more or less the same problem. I used a workaround (bypassing Qt's ODBC driver + talking directly to ODBC) I have the code somewhere but I ditched GitHub when Microsoft bought it so the link doesn't work.
Here's some stuff you could try:
Try using ODBC2 instead of 3:
QSqlDatabase::addDatabase("QODBC");Try changing the Unicode settings in your Windows PC
https://stackoverflow.com/a/69268839/233402 -
@hskoglund using odbc2 did not make a difference. technically switching the unicode settings on my PC worked but a whole lot of other problems with other software arose. So I switched it back.
I did find this in the documentation (https://doc.qt.io/qt-6/qsqlquery.html):

So I moved the query to after if(db.open()), as shown below
QSqlDatabase db = QSqlDatabase::addDatabase("QODBC3"); QString connString = "DBQ=umlaute.mdb;Driver={Microsoft Access Driver (*.mdb,*.accdb)};FIL={MS Access};" QString sql_statement = "SELECT example FROM test;" //or "SELECT example_utf8 FROM Test_utf8" db.setDatabaseName(connString); if (db.open()) { QSqlQuery query; //moved query to here QFile file("output.txt"); if(file.open(QFile::WriteOnly)) { QTextStream out(&file); out.setCodec("windows-1252") // or "UTF-8" if Test_utf8 query.prepare(sql_statement); query.exec(); while (query.next()) { qDebug() << query.value(0).toByteArray(); out << query.value(0).toByteArray() << " | " << QString::fromUtf8(query.value(0).toByteArray()) << " | " ; out << QString::fromLatin1(query.value(0).toByteArray()) << " | "; out << query.value(0).toByteArray().toHex(' ') << Qt::endl; } } } else { qDebug() << "could not open database"; qDebug() << db.lastError(); } db.close();Now i get

when querying table test, which is correct.Thank you for your time and help!

