QJsonDocument::fromJson fails on "foreign" characters
-
I have boiled the problem down to the small program below.
When I run it, it succeeds on the first text input but fails on the second (which contains the Norwegian characters Æ, Ø and Å).
I am guessing that I am doing something wrong. It seems unlikely that no one would have complained loudly if this really were an error in QJsonDocument::fromJson...
As additional information, I am running this code on:
- CentOS release 6.8
- Linux 2.6.32-642.1.1.el6.x86_64
- gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
- Qt 5.6
#include <QtCore/QCoreApplication>
#include <QtCore/QJsonDocument>
#include <iostream>

void textMessageReceived(const QString &msg)
{
    std::cerr << "msg = " << msg.toStdString() << std::endl;
    QJsonParseError error;
    // Create a Json document from text. Fails for foreign characters!
    QJsonDocument doc = QJsonDocument::fromJson(msg.toUtf8(), &error);
    if(doc.isNull())
        std::cerr << "(Broken json), error code = " << error.errorString().toStdString() << std::endl;
    else
        std::cerr << "(Valid json) : " << doc.toJson().toStdString() << std::endl;
}

int main()
{
    QString jsontext = "{\"string\": \"abc\"}";
    textMessageReceived(jsontext);
    jsontext = "{\"string\": \"æøå\"}";
    textMessageReceived(jsontext);
    return 0;
}
When run, the program will output:
msg = {"string": "abc"}
(Valid json) : {
    "string": "abc"
}
msg = {"string": "æøå"}
(Broken json), error code = invalid UTF8 string
-
@Per-Gunnar-Holm said:
jsontext = "{\"string\": \"æøå\"}";
Hej
I'm wondering whether it will be encoded correctly as UTF-8 when it is inline in the source file?
I did load some JSON with ÆØÅ (I'm a Dane) from a UTF-8 file and didn't notice any errors.
But that file was already encoded as UTF-8. Maybe inline text becomes full Unicode or something?
What happens if you don't call toUtf8()?
-
Thank you for your interest!
The text is inline in my sample program, but this is just to keep the size of the problem to a minimum. In my real-life case I receive JSON text over a (Qt) web socket. The behavior seems to be identical though.
I tried modifying to not use QString::toUtf8, but there is no change.
void textMessageReceived(const QString &msg)
{
    :
    QByteArray ba (msg.toStdString().c_str());
    // Create a Json document from text string. Fails for foreign characters!
    QJsonDocument doc = QJsonDocument::fromJson(ba, &error);
    :
}
I have, by the way, verified that 'æ', 'ø' and 'å' get the correct UTF-8 encoding. What separates them from the other characters is that they require 2 bytes. E.g.:
's' has code 0x73
't' has code 0x74
but
'æ' has code 0xC3A6
'ø' has code 0xC3B8
'å' has code 0xC3A5
When stopping with the debugger in Qt Creator I can see that the Unicode values are correct...
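For reference, the two-byte values listed above are UTF-8 byte sequences, not Unicode code points: 'æ' is code point U+00E6, which UTF-8 encodes as the bytes 0xC3 0xA6. The following is a minimal sketch in standard C++ (no Qt involved) of how that two-byte form arises. The helper name `encodeUtf8` is hypothetical, and the sketch assumes code points no larger than U+FFFF.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Qt): encode one code point from the
// Basic Multilingual Plane as UTF-8. Surrogates are not rejected here.
std::string encodeUtf8(std::uint32_t cp)
{
    assert(cp <= 0xFFFF && "sketch only covers U+0000..U+FFFF");
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                        // 0xxxxxxx
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));          // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));        // 10xxxxxx
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));         // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F)); // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));        // 10xxxxxx
    }
    return out;
}
```

With this sketch, `encodeUtf8(0xE6)` yields the bytes C3 A6, `encodeUtf8(0xF8)` yields C3 B8 and `encodeUtf8(0xE5)` yields C3 A5, while 's' (0x73) stays a single byte.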
-
@Per-Gunnar-Holm said:
Hi, maybe something ruins the encoding on the way.
I'm wondering whether
msg.toStdString().c_str()
handles Unicode? Maybe try
msg.toUtf8().toStdString().c_str()
I can't test anything at the moment, so I'm out of suggestions.
I'm 99% sure it should handle UTF-8/Unicode.
-
@mrjj said:
msg.toUtf8().toStdString().c_str()
Thanks again.
I tested the last suggestion too, but the contents of 'ba' are always the same.
So, this (ba) is the content that is being sent to QJsonDocument::fromJson.
Also, you can note that msg (QString) is not in UTF-8.
ba   "{"string": "æøå"}"   QByteArray
     '{'    123       0x7b   char
     '"'    34        0x22   char
     's'    115       0x73   char
     't'    116       0x74   char
     'r'    114       0x72   char
     'i'    105       0x69   char
     'n'    110       0x6e   char
     'g'    103       0x67   char
     '"'    34        0x22   char
     ':'    58        0x3a   char
     ' '    32        0x20   char
     '"'    34        0x22   char
     'ᅢ'    -61/195   0xc3   char
     'ᆭ'    -90/166   0xa6   char
     'ᅢ'    -61/195   0xc3   char
     'ᄌ'    -72/184   0xb8   char
     'ᅢ'    -61/195   0xc3   char
     'ᆬ'    -91/165   0xa5   char
     '"'    34        0x22   char
     '}'    125       0x7d   char
msg  "{"string": "æøå"}"   QString &
     [0]    '{'   123   0x007b   QChar
     [1]    '"'   34    0x0022   QChar
     [2]    's'   115   0x0073   QChar
     [3]    't'   116   0x0074   QChar
     [4]    'r'   114   0x0072   QChar
     [5]    'i'   105   0x0069   QChar
     [6]    'n'   110   0x006e   QChar
     [7]    'g'   103   0x0067   QChar
     [8]    '"'   34    0x0022   QChar
     [9]    ':'   58    0x003a   QChar
     [10]   ' '   32    0x0020   QChar
     [11]   '"'   34    0x0022   QChar
     [12]   'æ'   230   0x00e6   QChar
     [13]   'ø'   248   0x00f8   QChar
     [14]   'å'   229   0x00e5   QChar
     [15]   '"'   34    0x0022   QChar
     [16]   '}'   125   0x007d   QChar
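The dump shows that ba holds the byte pairs C3 A6, C3 B8 and C3 A5 for the three letters, which is well-formed UTF-8, so the data handed to QJsonDocument::fromJson looks correct. One way to check this independently of Qt is a small validator in standard C++. This is only a sketch: `isValidUtf8` is a hypothetical helper, not a Qt API, and it says nothing about how Qt's own JSON parser performs its check.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Qt API): returns true if `s` is well-formed UTF-8.
bool isValidUtf8(const std::string &s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;   // number of continuation bytes expected
        unsigned long cp;    // code point decoded so far
        unsigned long minCp; // smallest code point allowed for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        minCp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; minCp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; minCp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; minCp = 0x10000; }
        else return false;   // stray continuation byte or invalid lead byte
        if (s.size() - i - 1 < extra)
            return false;    // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                return false; // not a 10xxxxxx continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;    // overlong form, out of range, or surrogate
        i += extra + 1;
    }
    return true;
}
```

Under these assumptions the byte sequence from the dump passes, whereas the same letters as single Latin-1 bytes (E6 F8 E5) are rejected, which is what a genuinely mis-encoded input would look like.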
-
Can you replace
jsontext = "{\"string\": \"æøå\"}";
with
jsontext = QString::fromWCharArray(L"{\"string\": \"æøå\"}");
or try
jsontext = QString::fromWCharArray(
    L"{\"string\": \"" L"\u00E6" L"\u00F8" L"\u00E5" L"\"}"
);
What I mean is that the problem is not how you read the string but how you create it
-
@VRonin
Thank you for looking into it!
Unfortunately the result is the same in both cases:
jsontext = "{\"string\": \"æøå\"}"
msg = {"string": "æøå"}
(Broken json), error code = invalid UTF8 string
This is a bit of a surprise to me; I thought the second option would force the characters down to a single-byte representation?
Here's the actual code I ran (first suggestion commented out):
int main()
{
    QString jsontext = "{\"string\": \"abc\"}";
    textMessageReceived(jsontext);
    // jsontext = QString::fromWCharArray(L"{\"string\": \"æøå\"}");
    jsontext = QString::fromWCharArray(L"{\"string\": \"" L"\u00E6" L"\u00F8" L"\u00E5" L"\"}");
    qDebug() << "jsontext =" << jsontext;
    textMessageReceived(jsontext);
    return 0;
}
-
This works for me. Qt 5.5.1 on MSVC2013
#include <QtCore/QCoreApplication>
#include <QtCore/QJsonDocument>
#include <QDebug>

void textMessageReceived(const QString &msg)
{
    qDebug() << "msg = " << msg << '\n';
    QJsonParseError error;
    // Create a Json document from text. Fails for foreign characters!
    QJsonDocument doc = QJsonDocument::fromJson(msg.toUtf8(), &error);
    if(doc.isNull())
        qDebug() << "(Broken json), error code = " << error.errorString() << '\n';
    else
        qDebug() << "(Valid json) : " << QString(doc.toJson()) << '\n';
}

int main()
{
    QString jsontext = "{\"string\": \"abc\"}";
    textMessageReceived(jsontext);
    jsontext = QString::fromWCharArray(
        L"{\"string\": \"" L"\u00E6" L"\u00F8" L"\u00E5" L"\"}"
    );
    textMessageReceived(jsontext);
    return 0;
}
Output:
msg = "{\"string\": \"abc\"}"
(Valid json) : "{\n \"string\": \"abc\"\n}\n"
msg = "{\"string\": \"æøå\"}"
(Valid json) : "{\n \"string\": \"æøå\"\n}\n"
-
Thanks again @VRonin !
That means this is platform dependent I suppose. As far as I can see there are no differences between your code and mine (except you use qDebug).
It would also explain why there hasn't been a torrent of complaints to Qt if this is a bug and not me making a mistake; there are probably not too many users on my platform.
As stated initially, I have Qt 5.6.0 and I'm on Linux (CentOS 6.8).
-
Now I did :-)
Same result (unfortunately):
msg = "{\"string\": \"abc\"}"
(Valid json) : "{\n \"string\": \"abc\"\n}\n"
msg = "{\"string\": \"æøå\"}"
(Broken json), error code = "invalid UTF8 string"
I checked the encoding as instructed, and the "Text Encoding" window comes up with "UTF-8" highlighted.
I am assuming this is OK?
-- Gunnar
-
Today I tested on virtual installations of Ubuntu 16.04 and CentOS 7, both using Qt 5.6.1.
The test program passes without problems there, so this is definitely a CentOS 6 problem.
-
@Per-Gunnar-Holm
Well, maybe Qt 5.6.1 has a bug on that distro. I hope it is an option to use CentOS 7 instead :)
-
QByteArray ba (msg.toStdString().c_str());
Bug or no, the above line doesn't seem correct. You should enforce the required encoding (as @VRonin has done) instead of relying on the internal representation of std::string and/or QString.
The above should be:
QByteArray ba = msg.toUtf8();
If you need to output QStrings to the standard streams, attach a QTextStream to them instead of converting the objects to std::string:
QTextStream cout(stdout);
QTextStream cerr(stderr);
QTextStream cin(stdin);
Kind regards.
-
Thanks @kshegunov !
The c_str() was just something we tested along the way!
The original code was:
// Create a Json document from text. Fails for foreign characters!
QJsonDocument doc = QJsonDocument::fromJson(msg.toUtf8(), &error);
However, all the variations we/I have tried display the same problem (on CentOS 6.8, as we have discovered).