Struggling to stream byte array .toUtf8() out a TextStream
-
Suppose you have a string like this:
Filename: 222runㅁㄴㅇ (32323272756ee38581e384b4e38587 <<< UTF8 hex)
qDebug() << runfileName << runfileName.length() << runfileName.toUtf8() << runfileName.toUtf8().length();
will yield: "222run???" 9 "222run\xE3\x85\x81\xE3\x84\xB4\xE3\x85\x87" 15
I want to append the 15 bytes into a QTextStream and the below fails miserably:
TextStreaming << tr("Run Filename") << "," << runfileName.toUtf8() << "\n";
This appears to be because the << operator takes the QByteArray produced by .toUtf8() and ends up calling QString::fromUtf8() on it, so I push the 9 incorrect bytes instead of the 15 correct bytes I want!
What's the proper way to "push" a UTF8 series of bytes into a QTextStream?
Thanks,
-Rich
-
@rhb327 said in Struggling to stream byte array .toUtf8() out a TextStream:
I want to append the 15 bytes into a QTextStream
Why? QTextStream takes a QString, i.e. a Unicode wide-character string, not bytes. There's no need to encode it to an 8-bit string, and you shouldn't.
To set the encoding codec, use QTextStream::setCodec instead of encoding the string by yourself.
If it is for binary data storage, you may consider using QDataStream.
-
Thanks for the input. I just tested << runfileName and << runfileName.toUtf8() and both yield the expected behavior on Linux. My prior post was on Win10. Also, the TextStream already had setCodec("UTF-8"). Thus, I think this might be a desktop Win10 issue rather than an issue on desktop Linux (tested for this response) or on my embedded Linux platform.
Not sure why Win10 would be different but per my hex editor it definitely appears to be.
Thanks,
-Rich
-
@rhb327 said in Struggling to stream byte array .toUtf8() out a TextStream:
Not sure why Win10 would be different but per my hex editor it definitely appears to be.
On Linux, the default system codec is UTF-8. This is not the case on Windows.
I just tested << runfileName and << runfileName.toUtf8() and both yield expected behavior on Linux.
This is just a coincidence.
[EDIT: The following explanation is wrong, sorry. --JKSH]
Like @Bonnie said, QTextStream takes a QString; it does not take a QByteArray. When you pass a QByteArray into it, your program first decodes the QByteArray into a QString using your system's default codec.
So how does this affect your tests? Well, on Linux, your UTF8-encoded byte array was converted to a QString using a UTF-8 codec first, and then it was passed into QTextStream. Since the encoding and the codec match, everything works fine.
On Windows, your UTF8-encoded byte array was converted to a QString using a non-UTF8 codec, so you get corrupted data.
What's the proper way to "push" a UTF8 series of bytes into a QTextStream?
I agree with @Bonnie: You should not push bytes into QTextStream. You should push text into QTextStream.
-
Thanks for explaining why I see the difference on Linux vs. Win10. I understand that, per the documentation, the << operator of QTextStream will process my runfileName.toUtf8() QByteArray through QString::fromUtf8(). I'm guessing that's where the system default encoding you mention comes in, and that this is a poor approach to what I want, which is a UTF8 string in the TextStream.
So what's still missing for me is:
- I have a QString with the correct data
- I configure a QTextStream with setCodec("UTF-8")
- I run TextStream << runfileName on Win10 and I don't see the 15-bytes I expect
So here I'm passing a QString and it's not correct. Why? Even the original qDebug() in my first post shows that UTF8 encoding is not working with qDebug() << runfileName, which I think you would say is because of Win10's default. But for the TextStream I've set the codec to UTF8, so why do I get only 9 bytes in the text file, with 3Fh bytes, instead of the 15-byte UTF8 message I expect?
Thanks,
-Rich
-
@rhb327 said in Struggling to stream byte array .toUtf8() out a TextStream:
I configure a QTextStream with setCodec("UTF-8")
Can you double-check that it has been configured correctly? What does stream.codec()->name() return?
This code gives me 15 bytes:
    auto ba = QByteArray::fromHex("32323272756ee38581e384b4e38587");
    auto str = QString::fromUtf8(ba);
    QFile fout("output.txt");
    fout.open(QFile::Text|QFile::WriteOnly);
    QTextStream stream(&fout);
    qDebug() << stream.codec()->name(); // "System"
    stream.setCodec("UTF-8");
    qDebug() << stream.codec()->name(); // "UTF-8"
    stream << str;
P.S. I realized that my previous explanation was wrong, sorry. Please ignore it.
-
Thanks for helping me. This is a hook I put at the top of the function. I'm back on Win10 right now, running Qt 5.12.0:
    void Utilities::exportRunFile(quint8 logFileType, QString runfileName) {
        qDebug() << "DEBUG: " << runfileName << runfileName.length() << runfileName.toUtf8() << runfileName.toUtf8().length();
        QSaveFile tempFile;
        QTextStream txtStream;
        txtStream.setCodec("UTF-8");
        tempFile.setFileName("C:/Users/richard.bair/Downloads/test.csv");
        tempFile.open(QIODevice::WriteOnly);
        txtStream.setDevice(&tempFile);
        txtStream << tr("Run Filename") << "," << runfileName << "\n";
        txtStream.flush();
        tempFile.commit();
qDebug output is below...filename happens to have some Korean characters in it...just part of a quick test.
DEBUG: "222run???" 9 "222run\xE3\x85\x81\xE3\x84\xB4\xE3\x85\x87" 15
Here's the file in a hex editor and Notepad++:
-
Added extra debug I missed...
    void Utilities::exportRunFile(quint8 logFileType, QString runfileName) {
        qDebug() << "DEBUG: " << runfileName << runfileName.length() << runfileName.toUtf8() << runfileName.toUtf8().length();
        QSaveFile tempFile;
        QTextStream txtStream;
        txtStream.setCodec("UTF-8");
        tempFile.setFileName("C:/Users/richard.bair/Downloads/test.csv");
        tempFile.open(QIODevice::WriteOnly);
        txtStream.setDevice(&tempFile);
        txtStream << tr("Run Filename") << "," << runfileName << "\n";
        txtStream.flush();
        tempFile.commit();
        qDebug() << txtStream.codec()->name();
@JKSH, I think you're on to something...why is System showing?
-
Check this out...
    void Utilities::exportRunFile(quint8 logFileType, QString runfileName) {
        qDebug() << "DEBUG: " << runfileName << runfileName.length() << runfileName.toUtf8() << runfileName.toUtf8().length();
        QSaveFile tempFile;
        QTextStream txtStream;
        qDebug() << txtStream.codec()->name();
        txtStream.setCodec("UTF-8");
        qDebug() << txtStream.codec()->name();
        tempFile.setFileName("C:/Users/richard.bair/Downloads/test.csv");
        tempFile.open(QIODevice::WriteOnly);
        txtStream.setDevice(&tempFile);
        qDebug() << txtStream.codec()->name();
        txtStream << tr("Run Filename") << "," << runfileName << "\n";
        txtStream.flush();
        qDebug() << txtStream.codec()->name();
        tempFile.commit();
        qDebug() << txtStream.codec()->name();
-
@rhb327 said in Struggling to stream byte array .toUtf8() out a TextStream:
@JKSH, I think you're on to something...why is System showing?
I'm not sure. Maybe calling setDevice() resets the codec somehow?
Try calling setCodec() after setDevice(), or pass the device in the QTextStream constructor, like the code in my previous post.