Converting a String from "encoded" utf8 to proper utf8
-
I'm reposting this because I messed up the first one.
Here is a simpler example of the problem:
```cpp
#include <QDebug>

void test()
{
    const QString strModel = "松本";

    const QString strTest1 = "\xe6\x9d\xbe\xe6\x9c\xac";
    if (strModel != strTest1)
        qDebug() << "Test1 fail";

    QString strFromExternalLibrary = "\\xe6\\x9d\\xbe\\xe6\\x9c\\xac";
    QString strTest2 = strFromExternalLibrary;
    if (strModel != strTest2)
        qDebug() << "Test2 fail";
}
```
The first test passes, the second doesn't. The problem is that ExternalLibrary gives us the string from the second test instead of the first. We need to be able to convert the second string so that it matches strModel.
-
@bvieducasse
Your second string, as shown, uses `\\xe6` in a C literal, which puts the 4 characters `\xe6` into the string. The first one uses `\xe6`, so it puts one character with value hex `e6` into the string. Is this what you mean? You would need to convert the string `\xe6` into the hex byte value `e6` if you expect the second string to be the same as the first.
-
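The 4-characters-versus-1-byte point can be checked in plain standard C++ (no Qt, so it compiles on its own; the constant names are illustrative only): the first literal holds six bytes, the UTF-8 encoding of 松本, while the second holds 24 characters, a backslash, an `x` and two hex digits for each byte.

```cpp
#include <string>

// Six bytes: the UTF-8 encoding of "松本" (e6 9d be = 松, e6 9c ac = 本).
const std::string kDecoded = "\xe6\x9d\xbe\xe6\x9c\xac";

// 24 characters: each byte is spelled out as '\', 'x' and two hex digits,
// which is the form of text the external library returns.
const std::string kEscaped = "\\xe6\\x9d\\xbe\\xe6\\x9c\\xac";
```

In Qt 5 and 6, constructing a `QString` from a plain `const char *` literal decodes it as UTF-8, which is why `strTest1` compares equal to `strModel` while `strTest2` does not.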
A straightforward way to do this is to split strFromExternalLibrary by '\'. Then remove the leading x and use e.g. QString::toInt(&ok, 16) to convert each substring into an int. You can then write this to a QByteArray (e.g. using QByteArray::append(char ch)), and convert the whole QByteArray to a QString with QString::fromUtf8(). With some error handling, of course.
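A rough sketch of that split-based idea, written in standard C++ rather than Qt so it stands alone. `decodeBySplitting` is a hypothetical name, and returning `std::nullopt` stands in for "some error handling". Like the suggestion, it assumes the input is made up only of `\xHH` escapes.

```cpp
#include <cctype>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: split on '\\', drop the leading 'x' of each piece,
// parse the two remaining characters as hex and collect the bytes.
// Assumes the input consists ONLY of \xHH escapes; returns std::nullopt otherwise.
std::optional<std::string> decodeBySplitting(const std::string& escaped) {
    auto isHex = [](char c) { return std::isxdigit(static_cast<unsigned char>(c)) != 0; };
    std::string bytes;
    std::istringstream pieces(escaped);
    std::string piece;
    bool first = true;
    while (std::getline(pieces, piece, '\\')) {
        if (first) {
            first = false;
            if (piece.empty())
                continue;  // nothing precedes the first backslash
        }
        // Each piece must look like "xHH", e.g. "xe6"
        if (piece.size() != 3 || piece[0] != 'x' || !isHex(piece[1]) || !isHex(piece[2]))
            return std::nullopt;
        bytes.push_back(static_cast<char>(std::stoi(piece.substr(1), nullptr, 16)));
    }
    return bytes;
}
```

In Qt, the collected bytes would then go through `QString::fromUtf8()` before comparing against `strModel`.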
-
@kkoehne
Just a comment. If you are going to remove those `\x`s from the string and then convert each character as hex with `QString::toInt(&ok, 16)`, would it maybe be quicker to remove the `\x`s, turn the result into a `QByteArray` and use `QByteArray text = QByteArray::fromHex()` to convert all the bytes in one go, avoiding all this splitting and byte-by-byte stuff?
-
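A standard C++ sketch of the strip-then-decode variant (hypothetical `decodeAllHex`, written without Qt so it compiles on its own): remove every `\x` marker, then turn each pair of hex digits into one byte, which is roughly what `QByteArray::fromHex()` does. One difference: `fromHex()` skips invalid characters, whereas this version rejects them.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper: strip every "\x" marker, then decode hex pairs in one pass.
// Only meaningful when the input is made up entirely of \xHH escapes.
std::optional<std::string> decodeAllHex(const std::string& escaped) {
    // 1. Remove the "\x" markers, leaving a plain hex string such as "e69dbe...".
    std::string hex;
    for (std::size_t i = 0; i < escaped.size(); ++i) {
        if (escaped.compare(i, 2, "\\x") == 0) {
            ++i;  // skip the backslash; the loop's ++i skips the 'x'
            continue;
        }
        hex.push_back(escaped[i]);
    }
    // 2. Decode consecutive pairs of hex digits into bytes.
    if (hex.size() % 2 != 0)
        return std::nullopt;
    std::string bytes;
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        if (!std::isxdigit(static_cast<unsigned char>(hex[i])) ||
            !std::isxdigit(static_cast<unsigned char>(hex[i + 1])))
            return std::nullopt;
        bytes.push_back(static_cast<char>(std::stoi(hex.substr(i, 2), nullptr, 16)));
    }
    return bytes;
}
```

With Qt, the equivalent would be along the lines of `QString::fromUtf8(QByteArray::fromHex(strFromExternalLibrary.remove(QStringLiteral("\\x")).toLatin1()))`, again only for hex-only input.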
@kkoehne
:) I was just thinking of speed, in case OP is going to do a lot of these, or the strings contain a lot of bytes. It seems natural to take advantage of the two-digit hex sequences that `QByteArray::fromHex()` is designed to read, given that this is what the input seems to consist of.
-
I remember OP's previous post, in which the full string is "\xe6\x9d\xbe\xe6\x9c\xac....'s iPhone", so it is not a hex-only string.
I think OP needs to find every "\x", read the next two characters and convert them, and copy the characters that are not part of an escape unchanged.
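For mixed input like the `'s iPhone` string, a scan-and-copy loop does what is described here. This is a sketch in standard C++ (hypothetical `unescapeHexBytes`), treating the text as raw UTF-8 bytes: a `\x` followed by two hex digits becomes one byte, and everything else, including malformed escapes, is copied unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode every "\xHH" into one byte and copy all other
// bytes through untouched. Reads exactly two hex digits per escape.
std::string unescapeHexBytes(const std::string& in) {
    auto hexValue = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;  // not a hex digit
    };
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 3 < in.size() && in[i + 1] == 'x') {
            const int hi = hexValue(in[i + 2]);
            const int lo = hexValue(in[i + 3]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 3;  // consumed "\xHH"; the loop's ++i moves past it
                continue;
            }
        }
        out.push_back(in[i]);  // not a valid escape: copy as-is
    }
    return out;
}
```

In Qt, one would run this on `strFromExternalLibrary.toUtf8()` and pass the result to `QString::fromUtf8()`; copying the non-escaped bytes as-is keeps any non-ASCII text already in the input intact.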
Actually, I believe Qt already does this in its private code when reading INI files with QSettings, but it is not exposed through the public APIs.
-
Thank you for your quick responses. I think I got something working, though there may be ways to write it more efficiently.
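```cpp
QByteArray resultString;
const QString checkStr = QStringLiteral("\\x");
// Use < rather than <=: at(size()) would read past the end of the string
for (int i = 0; i < strFromExternalLibrary.size(); i++) {
    const QString binome = strFromExternalLibrary.mid(i, 2);
    if (binome == checkStr) {
        // Decode the two hex digits after "\x" into a single byte
        const QByteArray bytes = strFromExternalLibrary.mid(i + 2, 2).toLatin1();
        resultString += QByteArray::fromHex(bytes);
        i += 3; // skip "\x" and both digits (the loop adds the final +1)
    } else {
        // Assumes the non-escaped characters are ASCII/Latin-1
        resultString.append(strFromExternalLibrary.at(i).toLatin1());
    }
}
// The collected bytes are UTF-8, so decode them explicitly
const QString decoded = QString::fromUtf8(resultString);
qDebug() << decoded;
if (strModel == decoded)
    qDebug() << "It Works!!";
```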
This works even for strModel = "松本foo".
-
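As I said about QSettings, I've checked it.
To use its "unescape" function we have to go through the public APIs for reading, but unfortunately QSettings needs a real INI file, not something in memory.
So this is not a good solution; I'm just keeping a record of my test :)
```cpp
#include <QDebug>
#include <QSettings>
#include <QTemporaryFile>

// bytearray_of_escaped_string holds the escaped text, e.g. "\\xe6\\x9d..."
QTemporaryFile file;
if (file.open()) {
    // Wrap the escaped text in QSettings' @ByteArray(...) syntax so the
    // INI reader handles the \x escapes when it parses the value
    file.write("string=@ByteArray(");
    file.write(bytearray_of_escaped_string);
    file.write(")\n");
    file.close(); // the temporary file stays on disk until `file` is destroyed

    QSettings settings(file.fileName(), QSettings::IniFormat);
    qDebug() << settings.value("string").toString();
}
```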