QString Turkish Character Problem
-
Hi,
I use Visual Studio for coding. The first piece of code below is from my own project; the second is from a test project I created in Visual Studio. When I use Turkish characters in my own project, different characters are displayed. In the test project, 'QTextCodec::codecForLocale()->toUnicode(data.c_str());' gives the correct result. What is the cause of this problem? How can I solve it?
Project Code:
std::string dataString = "üöçİış";
QString dataQString = QTextCodec::codecForLocale()->toUnicode(dataString.c_str());
QString dataQString2 = QString::fromStdString(dataString.c_str());
QString dataQString3 = QString::fromLatin1(dataString.c_str());
QString dataQString4 = QString::fromLocal8Bit(dataString.c_str());
qDebug() << dataQString;
qDebug() << dataQString2;
qDebug() << dataQString3;
qDebug() << dataQString4;
Output :
Trial Code :
std::string data = "üöçİış";
QString dataQString = QTextCodec::codecForLocale()->toUnicode(data.c_str());
QString dataQString2 = QString::fromStdString(data.c_str());
QString dataQString3 = QString::fromLatin1(data.c_str());
QString dataQString4 = QString::fromLocal8Bit(data.c_str());
ui.labeltoUnicode->setText(dataQString);
ui.labelfromStdString->setText(dataQString2);
ui.labelfromLatin1->setText(dataQString3);
ui.labelfromLocal8Bit->setText(dataQString4);
Output:
-
@Ceng0 said in QString Turkish Character Problem:
std::string data = "üöçİış";
This places the characters in the string as bytes, encoded using your system's locale.
There's a reasonable chance this is Windows-1254, the 8-bit encoding used on Turkish machines.
Bytes in hex: FC F6 E7 DD FD FE
QString dataQString = QTextCodec::codecForLocale()->toUnicode(data.c_str());
This makes an educated guess at your system's locale, converts the bytes from that locale's encoding into their Unicode equivalents, and places them in the QString. The result will usually match data.
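If you are not sure which bytes the compiler actually put into the std::string, dumping them as hex is a quick check. This is only a minimal sketch in plain C++; the helper name toHex is made up for illustration and is not part of Qt.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not a Qt function): returns the bytes of s
// as upper-case hex pairs separated by single spaces.
std::string toHex(const std::string& s) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Cast to unsigned char so bytes >= 0x80 are not sign-extended.
        std::snprintf(buf, sizeof buf, "%02X",
                      static_cast<unsigned char>(s[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

If toHex(data) gives FC F6 E7 DD FD FE, the literal was stored as Windows-1254; if it gives twelve bytes starting with C3 BC, it was stored as UTF-8.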
QString dataQString2 = QString::fromStdString(data.c_str());
From the docs: The given string is assumed to be encoded in UTF-8, and is converted to QString using the fromUtf8() function.
If it was not encoded as UTF-8 in the first place this risks corrupting the string.
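To see why Windows-1254 bytes cannot be read as UTF-8, here is a rough structural check, assuming plain C++ and a made-up helper name looksLikeUtf8. It only verifies lead and continuation bytes; it is a simplified sketch and does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if every lead byte is followed by the
// expected number of continuation bytes (10xxxxxx). Simplified check.
bool looksLikeUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (c < 0x80)                extra = 0;  // ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;  // 110xxxxx
        else if ((c & 0xF0) == 0xE0) extra = 2;  // 1110xxxx
        else if ((c & 0xF8) == 0xF0) extra = 3;  // 11110xxx
        else return false;                       // not a valid lead byte
        if (extra > s.size() - i - 1) return false;  // sequence truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(s[i + k]);
            if ((cont & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

The Windows-1254 bytes FC F6 E7 DD FD FE fail on the very first byte, because FC is not a valid UTF-8 lead byte. That is why a UTF-8 decoder has nothing sensible to produce for them.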
Your string as UTF-8 should be bytes: c3 bc c3 b6 c3 a7 c4 b0 c4 b1 c5 9f
QString dataQString3 = QString::fromLatin1(data.c_str());
The given string is assumed to be encoded in Latin-1 (ISO 8859-1, of which Windows-1252 is a close relative).
If it was not encoded as ISO8859-1 in the first place this risks corrupting the string.
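Latin-1 decoding simply maps each byte to the Unicode code point with the same value. The sketch below mimics that in plain C++ and writes the result as UTF-8, so you can see what happens to Windows-1254 input. The function name latin1ToUtf8 is made up for illustration.

```cpp
#include <string>

// Hypothetical helper: treats every input byte as the Latin-1 code point
// of the same value (U+0000..U+00FF) and emits it as UTF-8.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string out;
    for (char ch : latin1) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out += static_cast<char>(c);  // ASCII stays one byte
        } else {
            // U+0080..U+00FF need two UTF-8 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

For the bytes FC F6 E7 DD FD FE, the first three come out as ü ö ç, because Windows-1254 and Latin-1 agree there. DD, FD and FE come out as U+00DD, U+00FD and U+00FE (Ý, ý, þ) instead of İ, ı, ş.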
The last three characters in your string do not exist in ISO8859-1.
QString dataQString4 = QString::fromLocal8Bit(data.c_str());
The given string is assumed to be encoded in the default 8-bit encoding for your locale. This is likely to match the encoding of your source file and string.
Have a play with this tool to see what the various encodings/decodings produce.
How you deal with this depends on exactly what you want in your std::string.
Worst case, you can insert Unicode characters into C++ string literals using "\uxxxx":
std::string data = "\u00fc\u00f6\u00e7\u0130\u0131\u015f";
Tedious, but it does not depend on the encoding of the source file. The bytes that end up in the std::string still depend on the compiler's execution character set.
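If what you want in the std::string is always UTF-8, whatever the compiler settings, one option is to build the bytes from the code points yourself. This is a sketch in plain C++ under that assumption; appendUtf8 and turkishSampleUtf8 are made-up names, and surrogates and out-of-range values are not checked.

```cpp
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one code point.
// No validation of surrogates or values above U+10FFFF.
void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Builds "üöçİış" as UTF-8 from the same code points as the \u escapes.
std::string turkishSampleUtf8() {
    const char32_t codePoints[] = {0x00FC, 0x00F6, 0x00E7,
                                   0x0130, 0x0131, 0x015F};
    std::string out;
    for (char32_t cp : codePoints) appendUtf8(out, cp);
    return out;
}
```

The result is the twelve bytes c3 bc c3 b6 c3 a7 c4 b0 c4 b1 c5 9f, which is what QString::fromStdString() and QString::fromUtf8() expect.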
-
@JonB said in QString Turkish Character Problem:
how ridiculously complicated language encoding is in C++!
It's more of a Windows/MSVC problem: it uses something other than UTF-8 for source files, and even at runtime.