toLatin() method replaces apostrophe and double quotes with ?
-
Hi
I am using toLatin1 to convert a QString to QByteArray. The string contains apostrophe and double quotes. However on console the apostrophe and double quotes are changed to '?'
I am using Qt 6.4.2. This is my code:
    QString str1 = "That’s good. He is going to “Canada”";
    qDebug() << "Unchanged:" << str1;
    qDebug() << "Latin1:" << str1.toLatin1();
    qDebug() << "Utf8" << str1.toUtf8();
This is the output
Unchanged: "That’s good. He is going to “Canada”"
Latin1: "That?s good. He is going to ?Canada?"
Utf8 "That\xE2\x80\x99s good. He is going to \xE2\x80\x9C""Canada\xE2\x80\x9D"
If I update the file and replace all the ’ with ' and the “ ” with ", then it works fine.
This is the file I have received from the client, and sadly I cannot update it. Is there any way these can be displayed without modifying the input file?
Hi @nitingera,
The string contains apostrophe and double quotes.
That's actually subtly wrong. Your string does not contain an apostrophe (U+0027), but a right-single-quote (U+2019).
Although they may look very similar (or even identical) depending on your screen font, there is actually no Latin-1 representation for right-single-quote, so as per the QString::toLatin1() docs:
The returned byte array is undefined if the string contains non-Latin1 characters. Those characters may be suppressed or replaced with a question mark.
The same goes for your left and right double-quotes.
Try, for example:
    const QString str1 = QString::fromUtf8("That’s good. He is going to “Canada”");
    qDebug().noquote() << "str1 Unchanged:" << str1;
    qDebug().noquote() << "str1 Latin1:" << str1.toLatin1();
    qDebug().noquote() << "str1 Utf8" << str1.toUtf8();

    const QString str2 = QString(str1)
        .replace(QString::fromUtf8("’"), QStringLiteral("'"))
        .replace(QString::fromUtf8("“"), QStringLiteral("\""))
        .replace(QString::fromUtf8("”"), QStringLiteral("\""));
    qDebug().noquote() << "str2 Unchanged:" << str2;
    qDebug().noquote() << "str2 Latin1:" << str2.toLatin1();
    qDebug().noquote() << "str2 Utf8" << str2.toUtf8();
Output:
    str1 Unchanged: That’s good. He is going to “Canada”
    str1 Latin1: That?s good. He is going to ?Canada?
    str1 Utf8 That’s good. He is going to “Canada”
    str2 Unchanged: That's good. He is going to "Canada"
    str2 Latin1: That's good. He is going to "Canada"
    str2 Utf8 That's good. He is going to "Canada"
Why do you want to convert to Latin-1, and assuming you do, what do you want to happen to those non-Latin-1 characters?
Cheers.
-
@nitingera said in toLatin() method replaces apostrophe and double quotes with ?:
Any way that these can be displayed without modifying the input file
What does "displayed" mean? Looks like you can display them without any trouble.
qDebug() << "Unchanged:" << str1;
That line of code displays it fine, does it not?
Any way that these can be displayed without modifying the input file
You could read in from the input file and then modify the in-memory string you read.
-
@Chops Thanks for your reply.
I have received a file from a client who is trying to import it into the application.
I am reading that file, and the strings are converted to Latin-1 for some processing before being displayed.
When a string is converted to Latin-1, all such characters (right-single-quote, left and right double-quotes) become question marks.
I am looking for a way to prevent them from being converted to question marks.
-
@nitingera But why do you need to convert to Latin-1?
-
@Paul-Colby
Thanks for your response. My aim is to process the string that I read from the file, which has right-single-quote and left and right double-quotes, convert it to a QByteArray for processing, and then display the output.
It is not mandatory to convert it to Latin-1, but when I tried converting it to UTF-8 instead, these characters were still not displayed properly.
The exact same code works fine with Qt 4 but not with Qt 6.
-
@jsulm I want to convert it to a QByteArray.
My issue is that even if I convert it to UTF-8, it still doesn't display the right-single-quote or the left and right double-quotes.
-
@nitingera One comment: do not trust qDebug() output in such cases! qDebug() is only for debugging! Better to use std::cout or a widget to display the text.
What is the encoding of the string you get?
-
so I think that means qDebug() to console window does manage to show them?
By default (it can be overridden using qInstallMessageHandler) qDebug() messages end up going through the qDefaultMessageHandler(), which, when writing to a console (as opposed to various other outputs, like syslog), ends up using QString::toLocal8Bit() like:
fprintf(stderr, "%s\n", formattedMessage.toLocal8Bit().constData());
(see stderr_message_handler for example)
And as per the QString::toLocal8Bit() docs:
Returns the local 8-bit representation of the string as a QByteArray. The returned byte array is undefined if the string contains characters not supported by the local 8-bit encoding.
On Unix systems this is equivalent to toUtf8(), on Windows the systems current code page is being used.
If this string contains any characters that cannot be encoded in the locale, the returned byte array is undefined. Those characters may be suppressed or replaced by another.
So in my case it was fine to use qDebug() to demonstrate the differences in the strings before and after replacing various characters, because my local console handles Unicode with no problems, but as @nitingera wrote, that's not something you should rely on for user-facing output.
Cheers.
Edit: If I remember correctly, older Qt versions used to use QString::qUtf8Printable() for qDebug() output, but as per the docs:
This is equivalent to str.toUtf8().constData().
So it ends up the same anyway :)
-
@Paul-Colby said in toLatin() method replaces apostrophe and double quotes with ?:
Edit: If I remember correctly, older Qt versions used to use QString::qUtf8Printable() for qDebug() output, but as per the docs:
This is equivalent to str.toUtf8().constData().
So it ends up the same anyway :)
qUtf8Printable() is still actually used with qDebug() ... but in a different place: Use it if you want to use the "printf-style" API of qDebug, which expects utf-8 encoded strings for %s. E.g.
    QString str = "...";
    qDebug("Output: %s", qUtf8Printable(str));
This is different from the system printf, which expects the local 8-bit encoding by default, so you had better use qPrintable() (that is, .toLocal8Bit().constData()) there:
    QString str = "...";
    fprintf(stdout, "Output: %s", qPrintable(str));
But yeah, there's still no guarantee that the printed string will actually show up correctly when printed on a console, as Windows has its own limitations there ...