QTextCodec canEncode, what is the expected behavior?
-
I am testing a few string conversions for use with an external non-Unicode program on Windows, using Qt 5.4.1. I have set the Windows locale for non-Unicode programs to Japanese and am testing two different strings with the code below:
foreach (const QString& arg, arguments) {
    QTextCodec* codec = QTextCodec::codecForLocale();
    QByteArray localizedArg = arg.toLocal8Bit();
    if (QString::fromLocal8Bit(localizedArg) != arg) {
        qDebug() << arg << "codec->canEncode" << codec->canEncode(arg)
                 << "QString::fromLocal8Bit(localizedArg) != arg";
    } else {
        qDebug() << arg << "codec->canEncode" << codec->canEncode(arg);
    }
}
The variable "arguments" contains the following two strings:
- d:/でアヒィン/1.10/Amber/新しいバンク 2.file
- d:/1.10/Amber/你你.file
The first string contains a mixture of English and Japanese characters and the second string contains a mixture of English and Chinese characters.
Running through the loop above produces the following result in the output window:
"d:/でアヒィン/1.10/Amber/新しいバンク 2.file" codec->canEncode true
"d:/1.10/Amber/你你.file" codec->canEncode true QString::fromLocal8Bit(localizedArg) != arg
If characters are lost or incorrectly converted, so that reversing the operation produces a different string, I would expect QTextCodec::canEncode to return false (i.e. if it is going to convert the foreign characters to "?").
Stepping through the code, I got into QTextCodec::canEncode, whose return value is based on state.invalidChars == 0. Just before that check, it invokes QWindowsLocalCodec::convertFromUnicode, which does not seem to do anything with the ConverterState passed in. Is this correct, or is it a bug?
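To make the question concrete, here is a toy model of the logic as I understand it from stepping through. It is a sketch under assumptions, not Qt's implementation: ToyState, encodeCounting, encodeSilent, toyCanEncode and survivesRoundTrip are all hypothetical names, the target encoding is Latin-1 rather than the Windows locale codec, and only the standard library is used.

```cpp
#include <string>

// Hypothetical stand-in for a converter state; only the counter is modelled.
struct ToyState {
    int invalidChars = 0;
};

// Encodes UTF-32 text to Latin-1, replacing anything above U+00FF with '?'
// and counting every replacement in the state.
std::string encodeCounting(const std::u32string &text, ToyState &state)
{
    std::string out;
    for (char32_t ch : text) {
        if (ch <= 0xFF) {
            out.push_back(static_cast<char>(static_cast<unsigned char>(ch)));
        } else {
            out.push_back('?');
            ++state.invalidChars;
        }
    }
    return out;
}

// Same conversion, but the caller's state is never updated.
// This models an encoder that ignores the state it is given.
std::string encodeSilent(const std::u32string &text, ToyState &)
{
    ToyState scratch;
    return encodeCounting(text, scratch);
}

using Encoder = std::string (*)(const std::u32string &, ToyState &);

// "Can encode" decided purely from the counter, as in the check described above.
bool toyCanEncode(Encoder encode, const std::u32string &text)
{
    ToyState state;
    encode(text, state);
    return state.invalidChars == 0;
}

// Decodes Latin-1 bytes back to UTF-32.
std::u32string decode(const std::string &bytes)
{
    std::u32string out;
    for (unsigned char b : bytes) {
        out.push_back(b);
    }
    return out;
}

// Round-trip check that does not rely on the state at all.
bool survivesRoundTrip(const std::u32string &text)
{
    ToyState scratch;
    return decode(encodeCounting(text, scratch)) == text;
}
```

In this model, toyCanEncode with encodeSilent reports true for text that is actually replaced by "?", while survivesRoundTrip reports false, which is the same pattern as in my output above. In Qt terms, the comparison QString::fromLocal8Bit(arg.toLocal8Bit()) == arg from my loop plays the role of survivesRoundTrip.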
-
@Thuan_Firelight please search the forum; I remember a similar question some time ago. I just don't remember the outcome.
-
@aha_1980 I did, and the closest one I found was this: https://forum.qt.io/topic/93921/unexpected-result-from-qtextcodec-canencode-qstring/7.
However, it seems the "solved" response was that "US-ASCII" is not recommended, and I am not using "US-ASCII". The last response was from the OP, who reckons the problem is not restricted to "US-ASCII". I wanted to bump the thread, but the forum recommends starting a new thread because that one is quite old.
There is also this bug report: https://bugreports.qt.io/browse/QTBUG-6925 . The last comment states it was closed, so I am checking what the expected behavior is before I comment further on the bug report to bump it up.
-
I would recommend that you comment and vote on QTBUG-6925, which was re-opened, by the way.
If you can break your problem down to a minimal, compilable, and testable example, please attach it there as well; it helps with debugging and fixing the problem.
You should of course try that with the latest release (which is 5.12-RC by now) - maybe there is already some improvement for your case.