What's the best way of creating a QString from a wchar_t array in Windows?
-
wrote on 30 Nov 2013, 18:52 last edited by
In the Windows API, wchar_t arrays are encoded in UTF-16.
According to the documentation, QString::fromWCharArray() assumes UCS-2 if the size of a wchar_t is 2 bytes. QString::fromStdWString(), on the other hand, assumes UTF-16 if the size of a wchar_t is 2 bytes. I don't understand why Qt makes different assumptions for these two functions.
Then we have QString::fromUtf16(). With a cast from (const wchar_t *) to (const ushort *) this seems to be a good candidate. However, the documentation points out that this function is slow and tells us to use QString(const QChar *, int) or QString(const QChar *) if possible. Is it possible to use these constructors with a wchar_t array?
Bottom line: What's the best way of creating a QString from a wchar_t array in Windows?
-
wrote on 1 Dec 2013, 02:02 last edited by
First of all: UTF-16 and UCS-2 are one and the same thing for any practical purpose. The two different names exist mainly for "historical" reasons.
See also:
http://www.unicode.org/faq/basic_q.html#14__
Internally, Qt uses UTF-16 all the way. The difference between QString::fromWCharArray() and QString::fromUtf16() is that the former takes an array of "wchar_t" and the latter takes an array of "unsigned short". On Windows it doesn't matter much which one you use, because there "wchar_t" is a 16-bit type and is commonly used to store UTF-16 strings. On other platforms, however, "wchar_t" may be a 32-bit type and would then only be used for UTF-32 strings. At the same time, "unsigned short" is a 16-bit type on pretty much any platform nowadays, so that is what you would use to store UTF-16 strings on those platforms where "wchar_t" is 32 bits in size.
To make things even more complex, C++ introduces std::string and std::wstring classes, which essentially wrap plain "char" and "wchar_t" C-style strings, respectively. So in case you have a C++ std::wstring instead of a plain C-style wchar_t array, you would use QString::fromStdWString().
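To illustrate the size difference without involving Qt, here is a minimal standard C++ sketch. The helper name toUtf16 is hypothetical and not part of Qt; it only shows the idea of copying the code units when wchar_t is 16 bits wide and splitting code points above U+FFFF into surrogate pairs when wchar_t is 32 bits wide. It assumes the input is already valid and does no error checking.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Qt function): turns a wchar_t buffer into
// UTF-16 code units, whatever the width of wchar_t on this platform.
// Assumption: the input holds valid UTF-16 (16-bit wchar_t) or valid
// UTF-32 (32-bit wchar_t); invalid values are not detected.
std::u16string toUtf16(const wchar_t *s, std::size_t n)
{
    std::u16string out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        if (sizeof(wchar_t) == 2) {
            // 16-bit wchar_t (e.g. Windows): already UTF-16, copy as-is.
            out.push_back(static_cast<char16_t>(s[i]));
        } else {
            // 32-bit wchar_t: each element is a whole code point.
            char32_t cp = static_cast<char32_t>(s[i]);
            if (cp < 0x10000) {
                out.push_back(static_cast<char16_t>(cp));
            } else {
                // Outside the BMP: encode as a surrogate pair.
                cp -= 0x10000;
                out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
                out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
            }
        }
    }
    return out;
}
```

Calling toUtf16(L"A\U0001F600", n), with n being the number of wchar_t elements, should give the same three UTF-16 code units on either kind of platform.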
-
wrote on 1 Dec 2013, 16:17 last edited by
If the 16-bit code units found in the wchar_t array are copied directly to the internal buffer in QString by QString::fromWCharArray(), the documentation should be updated to say that UTF-16 is assumed if the size of wchar_t is 2 bytes.
If Qt really is assuming UCS-2, I would expect that invalid Unicode characters (such as the UTF-16 surrogates) are replaced by the replacement character (U+FFFD). If not, Qt is really assuming UTF-16 and the documentation should say so.
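To make the distinction concrete, here is a rough standard C++ sketch of the two interpretations. Both function names are hypothetical and neither is taken from Qt; the strict UCS-2 variant only shows what I would expect from a decoder that really assumed UCS-2, not what Qt actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

inline bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool isLowSurrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical UTF-16 decoder: a valid surrogate pair becomes one code point,
// an unpaired surrogate becomes U+FFFD.
std::u32string decodeAsUtf16(const std::u16string &in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t u = in[i];
        if (isHighSurrogate(u) && i + 1 < in.size() && isLowSurrogate(in[i + 1])) {
            const char32_t hi = static_cast<char32_t>(u) - 0xD800;
            const char32_t lo = static_cast<char32_t>(in[i + 1]) - 0xDC00;
            out.push_back(0x10000 + (hi << 10) + lo);
            ++i; // the low surrogate has been consumed as well
        } else if (isHighSurrogate(u) || isLowSurrogate(u)) {
            out.push_back(0xFFFD);
        } else {
            out.push_back(u);
        }
    }
    return out;
}

// Hypothetical strict UCS-2 decoder: every 16-bit unit is one character,
// so any surrogate value is invalid and is replaced by U+FFFD.
std::u32string decodeAsStrictUcs2(const std::u16string &in)
{
    std::u32string out;
    for (char16_t u : in) {
        const bool surrogate = isHighSurrogate(u) || isLowSurrogate(u);
        out.push_back(surrogate ? char32_t(0xFFFD) : char32_t(u));
    }
    return out;
}
```

For an input such as u"a\U0001F600", the UTF-16 reading yields two code points, while the strict UCS-2 reading yields three characters, two of which are U+FFFD. The two readings only differ for text outside the Basic Multilingual Plane.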
-
wrote on 2 Dec 2013, 01:59 last edited by
Hi, fromWCharArray(), fromStdWString() and fromUtf16() do the same thing on Windows.
@
inline QString QString::fromWCharArray(const wchar_t *string, int size)
{
    return sizeof(wchar_t) == sizeof(QChar) ? fromUtf16(reinterpret_cast<const ushort *>(string), size)
                                            : fromUcs4(reinterpret_cast<const uint *>(string), size);
}

inline QString QString::fromStdWString(const std::wstring &s)
{ return fromWCharArray(s.data(), int(s.size())); }
@
-
wrote on 2 Dec 2013, 08:16 last edited by
Ok, thanks! The source is always the best source! ;-) I should have checked it myself. I have created "QTBUG-35287":https://bugreports.qt-project.org/browse/QTBUG-35287 that points out the misleading documentation.