What's the best way of creating a QString from a wchar_t array in Windows?



  • In the Windows API, wchar_t arrays are encoded in UTF-16.

    According to the documentation, QString::fromWCharArray() assumes UCS-2 if the size of wchar_t is 2 bytes. QString::fromStdWString(), on the other hand, assumes UTF-16 if the size of wchar_t is 2 bytes. I don't understand why Qt makes different assumptions for these two functions.

    Then we have QString::fromUtf16(). With a cast from (const wchar_t *) to (const ushort *) this seems to be a good candidate. However, the documentation points out that this function is slow and tells us to use QString(const QChar *, int) or QString(const QChar *) if possible. Is it possible to use these constructors with a wchar_t array?

    Bottom line: What's the best way of creating a QString from a wchar_t array in Windows?



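  • First of all: for most practical purposes, UTF-16 and UCS-2 are one and the same thing. They encode every character in the Basic Multilingual Plane identically; the only difference is that UTF-16 additionally uses surrogate pairs for code points above U+FFFF, which UCS-2 cannot represent. The two names exist mainly for "historical" reasons.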

    See also:
    http://www.unicode.org/faq/basic_q.html#14
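
    To make the one place where the two encodings diverge concrete, here is a minimal sketch in standard C++ (no Qt involved). The helper name encodeUtf16 is hypothetical and exists only for illustration; it assumes the input is a valid Unicode scalar value.

    ```cpp
    #include <string>

    // Hypothetical helper (not a Qt API): encode one Unicode code point
    // as UTF-16 code units.
    // Assumption: cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
    std::u16string encodeUtf16(char32_t cp)
    {
        std::u16string out;
        if (cp < 0x10000) {
            // Basic Multilingual Plane: a single 16-bit unit,
            // identical in UCS-2 and UTF-16.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Above U+FFFF: UTF-16 uses a surrogate pair.
            // UCS-2 has no way to represent this code point.
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
        }
        return out;
    }
    ```

    For example, encodeUtf16(0x00E9) yields one code unit, while encodeUtf16(0x1F600) yields the two units 0xD83D 0xDE00.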

    Internally, Qt uses UTF-16 throughout. The difference between QString::fromWCharArray() and QString::fromUtf16() is that the former takes an array of "wchar_t" and the latter takes an array of "unsigned short". On Windows it doesn't matter much which one you use, because there "wchar_t" is a 16-bit type that is commonly used to store UTF-16 strings. On other platforms, however, "wchar_t" may be a 32-bit type, in which case it would be used for UTF-32 strings. "unsigned short", by contrast, is a 16-bit type on pretty much any platform nowadays, so it is what you would use to store UTF-16 strings on platforms where "wchar_t" is 32 bits wide.

    To make things even more complex, C++ introduces the std::string and std::wstring classes, which essentially wrap plain "char" and "wchar_t" C-style strings, respectively. So if you have a C++ std::wstring rather than a plain C-style wchar_t array, you would use QString::fromStdWString().



  • If QString::fromWCharArray() copies the 16-bit code units found in the wchar_t array directly into QString's internal buffer, the documentation should be updated to say that UTF-16 is assumed if the size of wchar_t is 2 bytes.

    If Qt really is assuming UCS-2, I would expect that invalid Unicode characters (such as the UTF-16 surrogates) are replaced by the replacement character (U+FFFD). If not, Qt is really assuming UTF-16 and the documentation should say so.
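
    As a rough illustration of what such strict UCS-2 handling would look like, here is a sketch in standard C++. The function sanitizeAsUcs2 is a hypothetical helper, not anything Qt provides, and it says nothing about what QString actually does; it only shows the behaviour one would expect from a strict UCS-2 reader.

    ```cpp
    #include <string>

    // Hypothetical helper (not a Qt API): interpret 16-bit code units as
    // strict UCS-2. Surrogate code units (0xD800..0xDFFF) are not characters
    // in UCS-2, so each one is replaced by U+FFFD.
    std::u16string sanitizeAsUcs2(const std::u16string &units)
    {
        std::u16string out;
        out.reserve(units.size());
        for (char16_t u : units) {
            const bool isSurrogate = (u >= 0xD800 && u <= 0xDFFF);
            out.push_back(isSurrogate ? char16_t(0xFFFD) : u);
        }
        return out;
    }
    ```

    If a string containing a character above U+FFFF survives a round trip through a conversion function unchanged, rather than coming back as two replacement characters like this, the function is treating its input as UTF-16.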



  • Hi, fromWCharArray/fromStdWString/fromUtf16 do the same thing on Windows.

    @
    inline QString QString::fromWCharArray(const wchar_t *string, int size)
    {
        return sizeof(wchar_t) == sizeof(QChar)
            ? fromUtf16(reinterpret_cast<const ushort *>(string), size)
            : fromUcs4(reinterpret_cast<const uint *>(string), size);
    }

    inline QString QString::fromStdWString(const std::wstring &s)
    {
        return fromWCharArray(s.data(), int(s.size()));
    }
    @
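
    The same size-based dispatch can be tried out without Qt. The following is only a sketch in standard C++; wcharToUtf16 is a hypothetical name, and the code assumes the input is well-formed (UTF-16 when wchar_t is 16 bits wide, UTF-32 when it is 32 bits wide).

    ```cpp
    #include <cstddef>
    #include <string>

    // Hypothetical stand-alone helper (not a Qt API): convert a wchar_t array
    // to UTF-16 code units, choosing the path by the size of wchar_t.
    std::u16string wcharToUtf16(const wchar_t *s, std::size_t n)
    {
        std::u16string out;
        for (std::size_t i = 0; i < n; ++i) {
            const char32_t c = static_cast<char32_t>(s[i]);
            if (sizeof(wchar_t) == sizeof(char16_t) || c < 0x10000) {
                // 16-bit wchar_t (e.g. Windows): the units already are UTF-16.
                // 32-bit wchar_t: a BMP code point fits into a single unit.
                out.push_back(static_cast<char16_t>(c));
            } else {
                // 32-bit wchar_t holding a code point above U+FFFF:
                // emit a surrogate pair.
                const char32_t v = c - 0x10000;
                out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
                out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
            }
        }
        return out;
    }
    ```

    Called as wcharToUtf16(L"Qt", 2), this returns the two code units for "Qt" on either kind of platform; only the branch taken for characters above U+FFFF differs.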



  • Ok, thanks! The source is always the best source! ;-) I should have checked it myself. I have created QTBUG-35287 (https://bugreports.qt-project.org/browse/QTBUG-35287), which points out the misleading documentation.

