
Special characters in strings stored in a QStringList



  • Hello,

    I have an application that reads a list of names from a text file and stores those names in a QStringList. Afterwards, I use that list to access certain items of a QHash whose keys are the names stored in my QStringList. The problem is that some of the names may contain special characters such as é, à, ù, etc.

    When this is the case, a name such as "Jérémy" gets stored in the QStringList as "J?r?my", so when I try to access the item whose key is "Jérémy", my QHash doesn't find it.

    How can I make the QStringList store names with special characters correctly? When reading from the text file into the QStringList, I am doing:

    @
    QFile file(name);
    file.open(QIODevice::ReadOnly | QIODevice::Text);

    QTextStream in(&file);
    in.setCodec("UTF-8");

    // read the text file line by line until the end
    while (!in.atEnd()) {
        QString line = in.readLine();
        myStrList.append(line);
    }

    @

    Appreciate any help! Thanks.



  • You are reading the characters in as 8-bit characters (ASCII). ASCII doesn't contain the special characters.

    You would need to use Unicode (UTF-16).

    helpful links:
    "ASCII":http://en.wikipedia.org/wiki/ASCII
    "Unicode":http://en.wikipedia.org/wiki/Unicode


  • Moderators

    Hi,

    Have you confirmed that your file is actually saved in UTF-8 format? (There are different ways to encode 'é', and older systems might not use UTF-8 as the default.)

    [quote author="Buckets" date="1378314355"]you are reading the characters in as 8 bit character (ASCII)... You would need to use unicode (UTF-16).[/quote]No, sergex is trying to read the characters as UTF-8. UTF-8 and UTF-16 are BOTH Unicode encodings. http://stackoverflow.com/questions/4655250/difference-between-utf-8-and-utf-16
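One way to answer that question programmatically is to check whether the raw bytes of the file form valid UTF-8. Below is a minimal standard-C++ sketch (no Qt; `utf8SequenceLength`, `isValidUtf8` and `replaceInvalidUtf8` are hypothetical helper names, not Qt API) that follows the well-formed byte ranges of RFC 3629. It also substitutes '?' for every byte that cannot start a valid sequence, to mirror the "J?r?my" symptom. That '?' is only a stand-in: Qt's own UTF-8 decoder inserts the Unicode replacement character U+FFFD, which may then be displayed as '?'.

```cpp
#include <cstddef>
#include <string>

// Returns the length (1-4) of the well-formed UTF-8 sequence that starts at
// s[i], or 0 if the bytes at that position are not legal UTF-8.
std::size_t utf8SequenceLength(const std::string& s, std::size_t i) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    int lo = 0x80, hi = 0xBF;  // allowed range for the second byte
    if (lead < 0x80) return 1;                          // plain ASCII
    if (lead >= 0xC2 && lead <= 0xDF) len = 2;
    else if (lead == 0xE0) { len = 3; lo = 0xA0; }      // reject overlong forms
    else if (lead >= 0xE1 && lead <= 0xEC) len = 3;     // 0xE9 lands here
    else if (lead == 0xED) { len = 3; hi = 0x9F; }      // reject UTF-16 surrogates
    else if (lead >= 0xEE && lead <= 0xEF) len = 3;
    else if (lead == 0xF0) { len = 4; lo = 0x90; }      // reject overlong forms
    else if (lead >= 0xF1 && lead <= 0xF3) len = 4;
    else if (lead == 0xF4) { len = 4; hi = 0x8F; }      // stay <= U+10FFFF
    else return 0;                                      // 0x80-0xC1, 0xF5-0xFF
    if (i + len > s.size()) return 0;                   // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        const int c = static_cast<unsigned char>(s[i + k]);
        const int min = (k == 1) ? lo : 0x80;
        const int max = (k == 1) ? hi : 0xBF;
        if (c < min || c > max) return 0;               // not a valid continuation byte
    }
    return len;
}

// True if the whole byte string is well-formed UTF-8.
bool isValidUtf8(const std::string& s) {
    for (std::size_t i = 0; i < s.size();) {
        const std::size_t len = utf8SequenceLength(s, i);
        if (len == 0) return false;
        i += len;
    }
    return true;
}

// Copies valid sequences unchanged and replaces each byte that does not
// start a valid sequence with '?'.
std::string replaceInvalidUtf8(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        const std::size_t len = utf8SequenceLength(s, i);
        if (len == 0) { out += '?'; ++i; }
        else { out.append(s, i, len); i += len; }
    }
    return out;
}
```

With the single-byte 0xE9 encoding of "Jérémy", `isValidUtf8("J\xE9r\xE9my")` is false and `replaceInvalidUtf8("J\xE9r\xE9my")` returns `"J?r?my"`: each 0xE9 announces a three-byte sequence, but the next byte ('r' or 'm') is not a continuation byte.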



    If the file is encoded in ISO-8859-1, Windows-1252, or a similar 8-bit encoding, then 'é' is the single byte 0xE9, and the entire string is:
    @
    4A E9 72 E9 6D 79 (hex)
    J  é  r  é  m  y
    @
    When (mis)interpreted as UTF-8, the 0xE9 byte is not legal: it announces a three-byte sequence, but the byte after it ('r') is not a continuation byte. This results in a placeholder character being inserted into the QString in its place. A valid UTF-8 encoded version of "Jérémy" is:
    @
    4A C3 A9 72 C3 A9 6D 79 (hex)
    J  é     r  é     m  y
    @
    Edit: The preview uses a fixed-width font so things line up, but the live page uses a proportional font. Each é character corresponds to the pair of bytes C3 A9.

    BTW: Buckets, ASCII is a 7-bit encoding and does not allow 'é' either.
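To make the two byte sequences above concrete, here is a minimal standard-C++ sketch (no Qt) that converts ISO-8859-1 bytes to UTF-8 and prints them in the same hex layout. It assumes the input is pure Latin-1: Windows-1252 differs from Latin-1 in the range 0x80 to 0x9F (for example, 0x80 is € there), which this sketch does not handle. `latin1ToUtf8` and `toHex` are hypothetical helper names; within Qt, `QString::fromLatin1()` followed by `toUtf8()` does the same conversion.

```cpp
#include <cstddef>
#include <string>

// Converts ISO-8859-1 (Latin-1) bytes to UTF-8. Each Latin-1 byte equals the
// Unicode code point with the same value, so bytes 0x80-0xFF become a
// two-byte UTF-8 sequence of the form 110000xx 10xxxxxx.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (char ch : latin1) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            utf8 += static_cast<char>(b);                  // ASCII is unchanged
        } else {
            utf8 += static_cast<char>(0xC0 | (b >> 6));    // lead byte
            utf8 += static_cast<char>(0x80 | (b & 0x3F));  // continuation byte
        }
    }
    return utf8;
}

// Formats bytes as space-separated uppercase hex, e.g. "4A E9 72".
std::string toHex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i > 0) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

For example, `toHex("J\xE9r\xE9my")` gives `4A E9 72 E9 6D 79`, and `toHex(latin1ToUtf8("J\xE9r\xE9my"))` gives `4A C3 A9 72 C3 A9 6D 79`, matching the two dumps in this post: 0xE9 splits into 0xC3 (0xC0 | 3) and 0xA9 (0x80 | 0x29).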



  • Thanks for your replies. I checked and the file was actually being saved in ANSI.

    For now, I simply changed the codec to:

    @
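    QTextStream in(&file);
    // note: Windows-1250 is the Central European code page; on Western European
    // Windows the "ANSI" code page is usually Windows-1252, which maps 0xE0 to à
    // and 0xF9 to ù (Windows-1250 maps those bytes to ŕ and ů)
    in.setCodec("Windows-1250");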
    @

    and now special characters are working fine. I haven't been able to make it save the file in UTF-8 yet, though.


  • Moderators

    If you're on Windows, you can open your file in Notepad, and click "File" -> "Save As..."

    It will then give you a drop-down menu to choose your encoding.



    Yes, but what I mean is that I create a QFile, write to it, and save it. Afterwards, I read the list of strings back from that saved file.
    But I can't seem to get Qt to save it in UTF-8 encoding; it is always saved in ANSI.


  • Moderators

    @
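    #include <QFile>
    #include <QString>

    QString specialString = ...;
    QFile utfFile("myfile.txt");
    if (utfFile.open(QFile::WriteOnly | QFile::Text))
    {
        // toUtf8() encodes the string as UTF-8 bytes regardless of the system's
        // "ANSI" code page; Text mode only translates the line endings
        utfFile.write(specialString.toUtf8());
        utfFile.close();
    }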
    @
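The snippet above works because `QString::toUtf8()` produces UTF-8 bytes no matter what the system code page is. The write-then-read-back workflow from the question can be sketched in standard C++17 without Qt, assuming the names are already held as UTF-8 byte strings. `writeNames` and `readNames` are hypothetical helpers, and the `std::unordered_map` in the usage stands in for the QHash. Because the bytes are written and read back without any code-page conversion, a key read from the file matches the key that was stored.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <unordered_map>
#include <vector>

// Writes one name per line. std::string holds raw bytes, so names that are
// already UTF-8 encoded reach the file unchanged (no code page involved).
bool writeNames(const std::filesystem::path& path,
                const std::vector<std::string>& names) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    for (const std::string& name : names) out << name << '\n';
    out.close();              // flush to disk before reporting success
    return !out.fail();
}

// Reads the names back, one per line, as the same UTF-8 bytes.
std::vector<std::string> readNames(const std::filesystem::path& path) {
    std::vector<std::string> names;
    std::ifstream in(path, std::ios::binary);
    std::string line;
    while (std::getline(in, line)) names.push_back(line);
    return names;
}
```

For example, writing `{"J\xC3\xA9r\xC3\xA9my", "Fran\xC3\xA7ois"}` and reading the file back yields the same two byte strings, so an `std::unordered_map` keyed on the original `"J\xC3\xA9r\xC3\xA9my"` finds the entry read from disk.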


  • Moderators

    Note that even if you are really using UTF-8, there are different ways to encode the same text (Unicode normalization forms). There are precomposed forms, which use a single code point for "e with some mark on top", and decomposed forms, which use two code points for the same glyph: "e" followed by "add a mark to the previous letter". You will need to make sure your strings are normalized to one form or the other.
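A minimal standard-C++ sketch of why this matters for the QHash lookup in the question: the precomposed (NFC) and decomposed (NFD) spellings of "Jérémy" render the same but are different byte strings, so a hash keyed on one does not find the other. The `containsName` function and its table are hypothetical stand-ins for the QHash. In Qt, calling `QString::normalized(QString::NormalizationForm_C)` on both the stored keys and the lookup strings is one way to bring them to a single form.

```cpp
#include <string>
#include <unordered_map>

// Two UTF-8 spellings of "Jérémy" that display identically:
//   precomposed: é = U+00E9                          -> bytes C3 A9
//   decomposed:  é = U+0065 'e' + U+0301 (accent)    -> bytes 65 CC 81
const std::string kComposed   = "J\xC3\xA9r\xC3\xA9my";
const std::string kDecomposed = "Je\xCC\x81re\xCC\x81my";

// Hypothetical lookup table standing in for the QHash from the question;
// it was filled using the precomposed spelling only.
bool containsName(const std::string& name) {
    static const std::unordered_map<std::string, int> table = {{kComposed, 1}};
    return table.count(name) > 0;
}
```

Here `containsName(kComposed)` is true while `containsName(kDecomposed)` is false, even though both strings show up as "Jérémy" on screen; normalizing keys and queries to the same form before the lookup removes the mismatch.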

