Special characters in strings stored in a QStringList



  • Hello,

    I have an application that gets a list of names from a text file and stores those names in a QStringList. Afterwards, I use that list to access certain items of a QHash whose keys are the names stored in my QStringList. The problem is that some of the names may contain special characters such as é, à, ù, etc.

    I noticed that a name such as "Jérémy" gets stored in the QStringList as "J?r?my", so when I try to access the item whose key is "Jérémy", my QHash doesn't find it.

    How can I make the QStringList store names with special characters correctly? When reading from the text file into the QStringList, I am doing:

    @
    QFile file(name);
    if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
        return; // could not open the file

    QTextStream in(&file);
    in.setCodec("UTF-8"); // decode the file contents as UTF-8

    // read line by line until the end of the text file
    while (!in.atEnd()) {
        QString line = in.readLine();
        myStrList.append(line);
    }

    @

    Appreciate any help! Thanks.



    You are reading the characters in as 8-bit characters (ASCII). ASCII doesn't contain the special characters.

    You would need to use Unicode (UTF-16).

    helpful links:
    "ASCII":http://en.wikipedia.org/wiki/ASCII
    "Unicode":http://en.wikipedia.org/wiki/Unicode


  • Moderators

    Hi,

    Have you confirmed that your file is actually saved in UTF-8 format? (There are different ways to encode 'é'. Old systems might not use UTF-8 as the default)

    [quote author="Buckets" date="1378314355"]you are reading the characters in as 8 bit character (ASCII)... You would need to use unicode (UTF-16).[/quote]

    No, sergex is trying to read the characters as UTF-8. UTF-8 and UTF-16 are BOTH Unicode encodings. http://stackoverflow.com/questions/4655250/difference-between-utf-8-and-utf-16
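
To illustrate the point that UTF-8 and UTF-16 are two encodings of the same Unicode code points, here is a minimal sketch in plain standard C++ (no Qt). The function names `encodeUtf8` and `encodeUtf16` are hypothetical helpers written for this example, and both assume the input is a valid code point (not a surrogate, not above U+10FFFF).

```cpp
#include <string>

// Encode one Unicode code point as UTF-8 bytes.
// Assumption: cp is a valid scalar value (no range checking is done here).
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                       // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode the same code point as UTF-16 code units.
std::u16string encodeUtf16(char32_t cp)
{
    std::u16string out;
    if (cp < 0x10000) {                    // one 16-bit unit
        out += static_cast<char16_t>(cp);
    } else {                               // surrogate pair
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 | (cp >> 10));
        out += static_cast<char16_t>(0xDC00 | (cp & 0x3FF));
    }
    return out;
}
```

With these helpers, 'é' (U+00E9) becomes the two bytes C3 A9 in UTF-8 and the single 16-bit unit 00E9 in UTF-16. Either way it is the same character; only the byte layout differs.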



  • If the file is encoded in ISO-8859-1, Windows-1252, or a similar 8-bit encoding, then 'é' is the single byte 0xE9, and the entire string is:
    @
    4A E9 72 E9 6D 79 (hex)
    J  é  r  é  m  y
    @
    When (mis)interpreted as UTF-8, the 0xE9 byte is not legal here: it announces a three-byte sequence, but the bytes that follow are not continuation bytes. The result is a placeholder character inserted in the QString for each illegal character. A valid UTF-8 encoded version of "Jérémy" is
    @
    4A C3 A9 72 C3 A9 6D 79 (hex)
    J  é     r  é     m  y
    @
    Edit: The preview uses a fixed-width font so things line up, but the site uses a proportional font for the live page. Each é character corresponds to the pair of bytes C3 A9.

    BTW: Buckets, ASCII is a 7-bit encoding and does not allow 'é' either.
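
The byte-level explanation above can be checked with a short sketch in plain standard C++ (no Qt). Both `isValidUtf8` and `latin1ToUtf8` are hypothetical helpers written for this example. The validity check is deliberately simplified: it only verifies lead and continuation byte structure and does not reject overlong forms or surrogates. The conversion assumes the input is ISO-8859-1, where every byte value equals its Unicode code point.

```cpp
#include <cstddef>
#include <string>

// Simplified structural UTF-8 check: every lead byte must be followed by
// the right number of continuation bytes (10xxxxxx).
bool isValidUtf8(const std::string &bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;
        if (lead < 0x80)                extra = 0;  // ASCII
        else if ((lead & 0xE0) == 0xC0) extra = 1;  // 2-byte sequence
        else if ((lead & 0xF0) == 0xE0) extra = 2;  // 3-byte sequence
        else if ((lead & 0xF8) == 0xF0) extra = 3;  // 4-byte sequence
        else return false;                          // stray continuation or bad lead

        if (i + extra >= bytes.size()) return false; // sequence is cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;    // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}

// Convert ISO-8859-1 bytes to UTF-8. Bytes below 0x80 are copied as-is;
// bytes 0x80..0xFF become a two-byte sequence.
std::string latin1ToUtf8(const std::string &latin1)
{
    std::string out;
    for (char ch : latin1) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out += ch;
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Fed the six bytes 4A E9 72 E9 6D 79, `isValidUtf8` returns false because 0xE9 is followed by 0x72, which is not a continuation byte. `latin1ToUtf8` turns the same input into 4A C3 A9 72 C3 A9 6D 79, which passes the check.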



  • Thanks for your replies. I checked and the file was actually being saved in ANSI.

    For now I simply changed the codec to

    @
    QTextStream in(&file);
    in.setCodec("Windows-1250");
    @

    and now special characters are working fine. I haven't been able to make it save the file in UTF-8, though.


  • Moderators

    If you're on Windows, you can open your file in Notepad, and click "File" -> "Save As..."

    It will then give you a drop-down menu to choose your encoding.



  • Yes, but what I mean is that I create a QFile, write to it and save it. Afterwards, I read the list of strings from that previously saved file.
    But I can't seem to get it saved in UTF-8 encoding from Qt. It is always saved in ANSI.


  • Moderators

    @
    QString specialString = ...;
    QFile utfFile("myfile.txt");
    if (utfFile.open(QFile::WriteOnly|QFile::Text))
    {
        // toUtf8() returns the string's UTF-8 bytes as a QByteArray
        utfFile.write(specialString.toUtf8());
        utfFile.close();
    }
    @


  • Moderators

    Note that even if you really are using UTF-8, there are different ways to encode the same string. There are precomposed forms, which use a single code point for "e with an accent on top", and decomposed forms, which use two code points for the same glyph: "e" followed by "add an accent to the previous letter". You will need to make sure your strings are normalized to one form or the other, for example with QString::normalized().
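
As a rough illustration of why normalization matters for hash keys, the sketch below (plain standard C++, no Qt) builds "Jérémy" both ways and shows that the two strings are not equal until they are brought to the same form. `composeEAcute` is a hypothetical toy helper that only knows the single pair 'e' + U+0301. It is not a real normalizer; in Qt code, QString::normalized() would be the tool to reach for.

```cpp
#include <cstddef>
#include <string>

// The same visible text, written with different code point sequences.
const std::u32string precomposed = U"J\u00E9r\u00E9my";   // é as one code point (U+00E9)
const std::u32string decomposed  = U"Je\u0301re\u0301my"; // 'e' + combining acute (U+0301)

// Toy composer: replaces 'e' followed by U+0301 with U+00E9.
// Assumption: this single pair is the only case handled.
std::u32string composeEAcute(const std::u32string &in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == U'e' && i + 1 < in.size() && in[i + 1] == U'\u0301') {
            out += U'\u00E9';
            ++i; // skip the combining mark that was just consumed
        } else {
            out += in[i];
        }
    }
    return out;
}
```

Here `precomposed` has 6 code points and `decomposed` has 8, so a direct comparison (and therefore a hash lookup) treats them as different keys. After `composeEAcute(decomposed)` the two compare equal.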

