Special characters in strings stored in a QStringList
I have an application that gets a list of names from a text file and stores those names in a QStringList. Afterwards, I use that list to access certain items of a QHash that have the names stored in my QStringList as their keys. The problem I am having is that some of the names might have special characters such as é, à, ù, etc.
When this is the case, a name such as "Jérémy" gets stored in the QStringList as "J?r?my", so when I want to access the item whose key is "Jérémy", my QHash doesn't find it.
How can I make the QStringList store the names with special characters correctly? When reading from the text file into the QStringList I am doing:
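file.open(QIODevice::ReadOnly | QIODevice::Text);
QTextStream in(&file); // assumption: 'in' is a QTextStream reading from 'file' (declaration not shown in the post)
// then in a loop while not at the end of the text file..
QString line = in.readLine();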
Appreciate any help! Thanks.
you are reading the characters in as 8 bit character (ASCII). ASCII doesn't contain the special characters.
You would need to use unicode (UTF-16).
Have you confirmed that your file is actually saved in UTF-8 format? (There are different ways to encode 'é'. Old systems might not use UTF-8 as the default)
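One way to check this without trusting an editor's status bar is to read the raw bytes of the file and test whether they form well-formed UTF-8. The following is a minimal sketch in plain standard C++ (no Qt); the function name `isValidUtf8` is made up for illustration and is not part of any library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong
// encodings, UTF-16 surrogates and code points above U+10FFFF.
bool isValidUtf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;    // continuation bytes expected after the lead byte
        unsigned long cp = 0;     // code point being decoded
        unsigned long minCp = 0;  // smallest code point allowed for this length

        if (lead < 0x80)                { cp = lead;        extra = 0; minCp = 0x0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; minCp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; minCp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; minCp = 0x10000; }
        else return false;  // continuation byte in lead position, or invalid lead

        if (extra > n - i - 1) return false;  // sequence is cut off at end of input

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < minCp) return false;                    // overlong encoding
        if (cp > 0x10FFFF) return false;                 // outside Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += extra + 1;
    }
    return true;
}
```

Pure ASCII input also passes this check, so a `true` result only means "not obviously a legacy 8-bit encoding". A `false` result on a file that contains accented names strongly suggests it was saved as ISO-8859-1, Windows-1252 or similar.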
[quote author="Buckets" date="1378314355"]you are reading the characters in as 8 bit character (ASCII)... You would need to use unicode (UTF-16).[/quote]No, sergex is trying to read the characters as UTF-8. UTF-8 and UTF-16 are BOTH Unicode encodings. http://stackoverflow.com/questions/4655250/difference-between-utf-8-and-utf-16
If the file is encoded in ISO-8859-1, Windows-1252, or a similar 8-bit encoding, then 'é' is the single byte 0xE9, and the entire string is:
4A E9 72 E9 6D 79 (hex)
J  é  r  é  m  y
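When (mis)interpreted as UTF-8, the 0xE9 byte is not legal in this position: it announces a three-byte sequence, but the byte that follows it (0x72 or 0x6D) is not a continuation byte. This results in a placeholder character being inserted in the QString for the illegal character. A valid UTF-8 encoded version of "Jérémy" is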
4A C3 A9 72 C3 A9 6D 79 (hex).
J  é     r  é     m  y
Edit: The preview uses a fixed-width font so things line up, but the site uses a proportional font for the live page. Each é character corresponds to the pair of bytes C3 A9.
BTW: Buckets, ASCII is a 7-bit encoding and does not allow 'é' either.
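To make the two byte sequences above concrete, here is a small sketch in plain standard C++ that converts ISO-8859-1 bytes to UTF-8 by hand and prints them as hex. The helper names `latin1ToUtf8` and `toHex` are invented for this illustration. In a Qt program you would normally let Qt do this (for example `QString::fromLatin1()` followed by `QString::toUtf8()`). The sketch also assumes strict ISO-8859-1: Windows-1252 assigns different characters to bytes 0x80 to 0x9F, although 'é' is 0xE9 in both.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// Every Latin-1 byte value equals its Unicode code point (U+0000..U+00FF),
// so bytes >= 0x80 become a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (char ch : latin1) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            utf8.push_back(static_cast<char>(c));                  // plain ASCII
        } else {
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation
        }
    }
    return utf8;
}

// Hypothetical helper: formats bytes as space-separated upper-case hex.
std::string toHex(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned value = static_cast<unsigned char>(bytes[i]);
        std::snprintf(buf, sizeof buf, "%02X", value);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling `toHex(latin1ToUtf8("J\xE9r\xE9my"))` yields `4A C3 A9 72 C3 A9 6D 79`, matching the valid UTF-8 version shown above, while `toHex("J\xE9r\xE9my")` gives back the 8-bit form `4A E9 72 E9 6D 79`.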
Thanks for your replies. I checked and the file was actually being saved in ANSI.
For now I simply changed the codec to
and now special characters are working fine. Haven't been able to make it save the file in UTF-8.
If you're on Windows, you can open your file in Notepad, and click "File" -> "Save As..."
It will then give you a drop-down menu to choose your encoding.
Yes, but I mean I create a QFile, write to it and save it. Then afterwards I get the list of strings from that previously saved file.
But I can't seem to get it saved in UTF-8 encoding from Qt. It is always saved in ANSI.
QString specialString = ...;
Note that even if you are really using UTF-8, there are different ways to encode the same string. Basically there are precomposed forms, which use a single code point for "e with some strange mark on top", and decomposed forms, which use two code points for the same glyph: "e" followed by "add strange mark to the previous letter". You will need to make sure your strings are normalized to one form or the other, otherwise two keys that look identical will not compare equal.
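The effect on key lookups can be seen at the byte level. The sketch below, in plain standard C++, spells "Jérémy" both ways and shows that the two strings differ, so they would not match as hash keys. The function `composeAcuteE` is a toy written only for this example: it handles the single pair "e" + U+0301 and is not a real normalizer. In Qt the proper tool is `QString::normalized()` with a form such as `QString::NormalizationForm_C`.

```cpp
#include <cstddef>
#include <string>

// "é" as the single precomposed code point U+00E9 (UTF-8: C3 A9).
const std::string kPrecomposed = "J\xC3\xA9r\xC3\xA9my";

// "é" as 'e' followed by U+0301 COMBINING ACUTE ACCENT (UTF-8: CC 81).
const std::string kDecomposed = "Je\xCC\x81re\xCC\x81my";

// Toy illustration only: replaces the decomposed "e + U+0301" with the
// precomposed U+00E9. Real normalization covers far more than this pair.
std::string composeAcuteE(std::string s) {
    const std::string decomposed = "e\xCC\x81";
    const std::string precomposed = "\xC3\xA9";
    std::size_t pos = 0;
    while ((pos = s.find(decomposed, pos)) != std::string::npos) {
        s.replace(pos, decomposed.size(), precomposed);
        pos += precomposed.size();
    }
    return s;
}
```

Here `kPrecomposed == kDecomposed` is false even though both render as "Jérémy", whereas `composeAcuteE(kDecomposed) == kPrecomposed` is true. Normalizing both the keys and the lookup strings to the same form before inserting into or querying the QHash avoids this kind of mismatch.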