Special characters in strings stored in a QStringList

    sergex
    wrote on last edited by
    #1

    Hello,

    I have an application that reads a list of names from a text file and stores them in a QStringList. Afterwards, I use that list to look up items in a QHash whose keys are those same names. The problem is that some of the names contain special characters such as é, à, ù, etc.

    When that happens, a name such as "Jérémy" gets stored in the QStringList as "J?r?my", so when I look up the item whose key is "Jérémy", the QHash doesn't find it.

    How can I make the QStringList store names with special characters correctly? When reading from the text file into the QStringList I am doing:

    @
    QFile file(name);
    if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
        return; // could not open the file

    QTextStream in(&file);
    in.setCodec("UTF-8"); // decode the file's bytes as UTF-8

    // Read line by line until the end of the text file
    while (!in.atEnd()) {
        QString line = in.readLine();
        myStrList.append(line);
    }
    @

    Appreciate any help! Thanks.

      Buckets
      wrote on last edited by
      #2

      You are reading the characters in as 8-bit characters (ASCII). ASCII doesn't contain the special characters.

      You would need to use Unicode (UTF-16).

      Helpful links:
      ASCII: http://en.wikipedia.org/wiki/ASCII
      Unicode: http://en.wikipedia.org/wiki/Unicode


        JKSH
        Moderators
        wrote on last edited by
        #3

        Hi,

        Have you confirmed that your file is actually saved in UTF-8 format? (There are different ways to encode 'é'. Old systems might not use UTF-8 as the default.)

        [quote author="Buckets" date="1378314355"]You are reading the characters in as 8-bit characters (ASCII)... You would need to use Unicode (UTF-16).[/quote]

        No, sergex is trying to read the characters as UTF-8. UTF-8 and UTF-16 are BOTH Unicode encodings. See http://stackoverflow.com/questions/4655250/difference-between-utf-8-and-utf-16
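To make the point about UTF-8 and UTF-16 concrete, here is a minimal sketch in plain standard C++ (no Qt involved) that encodes a single Unicode code point both ways. The function names `encodeUtf8` and `encodeUtf16` are made up for this illustration, and the input is assumed to be a valid, non-surrogate code point.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Encode one Unicode code point as UTF-8 bytes.
// Assumption: cp is a valid scalar value (not a surrogate).
std::string encodeUtf8(char32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {                      // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {            // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                              // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode the same code point as UTF-16 code units.
std::vector<std::uint16_t> encodeUtf16(char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000) {
        return { static_cast<std::uint16_t>(cp) };  // one 16-bit unit
    }
    cp -= 0x10000;                                   // surrogate pair
    return { static_cast<std::uint16_t>(0xD800 | (cp >> 10)),
             static_cast<std::uint16_t>(0xDC00 | (cp & 0x3FF)) };
}
```

For 'é' (U+00E9), `encodeUtf8(0xE9)` yields the two bytes C3 A9, while `encodeUtf16(0xE9)` yields the single 16-bit unit 00E9: the same character, two different Unicode encodings.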

        Qt Doc Search for browsers: forum.qt.io/topic/35616/web-browser-extension-for-improved-doc-searches

          ChrisW67
          wrote on last edited by
          #4

          If the file is encoded in ISO-8859-1, Windows-1252, or a similar 8-bit encoding, then 'é' is the single byte 0xE9 and the entire string is:
          @
          4A E9 72 E9 6D 79   (hex)
          J  é  r  é  m  y
          @
          When (mis)interpreted as UTF-8, the 0xE9 byte is not legal: it announces a three-byte sequence, but the byte that follows is not a continuation byte. The result is a placeholder character inserted in the QString in place of the illegal byte. A valid UTF-8 encoded version of "Jérémy" is:
          @
          4A C3 A9 72 C3 A9 6D 79   (hex)
          J  é     r  é     m  y
          @
          Edit: The preview uses a fixed-width font so things line up, but the site uses a proportional font for the live page. Each é character corresponds to the pair of bytes C3 A9.

          BTW: Buckets, ASCII is a 7-bit encoding and does not allow 'é' either.
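The two byte dumps above can be checked mechanically. The following is a rough sketch in plain standard C++ (not Qt's own decoder) of a structural UTF-8 check; `looksLikeUtf8` is a hypothetical helper name, and the check is deliberately simplified: it verifies lead and continuation bytes only and does not reject overlong forms or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if every lead byte is followed by the number of
// continuation bytes (10xxxxxx) that it announces.
// Simplified sketch: overlong encodings and surrogates are not rejected.
bool looksLikeUtf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;
        if (lead < 0x80) extra = 0;                    // ASCII
        else if ((lead & 0xE0) == 0xC0) extra = 1;     // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) extra = 2;     // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) extra = 3;     // 11110xxx
        else return false;                             // stray continuation byte

        if (extra > bytes.size() - i - 1) return false; // sequence cut short
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;       // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}

// The two byte sequences quoted in the post above.
void checkThreadExamples() {
    assert(!looksLikeUtf8("J\xE9r\xE9my"));          // 8-bit encoded file
    assert(looksLikeUtf8("J\xC3\xA9r\xC3\xA9my"));   // UTF-8 encoded file
}
```

Running a check like this over the raw file contents is one way to find out whether a file really is UTF-8 before choosing a codec for it.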

            sergex
            wrote on last edited by
            #5

            Thanks for your replies. I checked and the file was actually being saved in ANSI.

            For now I simply changed the codec to

            @
            QTextStream in(&file);
            in.setCodec("Windows-1250");
            @

            and now special characters are working fine. I haven't been able to make Qt save the file in UTF-8, though.
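What an 8-bit codec does with such a file can be sketched in a few lines of standard C++. This is an illustration only, not Qt's implementation: it assumes ISO-8859-1 input, where each byte value equals its Unicode code point, and `latin1ToUtf8` is a name made up for the example. One caveat worth checking against the real data: in Windows-1250 the byte 0xE9 also decodes to é, but 0xE0 and 0xF9 decode to ŕ and ů rather than à and ù, so Windows-1252 (mentioned earlier in the thread) may be the better match for Western European names.

```cpp
#include <cassert>
#include <string>

// Re-encode ISO-8859-1 (Latin-1) bytes as UTF-8.
// Assumption: the input really is Latin-1, so each byte value is
// the Unicode code point of the character it represents.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (char ch : latin1) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            utf8 += static_cast<char>(b);                // ASCII stays one byte
        } else {
            utf8 += static_cast<char>(0xC0 | (b >> 6));  // 110000xx
            utf8 += static_cast<char>(0x80 | (b & 0x3F)); // 10xxxxxx
        }
    }
    return utf8;
}

// "Jérémy" as stored by an 8-bit editor becomes the UTF-8 form.
void checkJeremy() {
    assert(latin1ToUtf8("J\xE9r\xE9my") == "J\xC3\xA9r\xC3\xA9my");
}
```

The output bytes match the valid UTF-8 sequence 4A C3 A9 72 C3 A9 6D 79 shown earlier in the thread.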

              JKSH
              Moderators
              wrote on last edited by
              #6

              If you're on Windows, you can open your file in Notepad, and click "File" -> "Save As..."

              It will then give you a drop-down menu to choose your encoding.


                sergex
                wrote on last edited by
                #7

                Yes, but what I mean is that I create a QFile in Qt, write to it, and save it. Afterwards I read the list of strings back from that saved file.
                But I can't seem to get it saved in UTF-8 encoding from Qt. It is always saved in ANSI.

                  JKSH
                  Moderators
                  wrote on last edited by
                  #8

                  @
                  QString specialString = ...;
                  QFile utfFile("myfile.txt");
                  if (utfFile.open(QFile::WriteOnly | QFile::Text))
                  {
                      // toUtf8() returns the string as UTF-8 encoded bytes
                      utfFile.write(specialString.toUtf8());
                      utfFile.close();
                  }
                  @


                    tobias.hunger
                    wrote on last edited by
                    #9

                    Note that even if you really are using UTF-8, Unicode itself offers more than one way to represent the same accented character. There are precomposed forms, where "e with an accent on top" is a single code point, and decomposed forms, which use two code points for the same glyph: "e" followed by a combining mark that adds the accent to the preceding letter. Strings that look identical can therefore compare as different, so you will need to make sure your strings (both the QHash keys and the names you read) are normalized to one form or the other.
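A toy illustration of the two forms, in plain standard C++: the decomposed spelling of "Jérémy" uses 'e' followed by U+0301 (COMBINING ACUTE ACCENT), the precomposed one uses U+00E9. The helper `composeAcute` below is hypothetical and handles only five vowels with an acute accent; it is nowhere near full Unicode normalization. In Qt itself, `QString::normalized()` with `QString::NormalizationForm_C` or `QString::NormalizationForm_D` is the call to look at.

```cpp
#include <cassert>
#include <string>

// Toy composition step: merges a base vowel followed by U+0301
// (COMBINING ACUTE ACCENT) into its precomposed code point.
// Assumption: only a, e, i, o, u are handled; this is NOT full NFC.
std::u32string composeAcute(const std::u32string& in) {
    std::u32string out;
    for (char32_t cp : in) {
        if (cp == 0x0301 && !out.empty()) {
            char32_t composed = 0;
            switch (out.back()) {
                case U'a': composed = 0x00E1; break;  // á
                case U'e': composed = 0x00E9; break;  // é
                case U'i': composed = 0x00ED; break;  // í
                case U'o': composed = 0x00F3; break;  // ó
                case U'u': composed = 0x00FA; break;  // ú
                default: break;
            }
            if (composed != 0) {
                out.back() = composed;  // replace base letter, drop the mark
                continue;
            }
        }
        out += cp;
    }
    return out;
}

// The two spellings differ as code point sequences until one is normalized.
void checkForms() {
    const std::u32string decomposed  = U"Je\u0301re\u0301my";
    const std::u32string precomposed = U"J\u00E9r\u00E9my";
    assert(decomposed != precomposed);
    assert(composeAcute(decomposed) == precomposed);
}
```

A hash keyed on one form will not find a lookup string in the other form, which is the same symptom as in the original question even though both strings are valid Unicode.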
