[Solved] Localization problems with UTF8 Turkish text data

    Alicemirror
    wrote on 3 Oct 2011, 07:30 last edited by
    #1

    Hi to all,

    I have an application that manages UTF-8 JSON files (formerly text files) included as internal resources. Some of these files contain special characters because they include Turkish words.

    Opening the files in the Qt Creator editor or any other editor I read the files correctly and all the special characters are shown in the right way. The following points summarize the scenario:

    The json file is opened as a QString datastream by a C++ class exposed to the QML document. The console.log(class.datastream) shows the file with the correct characters. This means that the localization is managed in the right way: all the Turkish cities names have the right special characters.

    The datastream is converted to a JSON object using the js eval function. I expect all the names to end up in the JSON dictionary, which is then appended to a list for the user to choose from.

    function:
    @
    // sourceString is the QString datastream (verified as correct)
    function processJsonData(sourceString) {
        return eval('(' + sourceString + ')');
    }
    @
    As a matter of fact, this function didn't work. I went crazy trying to track down the "parsing error" message it produced, searching for wrong data in the JSON text file, but everything there is correct.
    Opening the JSON file with a binary editor, I see three unprintable bytes before the first JSON structure character (the first "{"). These are 0xEF, 0xBB, 0xBF, part of the UTF-8 encoding of the file.
    For some unknown reason the js engine treats these characters as invalid data, which is why the QString can't be processed correctly.
    This is demonstrated by the following change to the previous function:
    @
    function processJsonData(sourceString) {
        console.log("\n *** Remove invalid left " + sourceString.indexOf("{") + " characters ");
        return eval('('
                    + sourceString.slice(sourceString.indexOf("{"))
                    + ')');
    }
    @
    The function above works as expected and the QString is parsed. It is applied to all the JSON files used by the application, and only for the UTF-8 JSON files that include Turkish characters does the log message report
    @
    *** Remove invalid left 3 characters
    @
    that are the mentioned first three bytes.
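
A more targeted variant of the workaround above is sketched here, under a few assumptions: the helper name stripBom is hypothetical, JSON.parse is assumed to be available in the JavaScript engine in use, and the second branch assumes the three BOM bytes were each decoded as a separate Latin-1 character, which would be consistent with the index 3 reported in the log. It removes only a leading BOM rather than everything before the first "{".

```javascript
// Hypothetical helper (not from the original post): drop a leading BOM,
// whether it was decoded correctly (one U+FEFF character) or mis-decoded
// byte by byte as Latin-1 (the three characters U+00EF U+00BB U+00BF).
function stripBom(text) {
    if (text.charCodeAt(0) === 0xFEFF)
        return text.slice(1);
    if (text.slice(0, 3) === "\u00EF\u00BB\u00BF")
        return text.slice(3);
    return text;
}

// Same role as the function in the post, but using JSON.parse (assumed
// to be available) so that only JSON syntax is accepted.
function processJsonData(sourceString) {
    return JSON.parse(stripBom(sourceString));
}

var withBom = "\uFEFF{\"city\": \"Ankara\"}";
console.log(processJsonData(withBom).city); // prints "Ankara"
```

This only hides the symptom at the start of the string. It does not repair Turkish characters that were mis-decoded elsewhere in the file.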

    When the list of data is shown in the application, the Turkish characters are rendered incorrectly, as graphic symbols, strange characters and so on, while all the other parts of the file are shown correctly.

    Is there some special procedure I should follow? I have no idea how to work around this problem.

    Enrico Miglino (aka Alicemirror)
    Balearic Dynamics
    Islas Baleares, Ibiza (Spain)
    www.balearicdynamics.com

      dangelog
      wrote on 3 Oct 2011, 13:24 last edited by
      #2

      [quote]Opening the json file with a binary editor I see three unprintable bytes before the first json structure character (that is the first “{” character). These are 0xEF, 0xBB, 0xBF part of the UTF-8 encoding of the file.[/quote]

      Those bytes are just the "BOM":http://en.wikipedia.org/wiki/Byte_order_mark for UTF-8 encoding. How are you getting that string inside your program? For sure there's a fromUtf8 call missing somewhere.
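
To illustrate why those three bytes are not garbage, here is a small sketch that encodes the BOM code point U+FEFF as UTF-8 and checks a byte array for it. The function names utf8Bytes and hasUtf8Bom are made up for this example, byte arrays are plain arrays of numbers, and the encoder only handles code points below U+10000.

```javascript
// Encode a single code point below U+10000 as a list of UTF-8 bytes.
function utf8Bytes(codePoint) {
    if (codePoint < 0x80)
        return [codePoint];
    if (codePoint < 0x800)
        return [0xC0 | (codePoint >> 6),
                0x80 | (codePoint & 0x3F)];
    return [0xE0 | (codePoint >> 12),
            0x80 | ((codePoint >> 6) & 0x3F),
            0x80 | (codePoint & 0x3F)];
}

// True if the byte array starts with the UTF-8 encoded BOM.
function hasUtf8Bom(bytes) {
    return bytes.length >= 3
        && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF;
}

var bom = utf8Bytes(0xFEFF);
console.log(bom.map(function (b) { return b.toString(16); }).join(" ")); // ef bb bf
console.log(hasUtf8Bom(bom.concat([0x7B]))); // true (BOM followed by "{")
```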

      Software Engineer
      KDAB (UK) Ltd., a KDAB Group company

        Alicemirror
        wrote on 3 Oct 2011, 15:38 last edited by
        #3

        Hi, peppe. I am sure that there is something missing; my difficulty is pinning down the problem. Sigh.

          dangelog
          wrote on 3 Oct 2011, 15:58 last edited by
          #4

          Well: how do you open and read that JSON file?

            Alicemirror
            wrote on 3 Oct 2011, 16:07 last edited by
            #5

            @
            if (!m_data.isNull()) {
                if (!QFile::exists(m_data))
                    m_datastream.clear();
                else {
                    QFile file(m_data);
                    if (!file.open(QFile::ReadOnly))
                        m_datastream.clear();
                    else {
                        QByteArray data = file.readAll();
                        QTextCodec *codec = QTextCodec::codecForLocale();
                        QString str = codec->toUnicode(data);
                        m_datastream.append(str);

                        qDebug() << "AppData::getJson() created datastream";
                    }
                }
            }
            @

            This is the core function. m_data holds the file path (already checked to be correct), and the function creates the QString m_datastream that is exposed to the QML code. It is this QString that is parsed in the js function.

              dangelog
              wrote on 3 Oct 2011, 16:23 last edited by
              #6

              Line 10 is suspicious. The file seems to be UTF8 (it has a UTF8 BOM), so you should be using QString::fromUtf8. What if the locale codec isn't UTF8?
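
What happens when the wrong codec is used can be sketched outside Qt. The two decoders below are illustrative helpers, not Qt APIs: decodeLatin1 stands in for a non-UTF-8 locale codec by mapping each byte to one character, and decodeUtf8 is a simplified decoder that assumes valid input with sequences of at most three bytes.

```javascript
// Stand-in for a non-UTF-8 locale codec: one byte becomes one character.
function decodeLatin1(bytes) {
    var out = "";
    for (var i = 0; i < bytes.length; i++)
        out += String.fromCharCode(bytes[i]);
    return out;
}

// Simplified UTF-8 decoder: assumes well-formed input and only handles
// 1-, 2- and 3-byte sequences.
function decodeUtf8(bytes) {
    var out = "";
    var i = 0;
    while (i < bytes.length) {
        var b = bytes[i];
        if (b < 0x80) {
            out += String.fromCharCode(b);
            i += 1;
        } else if (b < 0xE0) {
            out += String.fromCharCode(((b & 0x1F) << 6) | (bytes[i + 1] & 0x3F));
            i += 2;
        } else {
            out += String.fromCharCode(((b & 0x0F) << 12)
                                       | ((bytes[i + 1] & 0x3F) << 6)
                                       | (bytes[i + 2] & 0x3F));
            i += 3;
        }
    }
    return out;
}

// "İzmir" encoded as UTF-8: the dotted capital I (U+0130) is C4 B0.
var izmir = [0xC4, 0xB0, 0x7A, 0x6D, 0x69, 0x72];
console.log(decodeUtf8(izmir));   // "İzmir"
console.log(decodeLatin1(izmir)); // "Ä°zmir"

// A BOM followed by "{": the byte-per-character decoder yields three
// characters before the brace, the UTF-8 decoder yields one (U+FEFF).
var start = [0xEF, 0xBB, 0xBF, 0x7B];
console.log(decodeLatin1(start).indexOf("{")); // 3
console.log(decodeUtf8(start).indexOf("{"));   // 1
```

Under these assumptions, a byte-per-character codec would account for both symptoms in the first post: the three characters reported before the first "{" and the garbled Turkish letters.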

                Alicemirror
                wrote on 3 Oct 2011, 16:37 last edited by
                #7

                This looks suspicious to me too, because I don't know this part very well. So you suggest changing
                @
                QString str = codec->toUnicode(data)
                @
                to
                @
                QString str = codec->fromUtf8(data)
                @

                ? All the files are UTF8, both the Turkish ones and the others, and the BOM is present only in the files with Turkish characters.

                  Alicemirror
                  wrote on 3 Oct 2011, 16:49 last edited by
                  #8

                  @
                  QByteArray data = file.readAll();
                  QTextCodec *codec = QTextCodec::codecForLocale();
                  QString str = codec->toUnicode(data);
                  m_datastream.append(str);
                  @
                  This piece was changed according to your suggestion:
                  @
                  QByteArray data = file.readAll();
                  QString str = QString::fromUtf8(data); // Instead of the following two lines
                  // QTextCodec *codec = QTextCodec::codecForLocale();
                  // QString str = codec->toUnicode(data);
                  m_datastream.append(str);
                  @

                  The general question remains whether it is possible to detect the format of the input file, or whether it is better to add a function parameter that decides which kind of encoding / decoding should be used.

                    dangelog
                    wrote on 3 Oct 2011, 17:55 last edited by
                    #9

                    No. You must know the encoding in advance, or apply heuristics (like what file(1) does).
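
As a rough idea of what such a heuristic can look like, here is a sketch that tests whether a byte array is structurally well-formed UTF-8. This is not how file(1) is implemented; looksLikeUtf8 is a hypothetical name, and the check is simplified (it does not reject every overlong form or encoded surrogate).

```javascript
// Heuristic: true if every byte sequence has a valid UTF-8 lead byte
// followed by the right number of continuation bytes (0x80..0xBF).
function looksLikeUtf8(bytes) {
    var i = 0;
    while (i < bytes.length) {
        var b = bytes[i];
        var extra;
        if (b < 0x80) extra = 0;                      // ASCII
        else if (b >= 0xC2 && b <= 0xDF) extra = 1;   // 2-byte sequence
        else if (b >= 0xE0 && b <= 0xEF) extra = 2;   // 3-byte sequence
        else if (b >= 0xF0 && b <= 0xF4) extra = 3;   // 4-byte sequence
        else return false;                            // invalid lead byte
        for (var k = 1; k <= extra; k++) {
            var c = bytes[i + k];
            if (c === undefined || c < 0x80 || c > 0xBF)
                return false;                         // missing or bad continuation
        }
        i += extra + 1;
    }
    return true;
}

// "İzmir" as UTF-8, and the same word in ISO-8859-9 (where İ is 0xDD).
var utf8Izmir = [0xC4, 0xB0, 0x7A, 0x6D, 0x69, 0x72];
var latin5Izmir = [0xDD, 0x7A, 0x6D, 0x69, 0x72];
console.log(looksLikeUtf8(utf8Izmir));   // true
console.log(looksLikeUtf8(latin5Izmir)); // false
```

A positive result is only a hint: pure ASCII data passes as well, and some byte sequences from other encodings happen to be valid UTF-8. Knowing the encoding in advance remains the reliable option.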

                      Alicemirror
                      wrote on 3 Oct 2011, 18:38 last edited by
                      #10

                      @peppe: many thanks for the support :)
