Converting a String from "encoded" utf8 to proper utf8

    bvieducasse
    wrote on 21 Sept 2022, 08:08 last edited by
    #1

    I am reposting this question because I messed up the original post.

    Here is a simpler example of the problem:

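    #include <QDebug>
    #include <QString>

    void test()
    {
        const QString strModel = "松本";

        // Hex escapes in the literal: the compiler emits the six raw UTF-8 bytes.
        const QString strTest1 = "\xe6\x9d\xbe\xe6\x9c\xac";
        if (strModel != strTest1)
            qDebug() << "Test1 fail";

        // Escaped backslashes: the string holds the literal text \xe6\x9d... (24 characters).
        QString strFromExternalLibrary = "\\xe6\\x9d\\xbe\\xe6\\x9c\\xac";
        QString strTest2 = strFromExternalLibrary;
        if (strModel != strTest2)
            qDebug() << "Test2 fail";
    }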
    

    The first test passes; the second fails. The problem is that the external library gives us the string from the second test rather than the first, so we need a way to convert the second string so that it matches strModel.

      JonB
      wrote on 21 Sept 2022, 08:15 last edited by JonB
      #2

      @bvieducasse
      Your second string, as shown, uses \\xe6 in a C literal, which puts the four characters \, x, e, 6 into the string. The first one uses \xe6, which puts a single byte with the hex value e6 into the string. Is this what you mean? If you expect the second string to equal the first, you need to convert each four-character sequence \xe6 into the single byte e6.
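
To make the difference concrete, here is a minimal sketch in plain C++. It uses std::string rather than QString so that it compiles without Qt, and both helper names are made up for this illustration.

```cpp
#include <string>

// Hypothetical helpers, named only for this illustration.

// "\xe6" in source code is a single hex escape: the compiler stores one byte, 0xE6.
inline std::string rawByteLiteral()
{
    return "\xe6";
}

// "\\xe6" is an escaped backslash followed by 'x', 'e' and '6': four separate characters.
inline std::string escapedTextLiteral()
{
    return "\\xe6";
}
```

Comparing the sizes of the two results (1 versus 4) shows why the two QString objects in the question can never compare equal without a decoding step.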

        kkoehne
        Moderators
        wrote on 21 Sept 2022, 08:18 last edited by kkoehne
        #3

        A straightforward way to do this is to split strFromExternalLibrary by '\'. Then remove the leading x from each piece and use e.g. QString::toInt(&ok, 16) to convert each substring into an int. You can then write these to a QByteArray (e.g. using QByteArray::append(char ch)) and convert the whole QByteArray to a QString with QString::fromUtf8(). With some error handling, of course.
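
The split-and-parse idea could be sketched roughly as follows. This is not Qt code: it uses only the standard library so that it is self-contained, the function name decodeBySplitting is hypothetical, and it assumes the input consists of nothing but \xNN escapes. In a Qt program the resulting bytes would go into a QByteArray and then through QString::fromUtf8().

```cpp
#include <cctype>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: decodes text made up only of "\xNN" escapes into raw bytes.
// Returns std::nullopt if any piece is not 'x' followed by exactly two hex digits.
inline std::optional<std::string> decodeBySplitting(const std::string& escaped)
{
    auto isHex = [](char c) { return std::isxdigit(static_cast<unsigned char>(c)) != 0; };

    // A dangling backslash at the end cannot start a complete escape.
    if (!escaped.empty() && escaped.back() == '\\')
        return std::nullopt;

    std::string bytes;
    std::istringstream pieces(escaped);
    std::string piece;
    bool first = true;
    while (std::getline(pieces, piece, '\\')) {
        if (first) {
            first = false;
            if (!piece.empty())
                return std::nullopt;  // text before the first backslash is not an escape
            continue;
        }
        if (piece.size() != 3 || piece[0] != 'x' || !isHex(piece[1]) || !isHex(piece[2]))
            return std::nullopt;
        // Drop the leading 'x' and parse the two hex digits as one byte.
        bytes.push_back(static_cast<char>(std::stoi(piece.substr(1), nullptr, 16)));
    }
    return bytes;
}
```

The strict validation is a design choice for this sketch: any piece that is not exactly x plus two hex digits rejects the whole input instead of being skipped silently.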

        Director R&D, The Qt Company

          JonB
          wrote on 21 Sept 2022, 08:24 last edited by JonB
          #4

          @kkoehne
          Just a comment. If you are going to remove those \x markers from the string and then convert each pair of hex digits with QString::toInt(&ok, 16), would it maybe be quicker to remove the \x markers, turn the result into a QByteArray, and call QByteArray::fromHex() to convert all the bytes in one go, avoiding all the splitting and byte-by-byte work?
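
As a rough illustration of this strip-then-decode idea without depending on Qt, the sketch below uses two hypothetical helpers: stripEscapeMarkers removes every \x marker, and hexToBytes is a simplified stand-in for QByteArray::fromHex(). Unlike the real fromHex(), which skips invalid characters, this stand-in rejects them, which is an assumption made here to keep errors visible.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper: removes every occurrence of the two-character marker "\x".
inline std::string stripEscapeMarkers(std::string text)
{
    const std::string marker = "\\x";
    std::string::size_type pos = 0;
    while ((pos = text.find(marker, pos)) != std::string::npos)
        text.erase(pos, marker.size());
    return text;
}

// Hypothetical, stricter stand-in for QByteArray::fromHex(): converts pairs of
// hex digits into bytes and returns std::nullopt on odd length or a non-hex character.
inline std::optional<std::string> hexToBytes(const std::string& hex)
{
    if (hex.size() % 2 != 0)
        return std::nullopt;

    std::string bytes;
    bytes.reserve(hex.size() / 2);
    for (std::string::size_type i = 0; i < hex.size(); i += 2) {
        if (!std::isxdigit(static_cast<unsigned char>(hex[i])) ||
            !std::isxdigit(static_cast<unsigned char>(hex[i + 1])))
            return std::nullopt;
        bytes.push_back(static_cast<char>(std::stoi(hex.substr(i, 2), nullptr, 16)));
    }
    return bytes;
}
```

Calling hexToBytes(stripEscapeMarkers(input)) decodes the whole string in one pass, but only when the input consists entirely of escapes; any plain text mixed in makes the decode fail.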

            kkoehne
            Moderators
            wrote on 21 Sept 2022, 08:27 last edited by
            #5

            @JonB , you're right, that's even easier :)

            Director R&D, The Qt Company

              JonB
              wrote on 21 Sept 2022, 08:29 last edited by JonB
              #6

              @kkoehne
              :) I was just thinking of speed, in case the OP is going to do a lot of these or the strings contain a lot of bytes. It seems natural to take advantage of the two-digit hex sequences that QByteArray::fromHex() is designed to read, given that this is what the input seems to consist of.

                Bonnie
                wrote on 21 Sept 2022, 08:52 last edited by
                #7

                I remember the OP's previous post, in which the full string is "\xe6\x9d\xbe\xe6\x9c\xac....'s iPhone", so it is not a hex-only string.
                I think the OP needs to find every "\x", read the next two characters and convert them, and copy the characters that are not part of an escape.
                Actually, I believe Qt already does this in its private code when QSettings reads ini files, but that code is not exposed through the public APIs.
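
A scan of that kind might look like the sketch below. It is plain C++ rather than Qt, the name decodeMixed is hypothetical, and it assumes the unescaped parts of the input are already valid UTF-8 bytes. Anything that is not a complete \xNN escape is copied through unchanged.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: decodes "\xNN" escapes embedded in otherwise plain text.
// Anything that is not a complete, valid escape is copied through unchanged.
inline std::string decodeMixed(const std::string& text)
{
    auto isHex = [](char c) { return std::isxdigit(static_cast<unsigned char>(c)) != 0; };

    std::string bytes;
    bytes.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        // An escape needs four characters: backslash, 'x' and two hex digits.
        const bool isEscape = i + 3 < text.size()
            && text[i] == '\\' && text[i + 1] == 'x'
            && isHex(text[i + 2]) && isHex(text[i + 3]);
        if (isEscape) {
            bytes.push_back(static_cast<char>(std::stoi(text.substr(i + 2, 2), nullptr, 16)));
            i += 3;  // skip the rest of the escape; ++i then moves past its last digit
        } else {
            bytes.push_back(text[i]);
        }
    }
    return bytes;
}
```

In a Qt program the returned bytes would be wrapped in a QByteArray and converted with QString::fromUtf8(). Copying malformed escapes verbatim is one possible policy; rejecting the input would be an equally reasonable choice.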

                  bvieducasse
                  wrote on 21 Sept 2022, 14:49 last edited by
                  #8

                  Thank you for your quick responses. I think I got something to work, though maybe there are ways to write something more optimized.

                  
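                     QByteArray resultBytes;
                     const QString escapePrefix = QStringLiteral("\\x");
                     // Use '<' here: with '<=' the last iteration reads one past the end of the string.
                     for (int i = 0; i < strFromExternalLibrary.size(); ++i)
                     {
                         // mid() is available in both Qt 5 and Qt 6 (midRef() is Qt 5 only).
                         if (strFromExternalLibrary.mid(i, 2) == escapePrefix)
                         {
                             // Decode the two hex digits that follow "\x" into one raw byte.
                             const QByteArray hexDigits = strFromExternalLibrary.mid(i + 2, 2).toLatin1();
                             resultBytes += QByteArray::fromHex(hexDigits);
                             i += 3; // skip the rest of the escape; ++i then moves past its last digit
                         }
                         else
                         {
                             // Assumes the unescaped characters are plain ASCII.
                             resultBytes.append(strFromExternalLibrary.at(i).toLatin1());
                         }
                     }
                     // Interpret the collected bytes explicitly as UTF-8.
                     const QString decoded = QString::fromUtf8(resultBytes);
                     qDebug() << decoded;
                     if (strModel == decoded)
                         qDebug() << "It works!";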
                  

                  This works even for strModel = "松本foo".

                    Bonnie
                    wrote on 22 Sept 2022, 09:42 last edited by
                    #9

                    Following up on my QSettings comment: I have looked into it.
                    To use its "unescape" function we have to read through the public APIs, but sadly QSettings needs a real ini file, not something in memory.
                    So this is not a good solution; I'm just keeping a record of my test :)

                    QTemporaryFile file;
                    if(file.open()) {
                        file.write("string=@ByteArray(");
                        file.write(bytearray_of_escaped_string);
                        file.write(")\n");
                        file.close();
                    
                        QSettings settings(file.fileName(), QSettings::IniFormat);
                        qDebug() << settings.value("string").toString();
                    }
                    