Detecting unicode encoding errors

    ybailly71
    wrote on 11 Apr 2016, 14:35 last edited by
    #1

    Greetings all,

    Is it possible to detect errors while encoding a QString from char*?

    Here's the case: I get a string as a char* from an external library (over which I have no control).
    This string can use either UTF-8 encoding or "local8bit" encoding, so it may vary from one user to another.

    What I basically need is to be able to write something like this:

    char const* src = ...;
    bool encoding_failed = false;
    QString qs = QString::fromUtf8(src, &encoding_failed);
    if ( encoding_failed )
    {
      qs = QString::fromLocal8Bit(src);
    }
    

    I looked at the docs for QString and QTextCodec, but I couldn't find any way to detect encoding errors.

    Is there any (portable) way of achieving this?

    Thanks in advance for any hint.

      mrjj
      Lifetime Qt Champion
      wrote on 11 Apr 2016, 16:22 last edited by
      #2

      Hi and welcome,
      I didn't find any error-reporting functions, but I did stumble upon this:

      QTextCodec::ConverterState state;
      QTextCodec *codec = QTextCodec::codecForName("UTF-8");
      const QString text = codec->toUnicode(byteArray.constData(), byteArray.size(), &state);
      if (state.invalidChars > 0) {
          qDebug() << "Not a valid UTF-8 sequence.";
      }

      I hope it might be useful.
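
For readers without Qt at hand, the idea behind an invalid-character counter can be sketched in standard C++. This is only an illustration under stated assumptions, not Qt's implementation: `countInvalidUtf8` is a hypothetical helper, and the way it resynchronises after an error (skip one byte and try again) is a choice made for this sketch, so its counts need not match what `state.invalidChars` reports.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Qt API): counts the bytes that cannot start or
// complete a well-formed UTF-8 sequence. A result of 0 means valid UTF-8.
std::size_t countInvalidUtf8(const std::string& bytes)
{
    std::size_t invalid = 0;
    std::size_t i = 0;
    const std::size_t n = bytes.size();

    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // continuation bytes expected after the lead byte
        std::uint32_t cp = 0;    // code point being decoded
        std::uint32_t min = 0;   // smallest code point allowed for this length

        if (lead < 0x80)                { ++i; continue; }  // plain ASCII
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min = 0x10000; }
        else { ++invalid; ++i; continue; }  // stray continuation or illegal lead byte

        bool ok = (i + extra < n);          // sequence must not be truncated
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;   // not a continuation byte
            else cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates and values above U+10FFFF.
        if (ok && (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;

        if (ok) {
            i += extra + 1;
        } else {
            ++invalid;   // count the offending lead byte, then resynchronise
            ++i;
        }
    }
    return invalid;
}
```

With this sketch, a Latin-1 encoded "café" (the single byte 0xE9 for the last letter) gives a non-zero count, while the UTF-8 encoding of the same word gives 0.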

          ybailly71
          wrote on 12 Apr 2016, 06:47 last edited by
          #4

          @mrjj Hello and thank you mrjj,

          I also came across the QTextCodec::ConverterState class, but I did not go further as its contents are not documented.
          Therefore, is it safe to assume your (nice) code, which relies on the undocumented invalidChars data member, will still work in the future?
          I ended up with something like this (slightly simplified):

          char const* src = ...;
          QString qs = QString::fromUtf8(src);
          QByteArray utf8 = qs.toUtf8();
          if ( utf8 != src )
            qs = QString::fromLocal8Bit(src);
          

          ...but that seems a bit overkill to me.

            Paul Colby
            wrote on 12 Apr 2016, 06:59 last edited by
            #5

            @ybailly71 said:

            I also came across the QTextCodec::ConverterState class, but I did not go further as its contents are not documented.

            It appears to be documented under the QTextCodec::convertToUnicode function:

            If state is not 0, the codec should save the state after the conversion in state, and adjust the remainingChars and invalidChars members of the struct.

            The members in question are public, as indicated by http://doc.qt.io/qt-5/qtextcodec-converterstate-members.html but not listed in http://doc.qt.io/qt-5/qtextcodec-converterstate.html as those members don't have any documentation that the engine (Doxygen?) is recognising.

            I'd say this means it's safe to use them, but the doc formatting could be improved.

            Cheers.

              mrjj
              Lifetime Qt Champion
              wrote on 12 Apr 2016, 07:42 last edited by
              #6

              Hi
              Like @Paul-Colby, I do think it's safe to use.
              The members are not flagged for removal, and they are not in
              a private file/class, as implementation details always are,
              so they should be OK.

              Your solution with
              if ( utf8 != src )
              is not bad but could be expensive with long strings. :)
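
To make the cost point concrete: the round trip builds a full QString and then a second QByteArray before it compares anything, whereas a validity check can stop at the first malformed byte and allocate nothing. Below is a minimal standard-C++ sketch of that alternative. The names `looksLikeUtf8`, `guessEncoding` and `GuessedEncoding` are hypothetical, not Qt APIs. It is also only a heuristic: pure ASCII is valid in both encodings, and local 8-bit text that happens to form valid UTF-8 will be classified as UTF-8.

```cpp
#include <cstdint>

enum class GuessedEncoding { Utf8, Local8Bit };

// Hypothetical helper: true if the NUL-terminated string is well-formed UTF-8.
// Returns at the first malformed sequence and performs no allocation.
bool looksLikeUtf8(const char* s)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    while (*p) {
        std::uint32_t cp = 0;    // code point being decoded
        std::uint32_t min = 0;   // smallest code point allowed for this length
        int extra = 0;           // continuation bytes expected

        if (*p < 0x80)                { ++p; continue; }  // plain ASCII
        else if ((*p & 0xE0) == 0xC0) { extra = 1; cp = *p & 0x1F; min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { extra = 2; cp = *p & 0x0F; min = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { extra = 3; cp = *p & 0x07; min = 0x10000; }
        else return false;            // stray continuation or illegal lead byte

        ++p;
        for (int k = 0; k < extra; ++k, ++p) {
            // A terminating NUL fails this test too, so we never read past the end.
            if ((*p & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (*p & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates and values above U+10FFFF.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
    }
    return true;
}

// Decide which decoder to try; the caller would then pick
// QString::fromUtf8(src) or QString::fromLocal8Bit(src) accordingly.
GuessedEncoding guessEncoding(const char* src)
{
    return looksLikeUtf8(src) ? GuessedEncoding::Utf8 : GuessedEncoding::Local8Bit;
}
```

The check is a single pass over the input, so it costs at most one read per byte even for long strings.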

                ybailly71
                wrote on 12 Apr 2016, 09:11 last edited by
                #7

                Ok thanks all, I'll go with that then.

                Have a nice day :-)

                  mrjj
                  Lifetime Qt Champion
                  wrote on 12 Apr 2016, 09:20 last edited by
                  #8

                  @ybailly71
                  Nice day to you too :)
