Detecting unicode encoding errors

  • ybailly71 wrote (#1):

    Greetings all,

    Is it possible to detect errors when converting a char* to a QString?

    Here's the case: I get a string as a char* from an external library (over which I have no control).
    This string can use either UTF-8 encoding or "local8bit" encoding, so it may vary from one user to another.

    What I basically need is to be able to write something like this:

    char const* src = ...;
    bool encoding_failed = false;
    QString qs = QString::fromUtf8(src, &encoding_failed);
    if ( encoding_failed )
    {
      qs = QString::fromLocal8Bit(src);
    }
    

    I looked at the docs for QString and QTextCodec, but I couldn't find any error support.

    Is there any (portable) way of achieving this?

    Thanks in advance for any hint.

    • mrjj (Lifetime Qt Champion) wrote (#2):

      Hi and welcome.
      I didn't find any error-reporting functions, but I did stumble upon this:

      QTextCodec::ConverterState state;
      QTextCodec *codec = QTextCodec::codecForName("UTF-8");
      const QString text = codec->toUnicode(byteArray.constData(), byteArray.size(), &state);
      if (state.invalidChars > 0) {
          qDebug() << "Not a valid UTF-8 sequence.";
      }

      I hope it might be useful.

        • ybailly71 wrote (#4):

          @mrjj Hello and thank you mrjj,

          I also stumbled on this QTextCodec::ConverterState class, but I did not go further because its contents are not documented.
          Therefore, is it safe to assume that your (nice) code, which relies on the undocumented invalidChars data member, will still work in the future?
          I ended up with something like this (slightly simplified):

          char const* src = ...;
          QString qs = QString::fromUtf8(src);
          QByteArray utf8 = qs.toUtf8();
          if ( utf8 != src )
            qs = QString::fromLocal8Bit(src);
          

          ...but that seems a bit overkill to me.

          • Paul Colby wrote (#5):

            @ybailly71 said:

            I also stumbled on this QTextCodec::ConverterState class, but I did not go further because its contents are not documented.

            It appears to be documented under the QTextCodec::convertToUnicode function:

            If state is not 0, the codec should save the state after the conversion in state, and adjust the remainingChars and invalidChars members of the struct.

            The members in question are public, as indicated by http://doc.qt.io/qt-5/qtextcodec-converterstate-members.html but not listed in http://doc.qt.io/qt-5/qtextcodec-converterstate.html as those members don't have any documentation that the engine (Doxygen?) is recognising.

            I'd say this means it's safe to use them, but the doc formatting could be improved.

            Cheers.

            • mrjj (Lifetime Qt Champion) wrote (#6):

              Hi
              Like @Paul-Colby, I do think it's safe to use.
              They are not flagged for removal, and they are not in
              a private file/class, as implementation details always are,
              so they should be OK.

              Your solution with
              if ( utf8 != src )
              is not bad but could be expensive with long strings. :)
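
One way to avoid the second conversion and the comparison is to check the bytes in a single pass before choosing a decoder. The following is only a sketch in standard C++: isValidUtf8 is a hypothetical helper, not part of Qt, and its notion of "valid" (no overlong forms, no surrogates, nothing above U+10FFFF) may differ from Qt's own decoder in edge cases.

```cpp
#include <cstddef>

// Hypothetical helper (not a Qt API): returns true if the first `len`
// bytes at `s` form well-formed UTF-8.
bool isValidUtf8(const char* s, std::size_t len)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    std::size_t i = 0;
    while (i < len) {
        const unsigned char b = p[i];
        if (b < 0x80) {          // plain ASCII byte
            ++i;
            continue;
        }

        std::size_t extra = 0;   // continuation bytes expected after the lead byte
        unsigned long cp = 0;    // code point being assembled
        unsigned long min = 0;   // smallest code point allowed for this length
        if ((b & 0xE0) == 0xC0)      { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return false;       // stray continuation byte or invalid lead byte

        if (len - i <= extra)    // sequence is cut off at the end of the input
            return false;

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = p[i + k];
            if ((c & 0xC0) != 0x80)   // not a continuation byte
                return false;
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min) return false;                       // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                  // beyond Unicode range

        i += extra + 1;
    }
    return true;
}
```

With such a check, the choice could be written as isValidUtf8(src, std::strlen(src)) ? QString::fromUtf8(src) : QString::fromLocal8Bit(src). Like the other approaches in this thread, this is a heuristic: pure ASCII text, or local 8-bit text that happens to be well-formed UTF-8, will be treated as UTF-8.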

              • ybailly71 wrote (#7):

                Ok thanks all, I'll go with that then.

                Have a nice day :-)

                • mrjj (Lifetime Qt Champion) wrote (#8):

                  @ybailly71
                  Nice day to you too :)
