How to check if QByteArray is valid UTF8?

    Donald Duck wrote (#1):

    How can I check whether a QByteArray contains valid UTF-8? I tried QString::fromUtf8(byteArray), but according to the documentation it silently replaces or suppresses any invalid sequences, without reporting that anything went wrong.

      VRonin wrote (#2):

      Try

      QTextStream stream(byteArray);
      stream.setCodec("UTF-8");
      stream.setAutoDetectUnicode(false);
      stream.readAll();
      if(stream.status()==QTextStream::Ok) qDebug("Array is valid UTF-8");
      else qDebug("Array is not valid UTF-8");
      

      "La mort n'est rien, mais vivre vaincu et sans gloire, c'est mourir tous les jours"
      ~Napoleon Bonaparte

      On a crusade to banish setIndexWidget() from the holy land of Qt

        Donald Duck wrote (#3):

        @VRonin That didn't work; it always reports that the array is valid UTF-8, even when it is not.

          VRonin wrote (#4):

          Found the solution here: https://forum.qt.io/topic/55325/solved-how-to-know-if-qtextstream-could-not-encode-data-it-reads

          #include <QByteArray>
          #include <QDebug>
          #include <QString>
          #include <QTextCodec>

          int main()
          {
              QTextCodec *codec = QTextCodec::codecForName("UTF-8");

              // Valid input: 0xC3 0xB1 is the UTF-8 encoding of U+00F1.
              QByteArray byteArray("test\xc3\xb1test");
              QTextCodec::ConverterState validState;
              const QString validText = codec->toUnicode(byteArray.constData(), byteArray.size(), &validState);
              // invalidChars counts malformed sequences; remainingChars is nonzero
              // when the input ends in the middle of a multi-byte sequence.
              if (validState.invalidChars == 0 && validState.remainingChars == 0)
                  qDebug("Array is valid UTF-8");
              else
                  qDebug("Array is not valid UTF-8");

              // Invalid input: 0xC3 must be followed by a continuation byte, and 0x28 is not one.
              // A fresh ConverterState keeps counts from the first conversion from carrying over.
              byteArray = QByteArray("test\xc3\x28test");
              QTextCodec::ConverterState invalidState;
              const QString invalidText = codec->toUnicode(byteArray.constData(), byteArray.size(), &invalidState);
              if (invalidState.invalidChars == 0 && invalidState.remainingChars == 0)
                  qDebug("Array is valid UTF-8");
              else
                  qDebug("Array is not valid UTF-8");

              return 0;
          }
          

            Violet Giraffe wrote (#5):

            @VRonin, ConverterState doesn't seem to be documented. It looks more like an implementation detail than the intended way to monitor conversion results; otherwise, why wouldn't each of the three overloads take this optional out parameter?

              VRonin wrote (#6):

              @Violet-Giraffe said in How to check if QByteArray is valid UTF8?:

              "ConverterState doesn't seem to be documented"

              Agreed, it should be.

              "It looks more like an implementation detail than the intended way to monitor conversion results"

              It's explicitly exported, not hidden in the private implementation, so I don't think this is true.

              "why wouldn't each of the three overloads take this optional out parameter?"

              The other overloads are just for convenience; internally they all call the three-argument overload.

                Violet Giraffe wrote (#7):

                @VRonin, fair enough. Don't get me wrong, your answer is spot on, and it is in fact the only proper way of checking UTF-8 for correctness that I've found, but I worry it may break in the future.
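
For readers who share this concern and would rather not depend on ConverterState at all, one alternative is to validate the bytes directly. The following is a minimal sketch in plain C++ with no Qt types, assuming the RFC 3629 definition of well-formed UTF-8. The helper name isValidUtf8 is chosen here for illustration and does not come from this thread.

```cpp
#include <cstddef>

// Hypothetical helper (not from the thread): returns true if the byte range
// [data, data + size) is well-formed UTF-8 as defined by RFC 3629.
// Rejects stray continuation bytes, overlong encodings, UTF-16 surrogates,
// code points above U+10FFFF, and sequences truncated at the end of input.
bool isValidUtf8(const char *data, std::size_t size)
{
    const unsigned char *p = reinterpret_cast<const unsigned char *>(data);
    std::size_t i = 0;
    while (i < size) {
        const unsigned char lead = p[i];
        if (lead < 0x80) {          // single-byte (ASCII) character
            ++i;
            continue;
        }

        std::size_t extra;          // continuation bytes expected after the lead
        unsigned long cp;           // code point being assembled
        unsigned long minCp;        // smallest code point legal for this length
        if ((lead & 0xE0) == 0xC0) {
            extra = 1; cp = lead & 0x1F; minCp = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2; cp = lead & 0x0F; minCp = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3; cp = lead & 0x07; minCp = 0x10000;
        } else {
            return false;           // stray continuation byte or invalid lead byte
        }

        if (size - i <= extra)
            return false;           // input ends in the middle of a sequence

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = p[i + k];
            if ((c & 0xC0) != 0x80)
                return false;       // expected a 10xxxxxx continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < minCp)
            return false;           // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;           // UTF-16 surrogate range
        if (cp > 0x10FFFF)
            return false;           // beyond the Unicode range

        i += extra + 1;
    }
    return true;
}
```

With a QByteArray, this could be called as isValidUtf8(byteArray.constData(), byteArray.size()). The trade-off is that you maintain the validation rules yourself instead of relying on Qt's codec.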

                  VRonin wrote (#8):

                  @Violet-Giraffe said in How to check if QByteArray is valid UTF8?:

                  "but I worry it may break in the future"

                  It can only be broken in major releases (Qt6, Qt7, etc.), so very infrequently.

                    SlySven wrote (#9):

                    Also, if you are creating your own codec (because Qt uses ICU for the one you need and the OS you are on does not use ICU, so you are left on your own), your subclass is supposed to adjust QTextCodec::ConverterState::invalidChars (and QTextCodec::ConverterState::remainingChars, if appropriate) and to honor the QTextCodec::ConverterState::flags enum. Two flags matter in particular: ConvertInvalidToNull (if set, each invalid character in the input should be output as a null) and IgnoreHeader (if set, any BOM characters at the beginning of Unicode input should be skipped, and none generated in the output).

                    As it happens, it looks like Qt implements the QTextCodec::canEncode(...) methods for your subclass by converting the character or string and checking that the invalid character count is zero.
