How to determine data format in QByteArray (ASCII / HEX / Unicode)

8 Posts 5 Posters 605 Views
    hkottmann
    wrote on 26 Sept 2022, 16:05 last edited by
    #1

    I have software that analyzes the serial communication between a testing machine and its software, because I need to capture these values for my own software. Most of the components use ASCII formats for their communication, but some use binary data such as Modbus. Since I use readAll(), I get the result as a QByteArray, and when I print it with qDebug() I can clearly see whether it's ASCII or HEX (HEX values are printed as \xdf\x01\xff...), but I have not found a way to determine the format programmatically. I think there must be a way to find this out...

      JonB
      wrote on 26 Sept 2022, 16:17 last edited by JonB
      #2

      @hkottmann said in How to determine data format in QByteArray (ASCII / HEX / Unicode):

      but I did not find a way to determine by software what format it is. I think there must be a way to find out this...

      No, there can be no such thing. You receive bytes over serial into a QByteArray. Bytes are bytes! They could mean anything: arbitrary binary values, multibyte numbers, ASCII character values, or whatever. There is no foolproof way of knowing which, other than checking whether a lot of them look like characters. You have to know who the sender is and what "format" it is sending bytes in if you want to "interpret" them as such.

        hkottmann
        wrote on 26 Sept 2022, 16:23 last edited by
        #3

        BTW, how does qDebug() know how to print it?

          JonB
          wrote on 26 Sept 2022, 16:50 last edited by
          #4

          @hkottmann
          You are handing qDebug() a QByteArray, so it shows the bytes in the array. It may or may not show them as characters if the byte happens to be in the ASCII range, I don't know. Either way, neither qDebug() nor QByteArray knows anything about what the bytes "mean" or where they come from.

            Christian Ehrlicher
            Lifetime Qt Champion
            wrote on 26 Sept 2022, 17:30 last edited by
            #5

            Looks like the OP wanted a second opinion because mine wasn't the right answer: https://stackoverflow.com/questions/73855456/how-to-determine-data-format-in-qbytearray-ascii-hex-unicode :)


              mchinand
              wrote on 26 Sept 2022, 17:46 last edited by
              #6

              @hkottmann said in How to determine data format in QByteArray (ASCII / HEX / Unicode):

              BTW, how does qDebug() know how to print it?

              The source is available. You can see how qDebug() does it. https://code.qt.io/cgit/qt/qtbase.git/tree/src/corelib/io/qdebug.cpp#n26
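As a rough illustration of why the output looks like \xdf\x01\xff, here is a minimal sketch of that kind of escaping. It is an approximation written for this thread, not Qt's actual code: `escapeBytes` is a hypothetical name, and it works on a `std::string` (for example the result of `QByteArray::toStdString()`) so that it compiles without Qt. Note that the decision is made per byte; nothing here knows what "format" the buffer is in.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: renders raw bytes roughly the way a debug printer might.
// Printable ASCII is copied through unchanged; everything else is escaped.
std::string escapeBytes(const std::string &bytes)
{
    std::string out;
    for (unsigned char c : bytes) {
        switch (c) {
        case '\n': out += "\\n"; break;
        case '\r': out += "\\r"; break;
        case '\t': out += "\\t"; break;
        case '\\': out += "\\\\"; break;
        case '"':  out += "\\\""; break;
        default:
            if (c >= 0x20 && c < 0x7f) {
                out += static_cast<char>(c);   // printable ASCII
            } else {
                char buf[5];                   // "\xNN" plus terminating NUL
                std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned>(c));
                out += buf;
            }
        }
    }
    return out;
}
```

So a mostly binary buffer ends up as a run of \xNN escapes, while a text reply such as "OK" followed by CR LF stays readable.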

                SimonSchroeder
                wrote on 27 Sept 2022, 07:57 last edited by
                #7

                Well, there are a few tricks you can try, though it will not be perfect.

                First, I assume that when you say ASCII you mean true ASCII, i.e. 7-bit values (only 128 code points, not the full 256 a byte can hold). If this is not the case then you are (almost) out of luck. At least you could not distinguish between ASCII, HEX, and Unicode all three at the same time.

                If you have 7-bit ASCII then just treat it as UTF-8 (I assume that when you say Unicode, you mean UTF-8). This range is the same for ASCII and UTF-8. Then, you only need to distinguish between UTF-8 and HEX.
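Checking whether a buffer stays inside the 7-bit ASCII range (values 0 to 127) only takes one pass over the bytes. A minimal sketch, assuming the data has been copied into a `std::string` (for example with `QByteArray::toStdString()`); `isSevenBitAscii` is a hypothetical name:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: true if no byte has its high bit set, i.e. every byte
// is in the 7-bit ASCII range 0x00..0x7F (control characters included).
bool isSevenBitAscii(const std::string &bytes)
{
    return std::all_of(bytes.begin(), bytes.end(),
                       [](unsigned char c) { return c < 0x80; });
}
```

A `true` result only says the bytes could be ASCII; binary data that happens to use small values passes this test as well.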

                The first 32 values in ASCII and Unicode are control characters. Most likely you'll only want to support a specific set of control characters, like \0, \n and \r (maybe \t), inside text blocks. If your QByteArray contains any other control characters treat the whole QByteArray as HEX.
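The control-character rule could be sketched like this. The function name is hypothetical, the allowed set (\0, \n, \r, \t) is taken from the paragraph above, and counting 0x7F (DEL) as a control character is an extra assumption of this sketch:

```cpp
#include <string>

// Hypothetical helper: true if the buffer contains a control character
// (0x00..0x1F, or 0x7F as an added assumption) other than the ones we
// accept inside text. Adjust the allowed set to your devices.
bool hasUnexpectedControlChar(const std::string &bytes)
{
    for (unsigned char c : bytes) {
        const bool isControl = (c < 0x20) || (c == 0x7f);
        const bool isAllowed = (c == '\0' || c == '\n' || c == '\r' || c == '\t');
        if (isControl && !isAllowed)
            return true;   // e.g. 0x01 or 0x10: very unlikely in a text reply
    }
    return false;
}
```

If this returns `true`, the heuristic would treat the whole buffer as HEX.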

                You should also have a quick look at UTF-8 on Wikipedia. If a byte has the bit pattern 0xxxxxxx it is an ASCII character (including all control characters). 110xxxxx starts a 2-byte character in UTF-8 (two leading ones), 1110xxxx starts a 3-byte character, and 11110xxx starts a 4-byte character. The remaining bytes of a multibyte character have the pattern 10xxxxxx. If you don't recognize a QByteArray as UTF-8 (because it contains byte sequences that are not valid UTF-8) treat it as HEX.
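A hand-written version of this structural check might look as follows. This is a sketch under stated assumptions: `looksLikeUtf8` is a hypothetical name, the input is a `std::string` copy of the received bytes, and only the lead-byte and continuation-byte patterns are tested. It does not reject overlong encodings or surrogate code points, so it is looser than a full UTF-8 validator.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: structural UTF-8 check based on the bit patterns of
// lead bytes and continuation bytes.
bool looksLikeUtf8(const std::string &bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t continuation = 0;
        if ((lead & 0x80) == 0x00)      continuation = 0;  // 0xxxxxxx
        else if ((lead & 0xE0) == 0xC0) continuation = 1;  // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) continuation = 2;  // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) continuation = 3;  // 11110xxx
        else return false;  // stray continuation byte or invalid lead byte

        if (bytes.size() - i - 1 < continuation)
            return false;   // sequence is cut off at the end of the buffer
        for (std::size_t k = 1; k <= continuation; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)     // continuation bytes must be 10xxxxxx
                return false;
        }
        i += continuation + 1;
    }
    return true;
}
```

With serial data, a buffer can end in the middle of a multibyte character simply because the rest has not arrived yet, so a `false` caused by a cut-off sequence may deserve a retry once more bytes are available.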

                The last one can be sped up a little bit (i.e. you don't have to implement it yourself):

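                    #include <QTextCodec>  // Qt 5; in Qt 6 this class is in the Core5Compat module

                    // 'text' stands for the received raw data, e.g. the result of readAll()
                    QTextCodec::ConverterState state;
                    QTextCodec *codec = QTextCodec::codecForName("UTF-8");
                    QByteArray byteArray(text);
                    // Decode as UTF-8; 'state' counts the bytes that could not be decoded
                    QString str = codec->toUnicode(byteArray.constData(), byteArray.size(), &state);
                    if (state.invalidChars > 0)
                    {
                        // not valid UTF-8 -> treat as HEX
                    }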
                

                This approach still has some room for errors. Any HEX sequence could look like UTF-8 by accident. How well it works depends on the length of the QByteArray. If you have longer sequences this approach will work better. If you receive the full protocol and somewhere in there is some transmitted text hidden, then you need to parse the protocol and can't just rely on my proposed heuristic.

                  hkottmann
                  wrote on 8 Oct 2022, 07:35 last edited by
                  #8

                  Dear all

                  Thanks for your help. I went the way qDebug() does it. By the way, there is no watertight way to easily determine the data format. The function isprint() can give you a hint, but in most cases you need further checks to determine the real data format, as neither QByteArray nor QString carries any header information about its data format.
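For readers who want to try the isprint() hint, a minimal sketch could count how many bytes are printable and compare the share against a threshold. Both function names and the 90% threshold are assumptions made for this example, not something taken from the thread, and the input is a `std::string` copy of the QByteArray contents:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: fraction of bytes that are printable ASCII or common
// whitespace (\n, \r, \t). Returns 0.0 for an empty buffer.
double printableRatio(const std::string &bytes)
{
    if (bytes.empty())
        return 0.0;
    std::size_t printable = 0;
    for (unsigned char c : bytes) {
        // std::isprint needs a value representable as unsigned char
        if (std::isprint(c) || c == '\n' || c == '\r' || c == '\t')
            ++printable;
    }
    return static_cast<double>(printable) / static_cast<double>(bytes.size());
}

// Hypothetical threshold: treat the buffer as text if at least 90% qualifies.
bool probablyText(const std::string &bytes, double threshold = 0.9)
{
    return printableRatio(bytes) >= threshold;
}
```

This remains a hint only: short binary frames can consist entirely of printable values, so knowing the protocol of each device is still the reliable way.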
