Is QString::fromUtf8(const QByteArray &str) safe for all kinds of data?

    l3u_
    wrote on last edited by
    #1

    Hi :-)

    I recently implemented a QUdpSocket that listens for broadcast datagrams on a server. A client can broadcast a datagram like "some text|some other text", and the server processes it like so:

    void ServerPage::checkDiscoverBroadcast()
    {
        QByteArray datagram;
    
        while (m_discoverSocket->hasPendingDatagrams()) {
            datagram.resize(int(m_discoverSocket->pendingDatagramSize()));
            m_discoverSocket->readDatagram(datagram.data(), datagram.size());
    
            const QStringList parts = QString::fromUtf8(datagram).split(QLatin1Char('|'));
            if (parts.count() == 2 && parts.at(0) == QStringLiteral("some text")) {
                ...
            }
        }
    }
    

    Now I wonder whether this is safe, since in theory some other source could broadcast arbitrary data via UDP on the same port. Can anything bad happen if random binary data (not valid UTF-8) is processed by the code above?

    Thanks for any info!

      JonB
      wrote on last edited by
      #2

      @l3u_
      Well, it shouldn't crash or overflow, and since https://doc.qt.io/qt-5/qstring.html#fromUtf8 says:

      However, invalid sequences are possible with UTF-8 and, if any such are found, they will be replaced with one or more "replacement characters", or suppressed. These include non-Unicode sequences, non-characters, overlong sequences or surrogate codepoints encoded into UTF-8

      isn't that the worst that can happen?
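
If silently replaced characters are not acceptable, one option is to reject malformed input before decoding it. The following is a minimal standard-library sketch, not Qt's own implementation: the function name `isWellFormedUtf8` is hypothetical, and its rules are an assumption that only approximates the quoted documentation (for instance, it does not reject non-characters).

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Returns true if `bytes` is well-formed UTF-8: no truncated or overlong
// sequences, no surrogate code points, nothing above U+10FFFF.
// Hypothetical helper for illustration; not part of Qt.
bool isWellFormedUtf8(std::string_view bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const auto lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // number of continuation bytes expected
        std::uint32_t cp = 0;    // code point being decoded
        std::uint32_t min = 0;   // smallest code point allowed for this length

        if (lead < 0x80)                { cp = lead;        extra = 0; min = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; min = 0x10000; }
        else return false;       // stray continuation byte or invalid lead byte

        if (bytes.size() - i - 1 < extra) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const auto c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;        // not a continuation byte
            cp = (cp << 6) | (c & 0x3Fu);
        }
        if (cp < min) return false;                      // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate code point
        if (cp > 0x10FFFF) return false;                 // outside Unicode range
        i += extra + 1;
    }
    return true;
}
```

A receiver could run such a check on the raw datagram bytes and simply drop anything that fails, instead of parsing a string that contains replacement characters.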

        JKSH
        Moderators
        wrote on last edited by
        #3

        @l3u_ said in Is QString::fromUtf8(const QByteArray &str) safe for all kinds of data?:

        I wonder if it's safe to do

        Is it safe? Yes it is.

        Is it valid? That's debatable.

        Note: You don't need to convert the datagram into a QString to split it. You can use QByteArray::split().
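
Splitting on the raw bytes works because '|' is an ASCII byte (0x7C) and never occurs inside a multi-byte UTF-8 sequence. In Qt this would be QByteArray::split('|'); the sketch below shows the same idea with the standard library only. The name `splitTwoFields` and the "exactly two fields" rule are assumptions taken from the example in the question.

```cpp
#include <optional>
#include <string_view>
#include <utility>

using Fields = std::pair<std::string_view, std::string_view>;

// Splits `datagram` at its single '|' separator. Returns std::nullopt unless
// there is exactly one separator, i.e. exactly two fields.
// Hypothetical helper for illustration; the views point into `datagram`.
std::optional<Fields> splitTwoFields(std::string_view datagram)
{
    const auto pos = datagram.find('|');
    if (pos == std::string_view::npos)
        return std::nullopt;  // no separator at all
    if (datagram.find('|', pos + 1) != std::string_view::npos)
        return std::nullopt;  // more than two fields
    return Fields{datagram.substr(0, pos), datagram.substr(pos + 1)};
}
```

Only the fields that pass this check would then need to be converted to text, so arbitrary binary datagrams can be discarded before any decoding happens.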

          l3u_
          wrote on last edited by
          #4

          Thanks for the info! I had also read the docs about replaced and/or suppressed characters; I just wondered whether I had understood them correctly.

          @JKSH I need to process the parts of the datagram as QStrings later anyway, so I would have to convert each item in the QList<QByteArray> returned by QByteArray::split() to a QString. But thanks for the hint!

            aha_1980
            Lifetime Qt Champion
            wrote on last edited by
            #5

            Hi @l3u_,

            You should consider using QTextStream. This will make sure that you don't get garbled output if your UTF-8 string is distributed over several datagrams.

            If you can ensure that your string is always in one datagram, your code should already work.

            Regards

              l3u_
              wrote on last edited by
              #6

              @aha_1980 Thanks for pointing this out!

                l3u_
                wrote on last edited by
                #7

                @aha_1980 Playing around with it: how would you assemble data that is fragmented across multiple datagrams? For a TCP client-server implementation, I do this via a transaction like so:

                ...
                connect(m_socket, &QTcpSocket::readyRead, this, &AbstractJsonInterface::readData);
                ....
                
                void AbstractJsonInterface::readData()
                {
                    QByteArray data;
                    QDataStream stream(m_socket);
                    stream.setVersion(STREAM_VERSION);
                
                    for (;;) {
                        stream.startTransaction();
                        stream >> data;
                        if (! stream.commitTransaction()) {
                            break;
                        }
                
                        // Do something with the data
                        ...
                    }
                }
                

                But QDataStream stream(m_socket); won't work for a QUdpSocket, so I have to read the data like this, for example:

                void AbstractDiscoverEngine::readDatagram()
                {
                    QByteArray datagram;
                
                    while (m_socket->state() == QAbstractSocket::BoundState && m_socket->hasPendingDatagrams()) {
                        datagram.resize(int(m_socket->pendingDatagramSize()));
                        m_socket->readDatagram(datagram.data(), datagram.size());
                        
                        // Do something with the data
                    }
                }
                

                But apart from that: if I can be sure that the data fits in one datagram (it will, I pass less than 100 bytes), there's no benefit to using a QDataStream or QTextStream instead of the "raw" UTF-8 data, is there?

                  jsulm
                  Lifetime Qt Champion
                  wrote on last edited by
                  #8

                  @l3u_ You can simply have a QByteArray buffer and append the data from each datagram (https://doc.qt.io/qt-5/qnetworkdatagram.html#data) to it. No need to resize anything, QByteArray will do this for you.

                    l3u_
                    wrote on last edited by
                    #9

                    @jsulm But what if the data always fits in one datagram? Is there any need for a buffer or a QDataStream then?

                      jsulm
                      Lifetime Qt Champion
                      wrote on last edited by
                      #10

                      @l3u_ In that case no, but can you be sure it fits into one datagram?

                        l3u_
                        wrote on last edited by
                        #11

                        @jsulm I think that as long as the data doesn't exceed the maximum payload size of a UDP datagram (65,507 bytes), it will always be delivered in one datagram, won't it? I pass around fewer than 100 characters.

                          jsulm
                          Lifetime Qt Champion
                          wrote on last edited by
                          #12

                          @l3u_ That should work, I think.

                            aha_1980
                            Lifetime Qt Champion
                            wrote on last edited by
                            #13

                            @l3u_ In principle you are right, but for the max. size please read https://serverfault.com/questions/246508/how-is-the-mtu-is-65535-in-udp-but-ethernet-does-not-allow-frame-size-more-than

                              l3u_
                              wrote on last edited by
                              #14

                              @aha_1980 Okay, that is interesting. But still, 1500 bytes is more than enough for my purpose.

                              But if there was the mentioned fragmentation, would I have to care about it in my implementation, or would QUdpSocket deliver the complete de-fragmented datagram when I read one out via QUdpSocket::receiveDatagram()?
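
To put numbers on the 1500-byte figure, here is a rough sketch that estimates how many IPv4 packets one UDP datagram is split into. It assumes a 20-byte IPv4 header without options, an 8-byte UDP header, and a default MTU of 1500; the function name `ipv4FragmentCount` is hypothetical, and real paths may have a smaller MTU.

```cpp
#include <cstddef>

// Assumed header sizes for this sketch (IPv4 without options, UDP).
constexpr std::size_t kIpv4HeaderBytes = 20;
constexpr std::size_t kUdpHeaderBytes = 8;

// Estimates the number of IPv4 packets needed to carry one UDP datagram with
// `payloadBytes` of application data. Requires mtu > 28.
constexpr std::size_t ipv4FragmentCount(std::size_t payloadBytes, std::size_t mtu = 1500)
{
    const std::size_t capacity = mtu - kIpv4HeaderBytes;      // data bytes per IP packet
    const std::size_t total = payloadBytes + kUdpHeaderBytes; // UDP header is sent once
    if (total <= capacity)
        return 1;                                             // fits without fragmentation
    // Every fragment except the last must carry a multiple of 8 bytes.
    const std::size_t perFragment = capacity / 8 * 8;
    const std::size_t rest = total - capacity;                // bytes beyond the last fragment
    return 1 + (rest + perFragment - 1) / perFragment;
}
```

Under these assumptions, a payload of up to 1472 bytes travels in a single packet, while a maximum-size payload of 65,507 bytes is split into 45 fragments.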

                                l3u_
                                wrote on last edited by
                                #15

                                Well, okay: Here's what the docs say concerning QUdpSocket::writeDatagram():

                                Datagrams are always written as one block. The maximum size of a datagram is highly platform-dependent, but can be as low as 8192 bytes. If the datagram is too large, this function will return -1 and error() will return DatagramTooLargeError.

                                Sending datagrams larger than 512 bytes is in general disadvised, as even if they are sent successfully, they are likely to be fragmented by the IP layer before arriving at their final destination.

                                So we simply shouldn't send large datagrams at all ;-) Apparently, datagrams of up to 512 bytes are unlikely to be fragmented, so everything arrives in one block and there's no problem.
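
On the sending side, the 512-byte advice from the quoted docs can be enforced before anything is written to the socket. This is a minimal sketch under assumptions: the helper `buildDiscoverPayload` and the constant `kMaxPayloadBytes` are hypothetical names, and the "two fields joined by '|'" format is taken from the example in the first post.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Size limit suggested by the quoted QUdpSocket::writeDatagram() docs.
constexpr std::size_t kMaxPayloadBytes = 512;

// Joins two fields with '|' and returns the payload, or std::nullopt if a
// field contains the separator or the result exceeds kMaxPayloadBytes.
// Hypothetical helper for illustration.
std::optional<std::string> buildDiscoverPayload(std::string_view first, std::string_view second)
{
    if (first.find('|') != std::string_view::npos
        || second.find('|') != std::string_view::npos)
        return std::nullopt;  // would be ambiguous when split by the receiver

    std::string payload;
    payload.reserve(first.size() + 1 + second.size());
    payload.append(first);
    payload.push_back('|');
    payload.append(second);

    if (payload.size() > kMaxPayloadBytes)
        return std::nullopt;  // too large to send as a small, unfragmented datagram
    return payload;
}
```

The returned bytes could then be handed to QUdpSocket::writeDatagram(); rejecting oversized payloads early keeps the fragmentation question from arising at all.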

                                  aha_1980
                                  Lifetime Qt Champion
                                  wrote on last edited by
                                  #16

                                  @l3u_

                                  But if there was the mentioned fragmentation, would I have to care about it in my implementation, or would QUdpSocket deliver the complete de-fragmented datagram when I read one out via QUdpSocket::receiveDatagram()?

                                  I'd indeed expect to receive the complete datagram, as the fragmenting is transparent.

                                  But the chance of losing a complete datagram is higher when it's fragmented, as all fragments need to arrive for it to be reassembled.
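
The effect of fragmentation on loss can be estimated with a simple model. The sketch below assumes that each fragment is lost independently with the same probability, which real networks only approximate; the function name `datagramArrivalProbability` is hypothetical.

```cpp
#include <cmath>

// Probability that a datagram split into `fragments` pieces arrives complete,
// assuming each fragment is lost independently with probability
// `fragmentLossRate` (a simplifying assumption for this sketch).
double datagramArrivalProbability(double fragmentLossRate, int fragments)
{
    // All fragments must arrive, so the per-fragment survival rates multiply.
    return std::pow(1.0 - fragmentLossRate, fragments);
}
```

With a 1% loss rate, for example, a single-packet datagram arrives with probability 0.99, whereas one split into 45 fragments arrives complete only about 64% of the time under this model.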
