Is QString::fromUtf8(const QByteArray &str) safe for all kinds of data?



  • Hi :-)

    I recently implemented a QUdpSocket that listens for broadcast datagrams on a server. A client can broadcast a datagram like "some text|some other text", and the server processes it like so:

    void ServerPage::checkDiscoverBroadcast()
    {
        QByteArray datagram;
    
        while (m_discoverSocket->hasPendingDatagrams()) {
            datagram.resize(int(m_discoverSocket->pendingDatagramSize()));
            m_discoverSocket->readDatagram(datagram.data(), datagram.size());
    
            const QStringList parts = QString::fromUtf8(datagram).split(QLatin1Char('|'));
            if (parts.count() == 2 && parts.at(0) == QStringLiteral("some text")) {
                ...
            }
        }
    }
    

    Now I wonder whether this is safe, since, theoretically, arbitrary data could just as well be broadcast by some other source via UDP on the same port. Can anything bad happen if random binary data (not valid UTF-8) is processed by the code above?

    Thanks for all info!



  • @l3u_
    Well it shouldn't crash or overflow, and since https://doc.qt.io/qt-5/qstring.html#fromUtf8 says:

    However, invalid sequences are possible with UTF-8 and, if any such are found, they will be replaced with one or more "replacement characters", or suppressed. These include non-Unicode sequences, non-characters, overlong sequences or surrogate codepoints encoded into UTF-8

    isn't that the worst that can happen?
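
    To make "invalid sequences" concrete, here is a minimal, Qt-free sketch of a well-formedness check. The helper name isWellFormedUtf8 is hypothetical and not part of Qt; it rejects truncated sequences, stray continuation bytes, overlong forms, surrogates, and values above U+10FFFF. It does not treat non-characters as invalid, so it is not guaranteed to flag exactly the same inputs that QString::fromUtf8() replaces.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper (not a Qt API): returns true if `bytes` is structurally
// well-formed UTF-8.
bool isWellFormedUtf8(std::string_view bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const auto lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // continuation bytes expected after the lead byte
        std::uint32_t cp = 0;    // code point being decoded
        std::uint32_t min = 0;   // smallest code point allowed for this length

        if (lead < 0x80)                { extra = 0; cp = lead;        min = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min = 0x10000; }
        else return false;       // stray continuation byte or invalid lead byte

        if (bytes.size() - i <= extra)
            return false;        // sequence is cut off at the end of the data

        for (std::size_t k = 1; k <= extra; ++k) {
            const auto c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;    // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min) return false;                      // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range

        i += extra + 1;
    }
    return true;
}
```

    A server that would rather drop malformed datagrams than process them with replacement characters could run a check like this on the raw bytes before converting them.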


  • Moderators

    @l3u_ said in Is QString::fromUtf8(const QByteArray &str) safe for all kinds of data?:

    I wonder if it's safe to do

    Is it safe? Yes it is.

    Is it valid? That's debatable.

    Note: You don't need to convert the datagram into a QString to split it. You can use QByteArray::split().
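
    Splitting the raw bytes first is safe with UTF-8 because the separator '|' is an ASCII byte (0x7C), and ASCII bytes never occur inside a multi-byte UTF-8 sequence. The following is a Qt-free sketch of that idea using only the standard library; splitBytes and isDiscoverRequest are hypothetical names for illustration, and in Qt code QByteArray::split('|') would take the place of splitBytes.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `data` at every occurrence of `sep`.
// The returned views point into `data`, so `data` must outlive them.
std::vector<std::string_view> splitBytes(std::string_view data, char sep)
{
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = data.find(sep, start);
        if (pos == std::string_view::npos) {
            parts.push_back(data.substr(start));   // last (or only) part
            break;
        }
        parts.push_back(data.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}

// Mirrors the check from the question: exactly two parts, and the first one
// must be the expected keyword. Anything else is ignored.
bool isDiscoverRequest(std::string_view datagram)
{
    const auto parts = splitBytes(datagram, '|');
    return parts.size() == 2 && parts[0] == "some text";
}
```

    With this order of operations, random binary data is rejected by the byte-level comparison, and only the parts of an accepted datagram need to be converted to strings.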



  • Thanks for the info! I had also read the docs about replaced and/or suppressed characters; I just wondered whether I had understood them correctly.

    @JKSH I need to process the parts of the datagram as QStrings later anyway, so I would have to convert each item of the QList<QByteArray> returned by the split to a QString. But thanks for the hint!


  • Qt Champions 2018

    Hi @l3u_,

    you should consider using QTextStream. This will make sure that you don't get garbled output if your UTF-8 string is distributed over several datagrams.

    If you can ensure that your string always arrives in one datagram, your code should already work.

    Regards



  • @aha_1980 Thanks for pointing this out!



  • @aha_1980 Playing around with it: how would you assemble data that is split across multiple datagrams? For a TCP client-server implementation, I do this via a transaction like so:

    ...
    connect(m_socket, &QTcpSocket::readyRead, this, &AbstractJsonInterface::readData);
    ....
    
    void AbstractJsonInterface::readData()
    {
        QByteArray data;
        QDataStream stream(m_socket);
        stream.setVersion(STREAM_VERSION);
    
        for (;;) {
            stream.startTransaction();
            stream >> data;
            if (! stream.commitTransaction()) {
                break;
            }
    
            // Do something with the data
            ...
        }
    }
    

    but QDataStream stream(m_socket); won't work for a QUdpSocket, so I have to read the data out like this, for example:

    void AbstractDiscoverEngine::readDatagram()
    {
        QByteArray datagram;
    
        while (m_socket->state() == QAbstractSocket::BoundState && m_socket->hasPendingDatagrams()) {
            datagram.resize(int(m_socket->pendingDatagramSize()));
            m_socket->readDatagram(datagram.data(), datagram.size());
            
            // Do something with the data
        }
    }
    

    But apart from that: if I can be sure that the data fits in one datagram (it will, I pass less than 100 bytes), there's no benefit in using a QDataStream or QTextStream instead of the "raw" UTF-8 data, is there?


  • Qt Champions 2018

    @l3u_ You can simply have a QByteArray buffer and append the data from each datagram (https://doc.qt.io/qt-5/qnetworkdatagram.html#data) to it. No need to resize anything, QByteArray will do this for you.
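
    A buffer like this only helps if the receiver can tell where one message ends. As a rough, Qt-free sketch of the idea, the class below appends each received chunk and hands out every complete message. Both the MessageBuffer name and the newline terminator are assumptions made for this illustration, not something the thread or Qt prescribes. It also assumes the chunks arrive in order and none is lost, which UDP does not guarantee.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: collects chunks and extracts messages that end
// with '\n' (an assumed framing convention).
class MessageBuffer
{
public:
    // Appends one received chunk and returns all messages it completed.
    std::vector<std::string> append(std::string_view chunk)
    {
        m_buffer.append(chunk.data(), chunk.size());

        std::vector<std::string> messages;
        std::size_t pos = 0;
        while ((pos = m_buffer.find('\n')) != std::string::npos) {
            messages.push_back(m_buffer.substr(0, pos));  // without the '\n'
            m_buffer.erase(0, pos + 1);                   // keep the remainder
        }
        return messages;
    }

private:
    std::string m_buffer;  // bytes of a message that is not complete yet
};
```

    Decoding each extracted message only after it is complete avoids converting a UTF-8 sequence that was cut in half at a chunk boundary.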



  • @jsulm But if the data always fits in one datagram? Is there a need for using a buffer or a QDataStream then?


  • Qt Champions 2018

    @l3u_ In this case not, but can you be sure it fits into one datagram?



  • @jsulm I think that as long as the data doesn't exceed the maximum payload of a UDP datagram (65,507 bytes over IPv4), it will always be delivered in one datagram, won't it? I pass around less than 100 characters.


  • Qt Champions 2018

    @l3u_ Should work I think


  • Qt Champions 2018



  • @aha_1980 Okay, that is interesting. But still, 1500 bytes is more than enough for my purpose.

    But if there was the mentioned fragmentation, would I have to care about it in my implementation, or would QUdpSocket deliver the complete de-fragmented datagram when I read one out via QUdpSocket::receiveDatagram()?



  • Well, okay: Here's what the docs say concerning QUdpSocket::writeDatagram():

    Datagrams are always written as one block. The maximum size of a datagram is highly platform-dependent, but can be as low as 8192 bytes. If the datagram is too large, this function will return -1 and error() will return DatagramTooLargeError.

    Sending datagrams larger than 512 bytes is in general disadvised, as even if they are sent successfully, they are likely to be fragmented by the IP layer before arriving at their final destination.

    So we simply shouldn't send large datagrams at all ;-) Going by this, datagrams of up to 512 bytes are unlikely to be fragmented, so everything arrives in one block and there's no problem.
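
    The sizes mentioned in this thread follow from simple header arithmetic, sketched below in plain C++. The function names are hypothetical. The sketch assumes IPv4 with a 20-byte header without options and the 8-byte UDP header; it also treats the local link MTU as the limit, although a hop along the path may have a smaller one.

```cpp
#include <cstddef>

constexpr std::size_t kIPv4HeaderMin = 20;  // IPv4 header without options
constexpr std::size_t kUdpHeader = 8;       // fixed UDP header size

// Largest UDP payload that fits into an IPv4 packet of at most
// `ipPacketLimit` bytes (hypothetical helper for illustration).
constexpr std::size_t maxUdpPayloadIPv4(std::size_t ipPacketLimit)
{
    return ipPacketLimit > kIPv4HeaderMin + kUdpHeader
               ? ipPacketLimit - kIPv4HeaderMin - kUdpHeader
               : 0;
}

// True if a payload of `payloadBytes` fits into one packet on a link with
// the given MTU, i.e. the IP layer does not need to fragment it there.
constexpr bool fitsWithoutFragmentation(std::size_t payloadBytes,
                                        std::size_t linkMtu)
{
    return payloadBytes <= maxUdpPayloadIPv4(linkMtu);
}

// 65,535 is the largest IPv4 packet; 1500 is the usual Ethernet MTU.
static_assert(maxUdpPayloadIPv4(65535) == 65507, "theoretical UDP maximum");
static_assert(maxUdpPayloadIPv4(1500) == 1472, "one Ethernet frame");
```

    Under these assumptions, the payload of less than 100 bytes discussed here is far below the 1472 bytes that fit into a single Ethernet frame.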


  • Qt Champions 2018

    @l3u_

    But if there was the mentioned fragmentation, would I have to care about it in my implementation, or would QUdpSocket deliver the complete de-fragmented datagram when I read one out via QUdpSocket::receiveDatagram()?

    I'd indeed expect to receive the complete datagram, as the fragmenting is transparent.

    But the chance of losing a complete datagram is higher when it's fragmented, as all fragments need to arrive for it to be reassembled.

