Is QString::fromUtf8(const QByteArray &str) safe for all kind of data?
-
Hi :-)
I recently implemented a QUdpSocket listening for broadcasted datagrams on a server. A client could broadcast a datagram like "some text|some other text" and the server processes it like so:
    void ServerPage::checkDiscoverBroadcast()
    {
        QByteArray datagram;
        while (m_discoverSocket->hasPendingDatagrams()) {
            datagram.resize(int(m_discoverSocket->pendingDatagramSize()));
            m_discoverSocket->readDatagram(datagram.data(), datagram.size());
            const QStringList parts = QString::fromUtf8(datagram).split(QLatin1Char('|'));
            if (parts.count() == 2 && parts.at(0) == QStringLiteral("some text")) {
                ...
            }
        }
    }
Now I wonder if it's safe to do so, as, theoretically, some arbitrary data could just as well be broadcast from some other source via UDP using the same port. So can anything bad happen if some random binary data (not valid UTF-8) is processed by the above code?
Thanks for all info!
-
@l3u_
Well it shouldn't crash or overflow, and since https://doc.qt.io/qt-5/qstring.html#fromUtf8 says:
However, invalid sequences are possible with UTF-8 and, if any such are found, they will be replaced with one or more "replacement characters", or suppressed. These include non-Unicode sequences, non-characters, overlong sequences or surrogate codepoints encoded into UTF-8
isn't that the worst that can happen?
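To make the categories in that quote a bit more concrete, here is a minimal sketch in plain standard C++ (no Qt) that checks whether a byte sequence is well-formed UTF-8. It is not how Qt decodes internally; the function name `isWellFormedUtf8` is made up for illustration, and it only rejects input, whereas `QString::fromUtf8` replaces or suppresses the bad parts. It also does not flag non-characters, which the Qt 5 docs mention separately.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
// Rejects stray/invalid lead bytes, truncated sequences, overlong
// encodings, UTF-16 surrogates and code points above U+10FFFF.
bool isWellFormedUtf8(const std::string &bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const auto b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;   // continuation bytes expected after b0
        std::uint32_t cp = 0;    // code point decoded so far
        std::uint32_t min = 0;   // smallest code point valid for this length
        if (b0 < 0x80) {
            cp = b0;
        } else if ((b0 & 0xE0) == 0xC0) {
            extra = 1; cp = b0 & 0x1F; min = 0x80;
        } else if ((b0 & 0xF0) == 0xE0) {
            extra = 2; cp = b0 & 0x0F; min = 0x800;
        } else if ((b0 & 0xF8) == 0xF0) {
            extra = 3; cp = b0 & 0x07; min = 0x10000;
        } else {
            return false;        // stray continuation byte or invalid lead byte
        }
        if (extra > n - i - 1) {
            return false;        // sequence is cut off at the end of the input
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const auto b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) {
                return false;    // not a continuation byte
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min) {
            return false;        // overlong encoding
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return false;        // surrogate encoded into UTF-8
        }
        if (cp > 0x10FFFF) {
            return false;        // beyond the Unicode range
        }
        i += extra + 1;
    }
    return true;
}
```

A check like this would only matter if you want to drop malformed datagrams outright instead of letting the decoder substitute replacement characters.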
-
@l3u_ said in Is QString::fromUtf8(const QByteArray &str) safe for all kind of data?:
I wonder if it's safe to do
Is it safe? Yes it is.
Is it valid? That's debatable.
Note: You don't need to convert the datagram into a QString to split it. You can use QByteArray::split().
-
Thanks for the info! I also read the docs about replaced and/or suppressed characters, I just wondered if I got it right.
@JKSH I need to process the parts of the datagram as QStrings anyway later, so I would have to convert each item in the QList<QByteArray> of the split up QByteArray to a QString anyway. But thanks for your hint!
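For reference, the split-before-decode idea can be sketched in plain standard C++ (no Qt), with `std::string` standing in for `QByteArray`. The names `splitBytes` and `isDiscoverRequest` are made up for this illustration. Splitting on raw bytes is safe here because `|` is a single ASCII byte (0x7C), and bytes below 0x80 never occur inside a multi-byte UTF-8 sequence.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `bytes` at every occurrence of `sep`,
// keeping empty fields.
std::vector<std::string> splitBytes(const std::string &bytes, char sep)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = bytes.find(sep, start);
        if (pos == std::string::npos) {
            parts.push_back(bytes.substr(start));   // last (or only) field
            break;
        }
        parts.push_back(bytes.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}

// Mirrors the check from the first post, but on the raw bytes:
// exactly two fields, and the first one is the expected marker text.
bool isDiscoverRequest(const std::string &datagram)
{
    const std::vector<std::string> parts = splitBytes(datagram, '|');
    return parts.size() == 2 && parts[0] == "some text";
}
```

With this order, random binary data is rejected by the byte comparison before any text decoding happens, and only the fields of accepted datagrams need converting to strings.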
-
Hi @l3u_,
you should consider using QTextStream. This will make sure that you don't get garbled output if your UTF-8 string is spread across several datagrams.
If you can ensure your string is always in one datagram, your code should already work.
Regards
-
@aha_1980 Playing around with it, how would you assemble data fragmented in multiple datagrams? For a TCP server-client-implementation, I do this via a transaction like so:
    ...
    connect(m_socket, &QTcpSocket::readyRead, this, &AbstractJsonInterface::readData);
    ....
    void AbstractJsonInterface::readData()
    {
        QByteArray data;
        QDataStream stream(m_socket);
        stream.setVersion(STREAM_VERSION);
        for (;;) {
            stream.startTransaction();
            stream >> data;
            if (! stream.commitTransaction()) {
                break;
            }
            // Do something with the data
            ...
        }
    }
but QDataStream stream(m_socket); won't work for a QUdpSocket, and I have to read out the data like e.g. this:
    void AbstractDiscoverEngine::readDatagram()
    {
        QByteArray datagram;
        while (m_socket->state() == QAbstractSocket::BoundState && m_socket->hasPendingDatagrams()) {
            datagram.resize(int(m_socket->pendingDatagramSize()));
            m_socket->readDatagram(datagram.data(), datagram.size());
            // Do something with the data
        }
    }
But apart from that: If I can be sure that the data fits in one datagram (it will, I pass less than 100 bytes), there's no benefit to using a QDataStream or QTextStream instead of using the "raw" UTF-8 data, is there?
-
@l3u_ You can simply have a QByteArray buffer and append the data from each datagram (https://doc.qt.io/qt-5/qnetworkdatagram.html#data) to it. No need to resize anything, QByteArray will do this for you.
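As a rough illustration of the append-to-a-buffer idea, here is a sketch in plain standard C++ (no Qt), with `std::string` standing in for `QByteArray`. The class `PayloadBuffer` and the newline terminator are assumptions made up for this example; the thread does not define a message framing. Keep in mind that UDP guarantees neither delivery nor ordering, so this only reassembles correctly if the datagrams happen to arrive complete and in order.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical accumulator for datagram payloads.
class PayloadBuffer
{
public:
    // Appends one datagram payload; the string grows as needed,
    // so no manual resize is required.
    void append(const std::string &payload) { m_buffer += payload; }

    // Removes and returns every complete message, i.e. everything up to
    // each terminator. An unfinished tail stays in the buffer.
    // The '\n' terminator is an assumption of this sketch.
    std::vector<std::string> takeMessages(char terminator = '\n')
    {
        std::vector<std::string> messages;
        std::string::size_type pos;
        while ((pos = m_buffer.find(terminator)) != std::string::npos) {
            messages.push_back(m_buffer.substr(0, pos));
            m_buffer.erase(0, pos + 1);
        }
        return messages;
    }

    // Number of bytes still waiting for a terminator.
    std::size_t pendingBytes() const { return m_buffer.size(); }

private:
    std::string m_buffer;
};
```

In a Qt program the same pattern would presumably be fed from the payload of each received datagram, with decoding to text done only on complete messages.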
-
@l3u_ In principle you are right, but for the max. size please read https://serverfault.com/questions/246508/how-is-the-mtu-is-65535-in-udp-but-ethernet-does-not-allow-frame-size-more-than
-
@aha_1980 Okay, that is interesting. But still, 1500 bytes is more than enough for my purpose.
But if there was the mentioned fragmentation, would I have to care about it in my implementation, or would QUdpSocket deliver the complete de-fragmented datagram when I read one out via QUdpSocket::receiveDatagram()?
-
Well, okay: Here's what the docs say concerning QUdpSocket::writeDatagram():
Datagrams are always written as one block. The maximum size of a datagram is highly platform-dependent, but can be as low as 8192 bytes. If the datagram is too large, this function will return -1 and error() will return DatagramTooLargeError.
Sending datagrams larger than 512 bytes is in general disadvised, as even if they are sent successfully, they are likely to be fragmented by the IP layer before arriving at their final destination.
So we simply shouldn't send large datagrams at all ;-) Apparently, up to 512 bytes, fragmentation is unlikely, everything arrives in one block and there's no problem.
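A sender-side guard for that advice could look like the following sketch in plain standard C++ (no Qt). The constant and function names are made up for illustration; the 512-byte figure is simply the value from the quoted documentation, not a hard protocol limit.

```cpp
#include <cstddef>
#include <string>

// Conservative payload limit taken from the quoted writeDatagram() docs.
// The name is hypothetical; adjust the value to your own network.
constexpr std::size_t kConservativeMaxPayload = 512;

// Returns true if the payload stays within the conservative limit and is
// therefore unlikely to be fragmented by the IP layer.
bool fitsConservativeLimit(const std::string &payload)
{
    return payload.size() <= kConservativeMaxPayload;
}
```

Checking this before sending gives an early, explicit failure on the sending side instead of silent fragmentation somewhere along the path.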
-
But if there was the mentioned fragmentation, would I have to care about it in my implementation, or would QUdpSocket deliver the complete de-fragmented datagram when I read one out via QUdpSocket::receiveDatagram()?
I'd indeed expect to receive the complete datagram, as the fragmenting is transparent.
But the chance of losing a complete datagram is higher when it's fragmented, as all parts need to arrive for de-fragmenting.
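To put a rough number on that: if each fragment were lost independently with the same probability (a simplifying assumption, real networks often lose packets in bursts), a datagram split into n fragments arrives only if all n do. A small sketch in plain standard C++, with a made-up function name:

```cpp
#include <cmath>

// Probability that a datagram arrives when it is split into
// `fragmentCount` fragments and each fragment is lost independently
// with probability `fragmentLossRate` (assumption of this sketch).
double deliveryProbability(double fragmentLossRate, int fragmentCount)
{
    return std::pow(1.0 - fragmentLossRate, fragmentCount);
}
```

With a 1% per-fragment loss rate, an unfragmented datagram arrives 99% of the time, while one split into three fragments arrives only about 97% of the time.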