Is there a more efficient way to decoding JSON from QWebSocket?
-
Many (if not most) WebSocket APIs out there use JSON, sent as text frames, which are UTF-8 encoded on the wire. In QWebSocket, text frames are handled by the following APIs, which expose the message as an already decoded QString:
- QWebSocket::sendTextMessage(const QString &message)
- QWebSocket::textMessageReceived(const QString &message)
On the other hand, Qt's native JSON support, implemented by QJsonDocument, provides APIs for serializing and deserializing, toJson() and fromJson(), which produce and take a UTF-8 encoded QByteArray.
That means, if I want to consume a JSON WebSocket API in Qt, I have to:
- receive a message as a QString that QWebSocket has already decoded from UTF-8 internally
- convert it back to a UTF-8 QByteArray so I can pass it to QJsonDocument::fromJson()
The same process, in reverse, applies to sending JSON messages.
This seems very inefficient, as there are apparently two needless conversions, UTF-8 -> QString -> UTF-8, before QJsonDocument can actually read the JSON message.
Surely, there must be a more efficient way. Am I missing something? Is there a way to send/receive a UTF-8 encoded string directly using QWebSocket? Can these conversions be avoided, or is the Qt API lacking here? Thanks for any clues.
-
Hi,
From a very quick look at the class, what about using sendBinaryMessage?
-
@martin_ky said in Is there a more efficient way to decoding JSON from QWebSocket?:
Surely, there must be a more efficient way.
You have a good point: https://bugreports.qt.io/browse/QTBUG-133100
-
Looks like there are QByteArray versions of send and receive for QWebSocket, at least in 6.9
-
Folks suggesting to use sendBinaryMessage() and binaryMessageReceived(): I don't think that's a choice I can make as a consumer of a WebSocket API. As far as I understand the WebSocket protocol, each message is sent in one of two modes: either as text frames (encoded as UTF-8 bytes) or as binary frames (raw bytes). If the server chooses to send JSON as a text message, I cannot receive it via the binaryMessageReceived() signal; QWebSocket does not emit this signal when a text message is received. To my knowledge, as of Qt 6.9.1, there is no way to access the UTF-8 bytes of a received text message via the QWebSocket API, only the already (and often unnecessarily) decoded QString.
@JKSH: Thanks for the QTBUG, that is exactly my point.
-
@martin_ky You might like to post something to this effect into the bug report.
-
The only way for the API to know whether the data it receives is binary or text is through the MIME type header. I'd suggest verifying whether the WebSocket protocol uses MIME types. If what @martin_ky says is true, then he should see appropriate MIME type headers in the transactions.
The other option is to throw out QWebSocket and do TCP-level session transport, where everything is considered a stream of octets... or, of course, live with the QString/byte-array translations. I mean, is the amount of data being processed really a bottleneck in translating back and forth?
-
@JonB The bug report QTBUG-133100 is accurate. I upvoted it. You can too, if you want.
@Kent-Dorfman There is no such thing as a MIME type in the WebSocket protocol. Basically, there is just a small opcode field in each frame header that differentiates between a text message and a binary message. When QWebSocket receives a text message, it decodes the UTF-8 payload and emits the QString signal. I see no way around it in Qt 6.9.
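For reference, RFC 6455 (section 5.2) puts the message type in the 4-bit opcode in the low bits of the first frame-header byte: 0x1 marks a text frame, 0x2 a binary frame, and 0x0 a continuation of a fragmented message. A minimal, Qt-independent sketch of reading that byte; frameKind and isFinalFragment are hypothetical names and do not reflect how QWebSocket is implemented internally:

```cpp
#include <cstdint>

// Frame kinds distinguished by the WebSocket opcode (RFC 6455, section 5.2).
enum class FrameKind { Continuation, Text, Binary, Control, Reserved };

// The low 4 bits of the first header byte hold the opcode.
FrameKind frameKind(std::uint8_t firstByte)
{
    switch (firstByte & 0x0F) {
    case 0x0: return FrameKind::Continuation;
    case 0x1: return FrameKind::Text;     // payload must be valid UTF-8
    case 0x2: return FrameKind::Binary;   // arbitrary bytes
    case 0x8: case 0x9: case 0xA:
        return FrameKind::Control;        // close, ping, pong
    default:  return FrameKind::Reserved;
    }
}

// The high bit (FIN) marks the final fragment of a message.
bool isFinalFragment(std::uint8_t firstByte)
{
    return (firstByte & 0x80) != 0;
}
```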
-
As a C++ programmer, I’m all for squeezing out every bit of performance — no doubt about it. But trying to optimize WebSockets by skipping the UTF-8 to QString conversion?
That’s like shaving your eyebrows to run faster.
-
@J.Hilk said in Is there a more efficient way to decoding JSON from QWebSocket?:
As a C++ programmer, I’m all for squeezing out every bit of performance — no doubt about it. But trying to optimize WebSockets by skipping the UTF-8 to QString conversion?
That’s like shaving your eyebrows to run faster.
There's no reason why we can't avoid the conversion after QUtf8String is added (see https://bugreports.qt.io/browse/QTBUG-104135 + https://bugreports.qt.io/browse/QTBUG-98430):
- WebSocket requires textual data to be encoded in UTF-8. So it makes sense for QWebSocket to always output text as QUtf8String.
- Most JSON is passed around as UTF-8 too, so it makes sense for QJsonValue::fromJson() or similar to accept QUtf8StringView.
The holy grail in Qt is great API + good performance. Switching to QUtf8String(View) gives us both.
-
@J.Hilk said in Is there a more efficient way to decoding JSON from QWebSocket?:
As a C++ programmer, I’m all for squeezing out every bit of performance — no doubt about it. But trying to optimize WebSockets by skipping the UTF-8 to QString conversion?
That’s like shaving your eyebrows to run faster.
There is not just one conversion between UTF-8 and QString when reading a JSON text message from QWebSocket, but three, two of them totally unnecessary:
1st: QWebSocket internally converts the UTF-8 socket frame to a QString.
2nd: QJsonDocument cannot read from a QString, so I have to convert it back to a UTF-8 QByteArray.
3rd: Finally, when reading string values from a parsed QJsonDocument, these are converted to QString again.
Eliminating 2 of those 3 conversions is hardly optimization, just common sense in avoiding unnecessary work.
-
@martin_ky said in Is there a more efficient way to decoding JSON from QWebSocket?:
Eliminating 2 of those 3 conversions is hardly optimization, just common sense in avoiding unnecessary work
You're right, waiting for a proper patch from you 🙂
Since both classes are already very old and no one has complained until now, it looks like the conversion is fast enough for most cases.
-
@Christian-Ehrlicher said in Is there a more efficient way to decoding JSON from QWebSocket?:
Eliminating 2 of those 3 conversions is hardly optimization, just common sense in avoiding unnecessary work
You're right, waiting for a proper patch from you 🙂
To be fair, I don't think we can eliminate the conversions nicely until QUtf8String makes its debut. It's possible to add a way to retrieve text as a QByteArray, but that would make our API bulkier/messier.
Since both classes are already very old and no one has complained until now, it looks like the conversion is fast enough for most cases.
I do agree that the UTF-8 conversion overhead is unlikely to be a bottleneck in most cases.
Qt is famous for intuitive API, not for ultimate performance. So, while I do agree with @martin_ky that it's sensible to eliminate the unnecessary conversions, I would personally prefer to wait until we can do the elimination nicely because the performance penalty is tolerable in most cases.
At that stage, the benefits of fewer conversions and clearer API can be enjoyed by all users of Qt, not just in the WebSocket + JSON use-case.
-
@JKSH said in Is there a more efficient way to decoding JSON from QWebSocket?:
To be fair, I don't think we can eliminate the conversions nicely until QUtf8String makes its debut. It's possible to add a way to retrieve text as a QByteArray, but that would make our API bulkier/messier.
Qt is famous for intuitive API, not for ultimate performance.
IMO, it would be perfectly clean and intuitive if QWebSocket used QByteArray instead of QString to send/receive text messages. QByteArray has been the de-facto standard Qt container for UTF-8 strings for ages. Exchange of JSON text messages is a major use-case for websockets in general, so this really could have been anticipated and designed better. End of rant :)
-
@Christian-Ehrlicher said in Is there a more efficient way to decoding JSON from QWebSocket?:
Since both classes are already very old and no one has complained until now, it looks like the conversion is fast enough for most cases.
Yea, computers are really fast these days. Doesn't mean we should pessimize software though :)
-
@JKSH said in Is there a more efficient way to decoding JSON from QWebSocket?:
To be fair, I don't think we can eliminate the conversions nicely until QUtf8String makes its debut. It's possible to add a way to retrieve text as a QByteArray, but that would make our API bulkier/messier.
I looked into this in late Qt 5 times because of the same concerns regarding the conversion, and came to the same conclusion: it would just clutter the API for no real benefit, at least not in my benchmarks. Hence the request for an efficient and clean patch...
-
@martin_ky said in Is there a more efficient way to decoding JSON from QWebSocket?:
IMO, it would be perfectly clean and intuitive if QWebSocket used QByteArray instead of QString to send/receive text messages. QByteArray has been the de-facto standard Qt container for UTF-8 strings for ages.
No. UTF-8 text should not be treated as an array of bytes!
QString("É").toLower() gives you "é", but QString("É").toUtf8().toLower() does not (and it should not).
Exchange of JSON text messages is a major use-case for websockets in general, so this really could have been anticipated and designed better. End of rant :)
No arguments there :)
But the current scenario is that QWebSocket uses QString. The way out is to switch to QUtf8String, not to QByteArray.
-
@JKSH said in Is there a more efficient way to decoding JSON from QWebSocket?:
No. UTF-8 text should not be treated as an array of bytes!
I never suggested doing Unicode text processing using a QByteArray. A UTF-8 encoded string is quite literally an array of bytes, and I think there is no better-suited Qt container for storing those at the moment. It is perfectly fine and safe to pass UTF-8 encoded strings between APIs in QByteArrays. Even the return type of QString::toUtf8() suggests that QByteArray is currently Qt's type of choice for storing UTF-8 strings. So I'm a little hard-pressed to accept that reading and writing UTF-8 text using QByteArray on a QWebSocket would somehow be bad API that has to be avoided, while at the same time using QByteArray seems totally fine for passing UTF-8 strings in many other parts of Qt.
I'm not even sure what added value QUtf8String provides over QByteArray, except maybe some sort of type tagging: a way of saying that this is not just any byte array, but a UTF-8 encoded char array. Any kind of UTF-8 string processing (except trivial operations like concatenation) requires decoding into some fixed-width character array anyway. But that debate is off-topic.