Separating TCP bytes from readyRead()

jars121

This seems like a really simple one, but I can't quite get it to work as intended.

I'm receiving TCP data, which uses the following custom protocol:

"int;int;int;int%"

Each packet sent from the server is semicolon-delimited and terminated with '%'. The data often arrives with trailing '\x00' null characters, so a typical received packet looks like this:

"23;14;02;33242%\x00\x00"
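For reference, here is a minimal sketch of parsing one complete frame of this protocol. It uses only the C++ standard library (std::string standing in for QByteArray) so it compiles without Qt. parseFrame is a hypothetical helper name; it assumes exactly four integer fields and ignores any NUL padding after the '%'.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Parses one frame of the form "int;int;int;int%", ignoring any NUL padding
// that follows the '%' terminator. Returns std::nullopt if the frame is malformed.
std::optional<std::array<int, 4>> parseFrame(const std::string &raw)
{
    // Only the bytes before the '%' terminator belong to the frame.
    const std::size_t end = raw.find('%');
    if (end == std::string::npos)
        return std::nullopt;                       // no terminator: incomplete frame
    std::istringstream fields(raw.substr(0, end));

    std::array<int, 4> values{};
    std::string field;
    for (int &v : values) {
        if (!std::getline(fields, field, ';') || field.empty())
            return std::nullopt;                   // too few fields
        try {
            std::size_t used = 0;
            v = std::stoi(field, &used);
            if (used != field.size())
                return std::nullopt;               // trailing junk in the field
        } catch (...) {
            return std::nullopt;                   // not a number, or out of range
        }
    }
    if (std::getline(fields, field, ';'))
        return std::nullopt;                       // more than four fields
    return values;
}
```

Note that "02" parses to 2, so leading zeros are not preserved.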

I handle incoming data with the readyRead() signal on my tcpsocket instance. When a single, complete 'packet' is received from the server, processing is straightforward: I can capture the data as a QString and simply remove the '%' and '\x00' characters. However, TCP is a byte stream rather than a sequence of packets, so the bytes of one server 'packet' can arrive across several reads.

As such, I have a buffer that I append to in my readyRead() handler. The problem occurs when a single tcpsocket->readAll() call in that handler returns multiple 'packets'. The example below illustrates the issue.

Data received from server: "23;14;02;33242%\x00\x00" "14;44;52;23433%\x00\x00"

In the above example, two complete 'packets' have been received in a single readyRead() call, which is typical of TCP server/client connections. Unfortunately, when I read the data as a QString, only the first packet is read, i.e.:

QString data = QString(tcpsocket->readAll());
qDebug() << data.remove("\x00");

The above qDebug() produces "23;14;02;33242%". I'm guessing that in converting the readAll() QByteArray to a QString, it's effectively performing a .fromUtf8() call and truncating everything after the first \x00.
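That guess is consistent with the Qt 5 documentation for the QString(const QByteArray &) constructor, which says the bytes are converted with fromUtf8() and that copying stops at the first 0 character. The standard-library sketch below reproduces the same effect with std::string (asCString, asBytes and withoutNuls are hypothetical names): constructing from a bare const char* stops at the first NUL, while passing an explicit length keeps every byte. It also shows that the literal "\x00" is itself an empty C string, so passing it to remove() cannot match the NUL bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Bytes as they might arrive from one readAll(): two frames, each followed by
// two NUL padding bytes. sizeof(wire) - 1 excludes the literal's own terminator.
const char wire[] = "23;14;02;33242%\0\0" "14;44;52;23433%\0\0";

// Treats the data as a C string: the length is found by scanning for the first
// '\0', which mirrors the truncation seen with QString.
std::string asCString(const char *data)
{
    return std::string(data);
}

// Uses an explicit length: every byte, including embedded NULs, is kept.
std::string asBytes(const char *data, std::size_t size)
{
    return std::string(data, size);
}

// Removes every NUL byte from a length-aware byte string.
std::string withoutNuls(std::string bytes)
{
    bytes.erase(std::remove(bytes.begin(), bytes.end(), '\0'), bytes.end());
    return bytes;
}
```

The same principle applies in Qt: strip the NUL bytes while the data is still a QByteArray, before converting it to a QString.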

I've also tried reading the data into a QList<QByteArray> by splitting it on the '%' character. Again, this only partially works:

QList<QByteArray> data = tcpsocket->readAll().split('%');
qDebug() << data;

The above qDebug() produces ("23;14;02;33242", "\x00\x00" "14;44;52;23433", "\x00\x00"). The second QList element contains the correct second packet, but the \x00\x00 characters from the end of the first packet are prepended to it. The prepended \x00 characters are a separate string, so performing a .remove("\x00") results in an empty string in front of the second packet instead: ("23;14;02;33242", "" "14;44;52;23433", "").
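One way to clean up the split result is to strip the NUL bytes from each piece after splitting and skip pieces that end up empty. Below is a sketch of that idea with std::string (splitFrames is a hypothetical name). It only returns pieces that were actually followed by a '%', since a trailing partial frame would need to be buffered until the rest of it arrives.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Splits a chunk on the '%' terminator, strips NUL padding from each piece and
// skips pieces that end up empty.
std::vector<std::string> splitFrames(const std::string &chunk)
{
    std::vector<std::string> frames;
    std::size_t start = 0;
    std::size_t end;
    while ((end = chunk.find('%', start)) != std::string::npos) {
        std::string piece = chunk.substr(start, end - start);
        // Padding from the previous frame ends up at the front of this piece.
        piece.erase(std::remove(piece.begin(), piece.end(), '\0'), piece.end());
        if (!piece.empty())
            frames.push_back(piece);
        start = end + 1;
    }
    // Whatever follows the last '%' (NULs or a partial frame) is ignored here.
    return frames;
}
```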

What's the best way to go about this?

Kent-Dorfman

If you're receiving nulls, then either the server or the client isn't handling the correct number of characters... those nulls are blank buffer bytes due to incorrect read/write lengths. No one in their right mind would design an ASCII-readable protocol with a distinct EOM character (%) and also insert nulls into the stream.

jars121

The server uses a fixed-size character array that accommodates the full protocol format. If one of the component integers uses fewer characters than are allotted (e.g. 6,500 has 4 characters vs. 65,000 has 5), the rest of the character array is padded with null characters (as you've said).

I fully intend to revisit this process, and look to size the character array to prevent any null character buffering, but for the time being I'd like to understand how to address the issue I've detailed.

jars121

Actually I'm going to go ahead and work on dynamically sizing the character array on the server after all, thanks for your suggestion :)

jars121

I spoke too soon. I've corrected the server-side code so that the character array size exactly matches the number of characters used in the protocol. However, the data is always received with a \x00 null character at the end:

"23;13;34;34234\x00" "44;21;54;32322\x00"

Now, I can remove the null character by reading the data into a QString with simplified(), but per my original question, this will only capture the first packet; anything after the first instance of \x00 is truncated. Is there an obvious way to iterate through the readAll() QByteArray data? Per the above example, each packet is received and displayed as a separate string, so surely I can access each individually?
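Since each frame now ends in exactly one '\0', that byte can itself serve as the delimiter, provided the data stays a byte array rather than being converted to a QString first. A minimal std::string sketch of walking a chunk that way (SplitResult and splitOnNul are hypothetical names): bytes after the last NUL are returned separately, because they may be the start of a frame whose remainder has not arrived yet.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct SplitResult {
    std::vector<std::string> frames;  // complete frames, terminating NUL dropped
    std::string remainder;            // trailing bytes not yet terminated by a NUL
};

// Walks a chunk in which every frame is terminated by '\0'.
// Empty pieces (e.g. from doubled NULs) are skipped.
SplitResult splitOnNul(const std::string &chunk)
{
    SplitResult result;
    std::size_t start = 0;
    for (std::size_t nul = chunk.find('\0'); nul != std::string::npos;
         nul = chunk.find('\0', start)) {
        if (nul > start)
            result.frames.push_back(chunk.substr(start, nul - start));
        start = nul + 1;
    }
    result.remainder = chunk.substr(start);  // partial frame, if any
    return result;
}
```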

Kent-Dorfman

then you still aren't doing it right...buffer sends, that is.
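One plausible cause of the remaining NUL, assuming the server sends sizeof(array) or strlen(array) + 1 bytes: a char array sized to hold the text still needs one extra slot for the terminating '\0', and that slot goes on the wire. Below is a sketch of the sending side, written in C++ to match the rest of the thread (std::snprintf behaves like C's snprintf). The return value of snprintf is the number of characters written excluding the terminator, so sending exactly that many bytes transmits neither padding nor the terminator. formatFrame is a hypothetical helper, and the MCU's actual socket send call is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Formats one "int;int;int;int%" frame into buf and returns how many bytes
// should be sent. snprintf's return value excludes the terminating '\0'.
// Returns 0 if formatting fails or the frame does not fit in buf.
std::size_t formatFrame(char *buf, std::size_t cap, int a, int b, int c, int d)
{
    // "%%" in the format string produces the literal '%' terminator.
    const int n = std::snprintf(buf, cap, "%d;%d;%d;%d%%", a, b, c, d);
    if (n < 0 || static_cast<std::size_t>(n) >= cap)
        return 0;
    return static_cast<std::size_t>(n);
}
```

The frame would then be sent with the returned length as the byte count, not sizeof buf. Also note that %d writes 2 rather than 02; a format such as %02d would be needed for fixed-width fields.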

parsing the stream on the receiving end is basic computer science algorithmic stuff.

SGaist

Hi,

The usual way when you have a protocol like this is to cumulate the data in one buffer (which is a member of the receiving class) and then check if you have at least one frame, extract it, process it. Rinse and repeat until no full frame is available anymore.
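A minimal sketch of that accumulate-and-extract loop for the '%'-terminated protocol, using std::string as a stand-in for the QByteArray member. FrameBuffer, append and takeFrames are hypothetical names; in Qt, append would be called from the readyRead() handler with the result of readAll().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Accumulates raw bytes across reads and hands back complete '%'-terminated
// frames. NUL padding in front of a frame is discarded.
class FrameBuffer {
public:
    // Call with whatever the latest read returned.
    void append(const std::string &bytes) { buffer_ += bytes; }

    // Extracts every complete frame currently buffered; a partial frame stays
    // in the buffer until the rest of it arrives.
    std::vector<std::string> takeFrames()
    {
        std::vector<std::string> frames;
        std::size_t end;
        while ((end = buffer_.find('%')) != std::string::npos) {
            std::string frame = buffer_.substr(0, end);
            buffer_.erase(0, end + 1);                      // drop frame and '%'
            frame.erase(0, frame.find_first_not_of('\0'));  // strip leading NULs
            if (!frame.empty())
                frames.push_back(frame);
        }
        return frames;
    }

    std::size_t pending() const { return buffer_.size(); }

private:
    std::string buffer_;
};
```

Because a frame leaves the buffer only once its '%' has arrived, a frame split across two readyRead() calls is reassembled automatically.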

jars121

Thanks @SGaist , that's exactly what I'm trying to do :)

I've actually made some progress, but having done some further research, I think it's going to be more robust (particularly as the protocol itself becomes more complex in the future) if I send the length of the 'packet' as the first integer in the sequence, so I know exactly how much data to read for each packet. I've been able to remove the \x00 characters and split the QByteArray by the '%' termination character, but I'm still ending up with some empty QByteArrays as part of the resultant QList<QByteArray>. If I include the packet length in the protocol, I should be able to simplify the processing component on receipt, and have certainty that the data received either is or isn't complete.
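A sketch of how length-prefixed extraction could look on the receiving side. The framing is an assumption for illustration only: a 2-byte big-endian length followed by that many payload bytes. The thread does not specify the width or byte order of the size field; both ends just need to agree. takeLengthPrefixed is a hypothetical name; it removes a frame from the buffer only when the whole frame is present.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical framing: [2-byte big-endian payload length][payload].
// Returns the next payload and removes its frame from the buffer if the frame
// is complete; otherwise returns std::nullopt and leaves the buffer untouched.
std::optional<std::string> takeLengthPrefixed(std::string &buffer)
{
    if (buffer.size() < 2)
        return std::nullopt;                          // header not complete yet
    const auto hi = static_cast<unsigned char>(buffer[0]);
    const auto lo = static_cast<unsigned char>(buffer[1]);
    const std::size_t length = (static_cast<std::size_t>(hi) << 8) | lo;
    if (buffer.size() < 2 + length)
        return std::nullopt;                          // payload not complete yet
    std::string payload = buffer.substr(2, length);
    buffer.erase(0, 2 + length);
    return payload;
}
```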

JonB

@jars121 said in Separating TCP bytes from readyRead():

Each packet sent from the server is semicolon delimited, with '%' termination.

@SGaist said in Separating TCP bytes from readyRead():

to cumulate the data in one buffer (which is a member of the receiving class) and then check if you have at least one frame, extract it

Question @ Qt experts: QDataStream transactions (https://doc.qt.io/qt-5/qdatastream.html#using-read-transactions) are often a neat, built-in way of handling the necessary buffered reads from the server without having to write one's own code. Given the OP's variable-length, delimited protocol, and/or if he puts a packet length at the start, can code be written that reads it through QDataStream transactions, or will that not be possible?

(Also, and I get confused on this, once and for all so that I remember: if the server is sending a stream of bytes, not through QDataStream, can you still use QDataStream to read at client, or does QDataStream always put in its own type/count information data anywhere such that you cannot use it to read non-QDataStream-generated arbitrary bytes?)

jars121

@JonB Very good question! I was wondering something similar about QDataStream.

Kent-Dorfman

QDataStream is certainly a valid way of handling the whole thing. of course OP has not said whether the server is under his control or a pre-existing service that he needs to write a client to interface with.

The downside of QDataStream would be that it then means both the client and server need to use QDataStream. I always prefer to separate client and server so that they don't need to be based on the same development framework. The only binding between them should be a well-designed communications protocol.

I would tell the OP to read a few RFC documents to learn how internet protocols are designed.

JonB

@Kent-Dorfman said in Separating TCP bytes from readyRead():

The downside of QDataStream would be that it then means both the client and server need to use QDataStream.

OK, that rules it out then, which is why I asked. So QDataStream (for its transaction/buffering) cannot be set to read a stream of incoming bytes that do not have whatever "header" a QDataStream writer would have put on them, right?

SGaist

@jars121 I meant using a QByteArray as buffer so you concatenate everything you receive and then remove frame by frame from it.

jars121

I've made some progress on this, and am feeling more comfortable now. I've amended the protocol somewhat, to both incorporate the packet size at the start, and also remove the ';' delimiters and '%' termination character:

"intSizeint1int2int3int4"

In packaging the data on the server (the server is a bare-metal MCU programmed in C), a '\x00' is still appended to the packet, so tcpsocket->readAll() returns the data as a QByteArray like this:

QByteArray temp = tcpsocket->readAll();
qDebug() << temp;

"intSizeint1int2int3int4\x00"

As per earlier comments, readAll() contains many of these QByteArray elements back to back:

"intSizeint1int2int3int4\x00" "intSizeint1int2int3int4\x00" "intSizeint1int2int3int4\x00" "intSizeint1int2int3int4\x00"

I create a temporary QList<QByteArray> from temp, splitting the QByteArray elements with the \x00 character, and append them to my class QList<QByteArray> buffer:

QList<QByteArray> tempList = temp.split('\x00');
classByteListBuffer.append(tempList);
qDebug() << classByteListBuffer;

"intSizeint1int2int3int4" "intSizeint1int2int3int4" "intSizeint1int2int3int4" "intSizeint1int2int3int4"

Note that I do sometimes get empty QByteArray elements in classByteListBuffer. So I then concatenate classByteListBuffer into my main class QByteArray buffer by joining all of its elements, which takes care of any empty elements:

classBuffer = classByteListBuffer.join();
qDebug() << classBuffer;

"intSizeint1int2int3int4intSizeint1int2int3int4intSizeint1int2int3int4intSizeint1int2int3int4"

Now I can process the QByteArray buffer, using the intSize component to check whether a complete packet has been received. If a complete packet has been received, I process the packet, and remove the packet from the buffer using remove().
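One caveat with splitting on '\x00' and then joining: it amounts to deleting every NUL byte, which is harmless for ASCII text but not if intSize and int1..int4 are sent as raw binary integers, because a value such as 256 contains zero bytes. The sketch below illustrates this under assumptions the thread does not state: 32-bit little-endian integers, a one-byte size field that counts the whole packet including itself, and exactly one trailing NUL per packet. putLE32, dropNuls and takePacket are hypothetical names. takePacket consumes size + 1 bytes, so the trailing NUL is discarded without touching zero bytes inside the payload.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Appends a 32-bit value in little-endian byte order (assumed encoding).
void putLE32(std::string &out, std::uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}

// What split('\x00') followed by join() amounts to: every NUL byte is dropped.
std::string dropNuls(std::string bytes)
{
    bytes.erase(std::remove(bytes.begin(), bytes.end(), '\0'), bytes.end());
    return bytes;
}

// Hypothetical framing: buffer[0] is the packet size (size byte + payload),
// followed by the payload and one trailing NUL. Extracts one packet if it is
// complete, consuming the trailing NUL as well; returns false otherwise.
bool takePacket(std::string &buffer, std::string &packet)
{
    if (buffer.empty())
        return false;
    const std::size_t size = static_cast<unsigned char>(buffer[0]);
    if (size == 0 || buffer.size() < size + 1)
        return false;                         // wait for more data
    packet = buffer.substr(0, size);
    buffer.erase(0, size + 1);                // packet plus its trailing NUL
    return true;
}
```

With these values the 18-byte wire packet shrinks to 7 bytes after dropNuls, which is why size-based extraction is safer than stripping NULs once the payload is binary.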