Question regarding serialization of complex user classes

DRoscoe

I am playing around with using QDataStream to serialize my application's message classes over a TCP socket. I have a test message class which has several primitive data members as well as a user class as a data member. I've overloaded the necessary 'operator <<' methods to serialize my object to the stream. To get the message onto the stream, all I have to do is (simplified example):

QDataStream out;
MyMsg my_msg; // no parentheses: "MyMsg my_msg();" would declare a function

out << my_msg;

My question is:

How do I know when I have successfully received the entire message over the socket on the receiving end? I can't assume I've received it all with the first readyRead() signal, and I don't know the number of bytes actually streamed over the socket. All of the custom serialization examples I've found mention sockets as a viable QIODevice but use QFiles in the actual example, so it's not clear to me how to effectively use QDataStream with a custom user class to serialize it over a socket.

VRonin

@DRoscoe said in Question regarding serialization of complex user classes:

I don't know the number of bytes actually streamed over the socket

Yes, you do, using QIODevice::bytesAvailable(). That is the method used before Qt 5.7; nowadays the process is much easier:

QDataStream in(socket);
MyMsg destination;
in.startTransaction();
in >> destination;
if(in.commitTransaction()){
// All data necessary was received
}

See the fortuneclient example in Qt for more details.

P.S.

has several primitive data members

Make sure you specify the size of primitive integers when adding them to the stream:

int number=42;
out << number; // WRONG!
out << static_cast<qint32>(number); // correct

The same applies on the reading end: do not read int, read qint32. This way you avoid conflicts between different architectures. (The same thing applies to char, short and long long.)
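To illustrate why the width matters, here is a minimal Qt-free sketch in standard C++. The helpers putInt32 and getInt32 are hypothetical names, not Qt API. They always put exactly four bytes on the wire, in big-endian order (which is also QDataStream's default byte order), no matter how wide int is on either machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append a 32-bit value as exactly four big-endian bytes.
void putInt32(std::vector<std::uint8_t>& out, std::int32_t value)
{
    const std::uint32_t u = static_cast<std::uint32_t>(value);
    out.push_back(static_cast<std::uint8_t>(u >> 24));
    out.push_back(static_cast<std::uint8_t>(u >> 16));
    out.push_back(static_cast<std::uint8_t>(u >> 8));
    out.push_back(static_cast<std::uint8_t>(u));
}

// Hypothetical helper: read four big-endian bytes starting at offset.
std::int32_t getInt32(const std::vector<std::uint8_t>& in, std::size_t offset)
{
    assert(offset + 4 <= in.size()); // caller must have checked the length
    const std::uint32_t u = (std::uint32_t(in[offset]) << 24)
                          | (std::uint32_t(in[offset + 1]) << 16)
                          | (std::uint32_t(in[offset + 2]) << 8)
                          |  std::uint32_t(in[offset + 3]);
    return static_cast<std::int32_t>(u);
}
```

Because both sides agree on four bytes, the reader always knows how far to advance, which is the property the qint32 cast gives you with QDataStream.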

DRoscoe

@VRonin I think you misunderstand me. If the QIODevice in my example were a socket, as written, it would give no indication in the message data of how many bytes are in the message. How, then, can the receiver check bytesAvailable() to ensure it has received the full message? I can't use the new approach because I am limited to an older version of Qt.

My point is, it seems that it is insufficient to simply serialize a complex class to QDataStream over a socket, since you need some form of header to indicate at least the number of bytes in the message. Additionally, if I have several message types with differing data members, how does the receiver differentiate what message it's receiving? The header data would also have to contain information to disambiguate the message type on the stream. I could find no examples of how to do this with a QDataStream and a custom class which overloads the necessary << operators. My current (working) implementation serializes everything to a QByteArray, which has a prepended header to provide this information. It works, but I don't feel it is an ideal implementation.

I am very familiar with the Fortune example. It is a poor example if your intent is to serialize custom complex classes via QDataStream. In the fortune example, both sides know exactly what to expect, which is simply a string. That is a much easier thing to implement than what I am asking about.

VRonin

See https://forum.qt.io/topic/71367/qtcpsocket-how-to-receive-different-type-of-serialized-objects-in-sequential/6

Basically you write the header manually
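As a rough idea of what writing the header manually can look like on the sending side, here is a Qt-free sketch in standard C++. The frame layout (a 4-byte message type followed by a 4-byte payload size) and the names putU32 and frameMessage are assumptions made for illustration. They are not taken from the linked post or from Qt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Append a 32-bit unsigned value as four big-endian bytes.
void putU32(Bytes& out, std::uint32_t v)
{
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// Hypothetical frame layout: [type : 4 bytes][payload size : 4 bytes][payload].
Bytes frameMessage(std::uint32_t type, const Bytes& payload)
{
    // The size field is 32 bits wide, so larger payloads cannot be framed.
    assert(payload.size() <= std::numeric_limits<std::uint32_t>::max());

    Bytes frame;
    frame.reserve(8 + payload.size());
    putU32(frame, type);
    putU32(frame, static_cast<std::uint32_t>(payload.size()));
    frame.insert(frame.end(), payload.begin(), payload.end());
    return frame;
}
```

With Qt, the payload would typically be produced by streaming the message into a QByteArray through a QDataStream opened with QIODevice::WriteOnly, and the header would be written just before the payload is sent. That way the message stays in its native form until the moment of sending.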

@DRoscoe said in Question regarding serialization of complex user classes:

It is a poor example

I couldn't agree more, but it's all Qt ships.

DRoscoe

@VRonin That's exactly what I was looking for. It changes how I was thinking of doing the serialization, but I actually like your solution better, since I can keep the message in its native form until the instant I need to send it. The problem with my current approach, which works, is that I need to explicitly call a ::serialize() method, which populates a QByteArray. Since I have to write all the serialization code anyway, I wanted to simply supply the class to a QDataStream with the overloaded streaming operators. But I was still thinking in terms of my original approach: I would be creating the QDataStream in advance of actually sending the data (i.e. a call to ::serialize(), but with a QDataStream). That would make prepending the header more difficult later, since knowledge of the message type contained in the stream would be lost and I'd not be able to reconstitute the original message on the receiving end.

I agree that the new transaction approach is much cleaner, but unfortunately it's not an option with older Qt versions.

Thanks!

kshegunov

You didn't do that bad. ;)
I usually go with the "older" approach, the one predating the transaction support; old habits die hard, I suppose. The problem isn't so much how you create the data stream (as you can attach it to the socket) but rather how to signal message (in)completeness, if I understand your question correctly. With that in mind, you could set the internal state of the data stream from inside the shift operator overload and decide whether to continue receiving data based on that. For example:

QDataStream & operator >> (QDataStream & in, MyClassToSerialize & myObject)
{
    if (!myObject.deserialize(in))  {
        in.device()->reset(); //< Go back to the beginning (requires a random-access device; sequential devices such as sockets cannot seek, so this fails there)
        in.setStatus(QDataStream::ReadPastEnd);
    }
    return in;
}

Obviously, the deserialize method here has the following prototype: bool MyClassToSerialize::deserialize(QDataStream &), and you would use this in the following manner:

QIODevice * device; //< The device we are reading from
QDataStream in(device);

MyClassToSerialize myObject;
in >> myObject;

if (in.status() == QDataStream::ReadPastEnd)
    in.resetStatus(); // Insufficient data
else if (in.status() != QDataStream::Ok)
    ; // Some other error occurred

Writing to a stream is symmetrical to the code shown.
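The abort-and-rewind idea can be modelled without Qt. The sketch below is only an illustration under stated assumptions: Reader is a hypothetical stand-in for a stream on a random-access device, Point is a made-up message with two 32-bit fields, and the rewind goes back to where the object started rather than to the beginning of the device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical stand-in for a stream attached to a random-access device.
class Reader {
public:
    explicit Reader(const Bytes& data) : data_(data) {}

    // Read a big-endian 32-bit value; returns false if fewer than 4 bytes remain.
    bool readU32(std::uint32_t& v)
    {
        if (data_.size() - pos_ < 4)
            return false;
        v = (std::uint32_t(data_[pos_]) << 24)
          | (std::uint32_t(data_[pos_ + 1]) << 16)
          | (std::uint32_t(data_[pos_ + 2]) << 8)
          |  std::uint32_t(data_[pos_ + 3]);
        pos_ += 4;
        return true;
    }

    std::size_t pos() const { return pos_; }
    void seek(std::size_t p) { assert(p <= data_.size()); pos_ = p; }

private:
    const Bytes& data_;   // the caller keeps the buffer alive and may append to it
    std::size_t pos_ = 0;
};

struct Point { std::uint32_t x = 0; std::uint32_t y = 0; }; // hypothetical message

// Try to read a whole Point; on a short read, rewind and leave `out` untouched.
bool tryRead(Reader& in, Point& out)
{
    const std::size_t start = in.pos();
    Point tmp;
    if (in.readU32(tmp.x) && in.readU32(tmp.y)) {
        out = tmp;
        return true;
    }
    in.seek(start); // insufficient data: undo the partial read
    return false;
}
```

The caller simply retries tryRead once more bytes have arrived. Note that the fields read before the failure are parsed again on the retry.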

DRoscoe

@kshegunov Ahh! so you're implementing something of a sentinel in the stream? I would imagine that the transaction approach used in the current Qt implementation works in a similar fashion.

One thing that concerns me on the face of it is that it appears to be inefficient, both your approach and the transaction approach, in that you are streaming bytes to a structure only to wind it back and do it again, if you determine that you don't have it all. I am reluctant to go so far as to say it IS less efficient because Qt tends to play some nice tricks in the background to mitigate apparent inefficiency.

kshegunov

@DRoscoe said in Question regarding serialization of complex user classes:

so you're implementing something of a sentinel in the stream?

Well, I'm just making use of the stream status, which is internally kept by Qt. But the idea is, if you don't have enough data, to abort the operation and rewind the device pointer position.

I would imagine that the transaction approach used in the current Qt implementation works in a similar fashion.

I would, as well. At least it appears to be the most straightforward way of doing it.

it appears to be inefficient

It is! The "best" way is to have a full-fledged protocol implemented, but this does require some doing, so people (myself included) usually make a concession and settle for a statically(!) sized header (just as @VRonin suggested). You need the header to always be the same size so that a single check of the number of bytes received tells you whether it is wholly readable. After that, the header will tell you what you need to expect (as data sizes) so you can deserialize efficiently. The above (along with the transaction support) would be more generic in a sense, but does suffer an efficiency hit.
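A receiver built around a fixed-size header might look roughly like the following Qt-free sketch. The 8-byte layout (4-byte type, 4-byte payload size, both big-endian) and the names Frame, getU32 and tryExtractFrame are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical header: 4-byte message type + 4-byte payload size.
constexpr std::size_t kHeaderSize = 8;

struct Frame {
    std::uint32_t type = 0;
    Bytes payload;
};

// Read four big-endian bytes starting at offset.
std::uint32_t getU32(const Bytes& in, std::size_t offset)
{
    assert(offset + 4 <= in.size());
    return (std::uint32_t(in[offset]) << 24)
         | (std::uint32_t(in[offset + 1]) << 16)
         | (std::uint32_t(in[offset + 2]) << 8)
         |  std::uint32_t(in[offset + 3]);
}

// Try to pull one complete frame off the front of `buffer`.
// Returns false and leaves `buffer` untouched if more bytes are needed.
bool tryExtractFrame(Bytes& buffer, Frame& out)
{
    if (buffer.size() < kHeaderSize)
        return false;                          // header not complete yet

    const std::uint32_t type = getU32(buffer, 0);
    const std::size_t size = getU32(buffer, 4);
    if (buffer.size() - kHeaderSize < size)
        return false;                          // payload not complete yet

    const auto first = buffer.begin() + static_cast<std::ptrdiff_t>(kHeaderSize);
    const auto last = first + static_cast<std::ptrdiff_t>(size);
    out.type = type;
    out.payload.assign(first, last);
    buffer.erase(buffer.begin(), last);        // drop the consumed frame
    return true;
}
```

In a Qt program the two size checks would correspond to comparing QIODevice::bytesAvailable() against the header size and then against the payload size inside the readyRead() slot. Calling the extraction in a loop handles the case where several messages arrive in one go, and nothing is parsed twice.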

VRonin

@DRoscoe said in Question regarding serialization of complex user classes:

I would imagine that the transaction approach used in the current Qt implementation

Transactions are a bit better as they support serial devices (like sockets)

kshegunov

@VRonin said in Question regarding serialization of complex user classes:

Transactions are a bit better as they support serial devices (like sockets)

Probably correct, as they have the benefit of accessing the device's buffers directly and/or buffering the read themselves. In any case I agree with you that putting in a simple header should solve most issues on the first go, and it's pretty simple to implement too. :)