QByteArray data sharing and ownership



  • I know this is an old topic, but I discovered one thing that really surprised me when it comes to QByteArray::capacity() / reserve() and Qt::QueuedConnection.

    First of all, reserve() seems not to be reliable for keeping the promise of the container's size and memory buffer integrity even under normal simple conditions. After having reserved a certain size and then modifying the actual length of the QByteArray using resize() to a size LESS or EQUAL to the size of the capacity() would sometimes (not always) have the array to reallocate its data and again reducing the capacity. This is in my world a breakage of the semantics in the container behavior. Expecting the QByteArray::data() not to change the actual memory location is then completely false. One of the expectations of the reserve() call is to guarantee the container not to reallocate itself unless absolutely needed.

    The only way to be sure is to take the QByteArray's internal DataPtr and on that modify it's length specifier (which of course is a wrong way of doing it, relying on internal interfaces is never right).

    But one thing about this that is even worse. Having a set capacity of a QByteArray to something and then passing that QByteArray as const reference to a signal over a QueuedConnection actually modified the capacity() ! (and indeed reallocated the array). The violation of rules in this case I feel is even worse and I can hardly understand what the moc code is doing here to make this happen (or is it QByteArray). It is supposed as mentioned in this thread to take a copy of the array and pass that copy to the other thread (this is nice of course), but how can that result in the original container in the signal emitting thread being reallocated and having its capacity modified? Here we are even talking about breakage of const promise.

    If I'm right this is totally unexpected behavior and it should be filed as a serious bug I think.

    #include <QDebug>
    #include "asyncarray.h"
    
    ASyncArray::ASyncArray(QObject *parent) : QObject(parent)
    {
    	connect(this, &ASyncArray::testSignal, this, &ASyncArray::testSlot, Qt::QueuedConnection);
    
    	QByteArray testArray;
    
    	testArray.reserve(100);
    	testArray.resize(50);
    	memset(testArray.data(), 0xaa, testArray.capacity());
    
    	qDebug() << "The signal array is of size" << testArray.size() << "and of capacity" << testArray.capacity() << "at location"
    					 << (void*)testArray.data();
    	emit testSignal(testArray);
    	qDebug() << "The signal array is of size" << testArray.size() << "and of capacity" << testArray.capacity() << "at location"
    					 << (void*)testArray.data();
    }
    
    
    void ASyncArray::testSlot(const QByteArray& array)
    {
    	qDebug() << "The slot array is of size" << array.size() << "and of capacity" << array.capacity() << "at location"
    					 << (void*)array.data();
    
    }
    

    Produces:
    The signal array is of size 50 and of capacity 100 at location 0x1eada58
    The signal array is of size 50 and of capacity 50 at location 0x1eadec8
    The slot array is of size 50 and of capacity 100 at location 0x1eada58

    So the receiver (SLOT) receives the original array and the transmitter (emitter) gets the copy? This just can't be right! I suspect here the moc code mixing up the copy and the original.

    [Forked from https://forum.qt.io/topic/67855/slot-signal-safety-with-qbytearray-references ~kshegunov]


  • Lifetime Qt Champion

    Hi,

    For such low-level, internals specific topics, I'd recommend bringing this to the interest mailing list. You'll find there Qt's developers/maintainers. This forum is more user oriented.



  • @SGaist Ok, thanks, I'll do that.
    I have now concluded with a sample code.

    But then I do have a more user-oriented question which is more right on this topic.

    If I have a BlockingConnection (between signal and slot, over threads). If I pass a QByteArray as a non-const reference, modify it in the slot, do I get the modified version of it in the emitter context?

    It SHOULD in my oppinion, but am I right?



  • @Larswad said in SLOT/SIGNAL safety with QByteArray &references:

    @SGaist Ok, thanks, I'll do that.
    I have now concluded with a sample code.

    But then I do have a more user-oriented question which is more right on this topic.

    If I have a BlockingConnection (between signal and slot, over threads). If I pass a QByteArray as a non-const reference, modify it in the slot, do I get the modified version of it in the emitter context?

    It SHOULD in my oppinion, but am I right?

    from the block post @Chris-Kawa mentioned earlier (about a year ago x) )
    I would say your assumption is correct.

    BlockingQueuedConnection

    BlockingQueuedConnection is a mix between DirectConnection and QueuedConnection. Like with a DirectConnection, the arguments can stay on the stack since the stack is on the thread that is blocked. No need to copy the arguments. Like with a QueuedConnection, an event is posted to the other thread's event loop. The event also contains a pointer to a QSemaphore. The thread that delivers the event will release the semaphore right after the slot has been called. Meanwhile, the thread that called the signal will acquire the semaphore in order to wait until the event is processed.


  • Qt Champions 2016

    Yes, you are right (unless I'm wrong too). The blocking connection does not copy out the arguments passed when the signal's emitted in contrast to the ordinary queued connection, which relies on the metatype sysytem for the copying.

    As for your first question, I really think you expect something to work in a way that's not documented, so my advice is: don't. There's no guarantee that the reserved size of the containers is carried through any implicit or explicit copies of the objects. Those functions are provided as conveniences where you know the container sizes in advance so you can allocate the memory in one go, instead of relying on the internal regrowth strategy. If you want the containers to be an exact size throughout, then resize them instead.


  • Lifetime Qt Champion

    Out of curiosity, why do you want/need to pass a reference to your data through signals and slots to another thread, modify it there while blocking the original thread the signal was emitted from ?



  • Thanks for your answers, I will answer and be more clear about my use cases.

    But first, I have investigated a little further on the strange modification of the const QByteArray that get's modified when it is copied (it had absolutely nothing to do with the moc or metaobject construction). When a QByteArray is copied from one object to another is is shared. At least up to the point when you modify it in any way. This is acceptable of course, but one doesn't immediately realize that taking the pointer from the original array after it has been shared, will immediately cause a detach() (providing the pointer is shared) and a realloc of the object, loosing any earlier reserve (capacity) settings on it (well, it COULD in principle have preserved the capacity there). So, even if this is understandable the behavior is somewhat subtle.

    Now my use case is to keep a certain buffer size of the array to be filled as I hand out the QByteArray's pointer to a libusb outstanding request. As the request returns (with the ACTUAL size) i modify the size of the QByteArray to the real one (this is where the resize function was not trustful, it would resize but it would reallocate sometime and also change the capacity of the array. Now further, within this libusb execution context the array data is to be sent to another thread for queuing and later processing, making a copy is ok on the emit call, the signal signature looks like this:

    void bytesReceived(const EndPoint& ep, const QByteArray& data);
    

    And after that emit, immediately resubmit the QByteArray's buffer by handing out the pointer the libusb request structure.
    To my surprise, the emit would cause the QByteArray (which was passed to the signal as a const ref) become modified right after the data() call is made on the original array that was passed as const (and this is because of the copy being made from it). Here is where the sharing of the array becomes strange in my opinion, something being copied from it while it was given as a const has suddenly got its contents modified, and this is because I later take out the pointer from it (because the array has been shared now, but is being detached and reallocated).
    Ironically, if I change the signature to pass the array by value, it is completely unchanged (even same capacity) when I afterwards take out the data pointer. This is totally explainable, it is however from a user point of view sort of weird.

    So, even though I do understand the mechanisms going on here and why they are there, I feel there is little or no use at all for the capacity/reserve support , at when it comes to the ability to use the data pointer for buffer management. All this IS documented, well, sort of, except for the total sudden loss of capacity setting upon data() access after an implicit share due to copy.
    Personally I don't find the use case very strange at all, but I will for sure move away from QByteArrays when using them in more complex flows than rather closed scope data manipulations. All else becomes too complex and hard to track. In that case I'd rather be better off with regular C arrays allocated manually on the heap.

    I am not mad over this, I think QByteArray has its advantages and place, but it just made me a little disappointed though because I had hopes on arrays being used as buffer containers this way and in more advanced ways.


  • Moderators

    one doesn't immediately realize that taking the pointer from the original array after it has been shared, will immediately cause a detach()

    Only calling non-const methods on the container causes detach. If you only want to read you can use constData(), constBegin() etc. and that would not detach.

    I feel there is little or no use at all for the capacity/reserve support

    Here's a popular case:

    QByteArray b;
    b.reserve(100);
    b += "bla bla bla bla";
    b += someData;
    b += someOtherData;
    

    at when it comes to the ability to use the data pointer for buffer management

    Keeping around pointers to implicitly shared data is a bad idea in general, not only with QByteArray. Same goes for QString, QVector etc.


  • Qt Champions 2016

    @Larswad said in SLOT/SIGNAL safety with QByteArray &references:

    So, even if this is understandable the behavior is somewhat subtle.

    I disagree, it's quite clearly stated in fact:

    The container classes are implicitly shared, they are reentrant, and they are optimized for speed, low memory consumption, and minimal inline code expansion, resulting in smaller executables.

    http://doc.qt.io/qt-5/containers.html
    http://doc.qt.io/qt-5/implicit-sharing.html

    Now my use case is to keep a certain buffer size of the array to be filled as I hand out the QByteArray's pointer to a libusb outstanding request. As the request returns (with the ACTUAL size) i modify the size of the QByteArray to the real one (this is where the resize function was not trustful, it would resize but it would reallocate sometime and also change the capacity of the array.

    You could either use your own allocated buffer that you later attach/copy into a byte array or simply resize() the byte array beforehand and then shrink it back if necessary. Also the docs state that behavior:

    To obtain a pointer to the actual character data, call data() or constData(). These functions return a pointer to the beginning of the data. The pointer is guaranteed to remain valid until a non-const function is called on the QByteArray.

    http://doc.qt.io/qt-5/qbytearray.html#details


  • Lifetime Qt Champion

    Also, are you sure that the signal and slot paradigm is the right in this case ?

    Using a Q_INVOKABLE method with QMetaObject::invokeMethod might make your intent clearer.



  • I am not going to argue myself to death over this, it is a case where I was led into false expectations even though I knew QByteArray was implicitly shared. It is the implications of being implicitly shared that can cause surprising behaviors, e.g. indirectly breaking const promise as a result of sharing.

    @kshegunov That statement tells only the implicit sharing is used on the containers, which of course is good and more or less obvious even without that statement. But it leaves a lot of interpretation in every use case and detail what the implications of that will be. It's a narrow and dangerous walk if not knowing the complete ruleset here. The devil is absolutely in the details here.

    @Chris-Kawa Yes, that is a common case in the aim of improving the performance of it, as a result of it being a convenience container that has no idea about what to preallocate. Normally not a problem when NOT using a container class because then you have to take care of both the size and performance issue anyway, albeit with maybe a little more work and careful consideration. So I would still have expected some other more functional use of the reserve feature other than improving the performance of the container itself (which often in the end still becomes a very rough estimate). Heap thrashing and deep copies are still very hard to predict when things start to work "under the hood" like they do in complex container classes. One single code statement can explode in complexity and expand to serious cycle consumption.

    Keeping around pointers to implicitly shared data is a bad idea in general, not only with QByteArray. Same goes for QString, QVector etc.
    Yes, obviously and sadly it is and I will refrain from it in future. In this case the implicit sharing killed my whole purpose,.

    I would never use the const version to get the data for modification, even if I knew the would not cause a detach() and that it keeps the data pointer and reservations stable, because the whole semantic of those is to use the data only for reading and not to be handed out to some other uses.
    I have walked through larger part of the QByteArray implementation these days, so I have a pretty good picture now what it does internally and I regard it now as something that you start putting into use once you already HAVE the data produced from other API levels, after that the data can be contained at some point using QByteArray and THEN conveniently get shifted around inside code strictly built on Qt framework. All other uses as a "buffer-container-manager-exposer-allocator" outside that becomes a gamble of life and death.

    @SGaist Thanks. But you mean that Q_INVOKABLE declaration indicate that a metacall is needed here? I have mostly used that in the cases of exposing to declarative.


  • Lifetime Qt Champion

    No, not needed. It indicates that you can use it through a meta call in addition to use it like a normal function.



  • Things get even more useless here.

    How in the world would anyone take ownership of the data from a QByteArray (e.g. calling something with the intent like takeRawAndDetach()) like saying to the array "The data is now MINE, I have ownership now, if you are shared I want a copy of you, otherwise return your data and drop off". Because as I interpret the detech() implementation, a sequence of data(), detach() would only deallocate that data that was returned to data() leaving me a dangling pointer.
    Wow, it can't be that user-unfriendly and performance ugly that I would have to do something like:

    auto myData(new char[array.size()]);
    memcpy(myData, array.constData(), array.size());

    Just to get ownership of the contained data.
    I would have preferred if I could take over the data through a single call that would take care of actually making and returning a raw copy if the data is shared? There is not such method or feature in QByteArray?

    Meaning (if that is the case): You can't create a method in your code that takes arguments as a QByteArray reference and the let that function take over the byte array data without actually in there make a NEW buffer on the heap and deep copy of the whole QByteArray using regular new and memcopy?
    If this actually IS the case I'm starting to think the shared containers suck, feature wise and performance wise. The cost is too big.


  • Qt Champions 2016

    I hope you don't mind that I forked this discussion as a separate topic. It seemed appropriate.

    @Larswad said in QByteArray data sharing and ownership:

    How in the world would anyone take ownership of the data from a QByteArray (e.g. calling something with the intent like takeRawAndDetach()) like saying to the array "The data is now MINE, I have ownership now, if you are shared I want a copy of you, otherwise return your data and drop off".

    You can't do that directly, as you correctly observed.

    Wow, it can't be that user-unfriendly and performance ugly that I would have to do something like:

    auto myData(new char[array.size()]);
    memcpy(myData, array.constData(), array.size());

    Just to get ownership of the contained data.

    That's the correct way currently. I could imagine a method to be added that does that however, so my advice is to post it as a suggestion on the bugtracker. I'd even be willing to implement it when time permits.

    I would have preferred if I could take over the data through a single call that would take care of actually making and returning a raw copy if the data is shared? There is not such method or feature in QByteArray?

    Not at this time as far as I'm aware.

    Meaning (if that is the case): You can't create a method in your code that takes arguments as a QByteArray reference and the let that function take over the byte array data without actually in there make a NEW buffer on the heap and deep copy of the whole QByteArray using regular new and memcopy?

    Usually one goes the other way around - allocating your own buffer and attaching a QByteArray "interface" to it (e.g. through QByteArray::fromRawData) wherever it's needed.

    If this actually IS the case I'm starting to think the shared containers suck, feature wise and performance wise. The cost is too big.

    You are entitled to your opinion, I personally like them and use shared data classes extensively, but in any case they're not going away any time soon. The whole signal-slot mechanism would fall apart if implicit sharing were to be removed.



  • @kshegunov Thanks for the answer!
    Please disregard my last sentence, it was written in frustration on the situation at hand and was clearly uncalled for.


  • Qt Champions 2016

    @Larswad said in QByteArray data sharing and ownership:

    Please disregard my last sentence, it was written in frustration on the situation at hand and was clearly uncalled for.

    Happens to all of us, so I (usually) try to not nitpick about it. I was serious about posting this as a suggestion in the bugtracker though, you raised a good point. If you decide to proceed this way, please post also the link to the tracker issue here.



  • @kshegunov Yes, I think I just might file that feature request on the tracker. I came to think about quite a few years ago when I used COM's CComSafeArray and CComBSTR that had the detach() method as a way to leave their guts so to say, they on the other hand didn't support sharing which of course makes things simpler.


Log in to reply
 

Looks like your connection to Qt Forum was lost, please wait while we try to reconnect.