Copy constructor question: memcpy, QString/QByteArray



  • I know that Qt has its own variant type, but for project based reasons (constraints), I am creating my own "basic data" type which is meant to hold what I am calling "fundamental" data (a number or a reference to a string). Here is the basic concept:

    struct IIMOECORE_EXPORT FundamentalDataContainer //Constraint: Maximum of 8 bytes in size
    {
    private:
    	union
    	{
    		QString _wideString;
    		QByteArray _string;
    		void* _pointer;
    		__int64 _long;
    		double _double;
    		float _single;
    		__int32 _int;
    		bool _bool;
    	};
    	...
    };
    

    I need to make a copy constructor. If possible, I would like to avoid using up more memory to keep track of the content type (it is unnecessary so far). Would memcpy create a problem because QByteArray and QString count references? If so, is there a "reference count safe" way to make a copy of the data without knowing what the contents are a priori?


  • Moderators

    @primem0ver said in Copy constructor question: memcpy, QString/QByteArray:

    If possible, I would like to avoid using up more memory to keep track of the content type (it is unnecessary so far)

    Could you give more details on how you've made it unnecessary? How do you retrieve the data without knowing what's inside?

    Would memcpy create a problem because QByteArray and QString count references?

    Yes.

    is there a "reference count safe" way to make a copy of the data without knowing what the contents are a priori?

    I'll have to get back to you. Haven't thought of one yet.

    I am creating my own "basic data" type which is meant to hold what I am calling "fundamental" data (a number or a reference to a string)

    Do note that QString and QByteArray are not very basic! They don't qualify as POD types (Plain Old Data types), and non-PODs are brittle when placed inside unions: https://stackoverflow.com/questions/19764150/questions-regarding-c-non-pod-unions

    P.S. memcpy() should also be reserved for PODs only
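
    The "PODs only" rule can be checked at compile time. The sketch below is a minimal illustration in standard C++ without Qt: SharedText (a std::shared_ptr) is only an assumed stand-in for a reference-counted handle such as QString, and bytewiseCopy is a hypothetical helper name. The same std::is_trivially_copyable check applied to QString should fail for the same reason, since QString has a non-trivial copy constructor and destructor.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <type_traits>

// Assumed stand-in for a reference-counted handle such as QString.
using SharedText = std::shared_ptr<const char>;

// memcpy is only well-defined for trivially copyable types.
static_assert(std::is_trivially_copyable<std::int64_t>::value, "integers may be memcpy'd");
static_assert(std::is_trivially_copyable<double>::value, "doubles may be memcpy'd");
static_assert(std::is_trivially_copyable<void*>::value, "raw pointers may be memcpy'd");
static_assert(!std::is_trivially_copyable<SharedText>::value, "ref-counted handles may not");

// Hypothetical helper: copies a value byte for byte, but refuses to compile
// for any type where that would bypass a copy constructor.
template <typename T>
T bytewiseCopy(const T& source)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable to use memcpy");
    T copy;
    std::memcpy(&copy, &source, sizeof(T));
    return copy;
}
```

    Calling bytewiseCopy on an __int64 or a double compiles; calling it on the stand-in handle is rejected by the compiler instead of silently corrupting a reference count at run time.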



  • @primem0ver said in Copy constructor question: memcpy, QString/QByteArray:

    If so, is there a "reference count safe" way to make a copy of the data without knowing what the contents are a priori?

    I really don't see how there can possibly be. If you don't know whether the byte content is or is not, say, a QString, how can anything know about reference counts in any way, no matter which way you shake it? Your storage of, say, an __int64 or a void* or a QString could all have exactly the same 8-byte content, so nothing could distinguish them without being told what the content actually represents.
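
    To make that concrete, here is a small sketch (standard C++, assuming an 8-byte IEEE 754 double, which is the norm on desktop platforms) showing one and the same 8-byte pattern read as two different types. The helper names bitsOf and doubleFrom are made up for the illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads the 8 bytes of a double as a 64-bit integer without changing them.
inline std::int64_t bitsOf(double value)
{
    static_assert(sizeof(double) == sizeof(std::int64_t), "sketch assumes an 8-byte double");
    std::int64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Reads the same 8 bytes back as a double.
inline double doubleFrom(std::int64_t bits)
{
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

    The bytes behind 1.0 are also a perfectly valid integer (and could just as well be a pointer value), so the storage alone cannot say whether a reference count is involved.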



  • @JKSH said in Copy constructor question: memcpy, QString/QByteArray:

    Could you give more details on how you've made it unnecessary? How do you retrieve the data without knowing what's inside?

    The memory space is used for different purposes in different contexts/situations. It is meant to serve as a storage container for all derived classes. The type of derived class and the context under which it is used will determine the type of data stored. Access to the information is dictated by the derived class. The purpose of unifying all the classes into one base class is for memory management and indexing purposes.

    Perhaps I should make it a pointer to a QString instead?


  • Moderators

    @primem0ver said in Copy constructor question: memcpy, QString/QByteArray:

    The memory space is used for different purposes in different contexts/situations. It is meant to serve as a storage container for all derived classes. The type of derived class and the context under which it is used will determine the type of data stored. Access to the information is dictated by the derived class.

    When you say "derived class", do you mean deriving from FundamentalDataContainer?

    If so, what do you think of storing no data in the base class, while storing a single member variable in each derived class (using the actual type, not a union)?

    The purpose of unifying all the classes into one base class is for memory management and indexing purposes.

    Can you please provide a small example? I still don't quite understand what you're trying to do.

    Perhaps I should make it a pointer to a QString instead?

    A pointer is a POD type, so it's quite safe to put one inside a union. However, you must now carefully manage the memory occupied by the QString object.



  • Sorry about the big delay in response there. I have some things going on in life right now that require my prioritized attention.
    @JKSH said in Copy constructor question: memcpy, QString/QByteArray:

    When you say "derived class", do you mean deriving from FundamentalDataContainer?

    Yes.

    @JKSH said in Copy constructor question: memcpy, QString/QByteArray:

    If so, what do you think of storing no data in the base class, while storing a single member variable in each derived class (using the actual type, not a union)?

    This would waste memory. The 8 bytes that hold the pointer while the unused memory space is part of the pool are all I need for identification, data, or reference info once the memory is being used for object data.

    @JKSH said in Copy constructor question: memcpy, QString/QByteArray:

    Can you please provide a small example? I still don't quite understand what you're trying to do.

    The application manages (most of) its own memory. Whether requested all at once (release version) or piecemeal (debug version) from the OS, the memory is put into a pool when not in use. In this circumstance, the 8 bytes are required for use as a pointer (to the next unused space of the same size).

    When in use, the 8 bytes used by the pointer will be used in one of three ways, depending on the derived class that makes use of the allocated space.

    1. It will hold a 64-bit identifier to be used by an object manager for persistent data that can be looked up.
    2. It will hold temporary object data when used as a temporary object.
    3. It will hold basic numeric or string reference data when parsing XML and scripts.
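
    The dual use of the 8 bytes described above (link while pooled, data while in use) can be sketched roughly as follows. This is only an illustration under assumptions: Slot, SlotPool, the fixed capacity, and the plain integer payload are all hypothetical names and choices, not the actual classes from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One 8-byte slot: a free-list link while pooled, payload while in use.
union Slot
{
    Slot* nextFree;        // meaningful only while the slot sits in the pool
    std::int64_t payload;  // meaningful only while the slot is handed out
};
static_assert(sizeof(Slot) == sizeof(std::int64_t), "slot must stay at 8 bytes");

// Hypothetical fixed-size pool; capacity and names are illustrative only.
class SlotPool
{
public:
    SlotPool()
    {
        // Thread every slot onto the free list.
        for (std::size_t i = 0; i + 1 < kCapacity; ++i)
            _slots[i].nextFree = &_slots[i + 1];
        _slots[kCapacity - 1].nextFree = nullptr;
        _head = &_slots[0];
    }

    // Pops a slot off the free list, or returns nullptr when exhausted.
    Slot* acquire()
    {
        Slot* slot = _head;
        if (slot)
            _head = slot->nextFree;
        return slot;
    }

    // Pushes a slot back; its payload bytes are overwritten by the link.
    void release(Slot* slot)
    {
        assert(slot != nullptr);
        slot->nextFree = _head;
        _head = slot;
    }

private:
    static constexpr std::size_t kCapacity = 4;
    Slot _slots[kCapacity];
    Slot* _head = nullptr;
};
```

    This works without any bookkeeping because both union members are trivially copyable. The difficulty discussed in this thread only starts once one of the members owns a resource.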

    It is the last use that concerns me the most with regard to QString and QByteArray instances.

    EDIT:
    The only reason any of this is a concern at all is because in the third use case, the data must be temporarily copied (hence the copy constructor) when compiling the data into its final format. (The copy provides the basic data when it is meant as a literal, the string version of the data when the string represents a tuple, or the information necessary to look up the referenced object when it is meant as a reference).

    If this helps, here is the (unfinished) code where it is copied and the old data is used as a reference. Note that tagType() is a virtual function that is overridden by any class that makes use of this functionality.

    void IParsedContentData::compile()
    {
    	ContentVariant oldData = _value; // this is the line that makes this post necessary
    	QStringList* valueStrings;
    	switch (tagType())
    	{
    	case TypeTag::boolean:
    	case TypeTag::doublePrecision:
    	case TypeTag::int32:
    	case TypeTag::int64:
    	case TypeTag::singlePrecision:
    	case TypeTag::string:
    		// do nothing.... it is already in its final format
    		break;
    	case TypeTag::doubleIntTuple:
    		Q_ASSERT(oldData.string().count(',') == 1);
    		valueStrings = &oldData.string().split(',');
    		_value.set(new TUPLE2<__int32>(valueStrings->at(0).toInt(), valueStrings->at(1).toInt()));
    		break;
    	case TypeTag::doubleFloatTuple:
    		Q_ASSERT(oldData.string().count(',') == 1);
    		valueStrings = &oldData.string().split(',');
    		_value.set(new TUPLE2<float>(valueStrings->at(0).toFloat(), valueStrings->at(1).toFloat()));
    		break;
    	case TypeTag::tripleIntTuple:
    		Q_ASSERT(oldData.string().count(',') == 2);
    		valueStrings = &oldData.string().split(',');
    		_value.set(new TUPLE3<__int32>(valueStrings->at(0).toInt(), valueStrings->at(1).toInt(), valueStrings->at(2).toInt()));
    		break;
    	case TypeTag::tripleFloatTuple:
    		Q_ASSERT(oldData.string().count(',') == 2);
    		valueStrings = &oldData.string().split(',');
    		_value.set(new TUPLE3<float>(valueStrings->at(0).toFloat(), valueStrings->at(1).toFloat(), valueStrings->at(2).toFloat()));
    		break;
    	case TypeTag::quadIntTuple:
    		Q_ASSERT(oldData.string().count(',') == 3);
    		valueStrings = &oldData.string().split(',');
    		_value.set(new TUPLE4<__int32>(valueStrings->at(0).toInt(), valueStrings->at(1).toInt(), valueStrings->at(2).toInt(), valueStrings->at(3).toInt()));
    		break;
    	case TypeTag::quadFloatTuple:
    		Q_ASSERT(oldData.string().count(',') == 3);
    		valueStrings = &oldData.string().split(',');
    		_value.set(new TUPLE4<float>(valueStrings->at(0).toFloat(), valueStrings->at(1).toFloat(), valueStrings->at(2).toFloat(), valueStrings->at(3).toFloat()));
    		break;
    		
    	case TypeTag::callbackReference:
    		// look up the callback function
    		break;
    	case TypeTag::enumReference:
    		// translate the enum
    		break;
    	case TypeTag::typeReference:
    		// look up the type in the object manager
    		break;
    	default:
    #ifdef _DEBUG
    		throw "Invalid compile method in derived class - you must override for pointer classes";
    #endif
    		break;
    	}
    	_isCompiled = true;
    }
    


  • Another helpful piece of information:

    In the above code, the "string" case is where the string is meant as a literal and should be preserved. In all other cases below that line, the string is meant to be discarded upon being compiled.



  • Both QString and QByteArray only hold pointers to their data. So all you would copy would be pointers. If you want to go that low-level, you probably need to write your own PODs to hold a byte array or string, or find them in another library.



    1. valueStrings = &oldData.string().split(','); will not work. Just use QStringList's copy constructor; Qt uses lazy copying anyway, so it's super cheap. In this case it probably even triggers the move constructor, so it's even cheaper.
    2. You are trying to reinvent std::variant. There are people way smarter than me whose job is to come up with the most efficient implementation of that class... I'd say use that, you can't go wrong.
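
    For reference, this is roughly what the std::variant route looks like with respect to the original copying question. It is a sketch under assumptions: it needs C++17, SharedText (a std::shared_ptr) is only a stand-in for a reference-counted string such as QString, and Fundamental and sharedCount are made-up names. Because the variant knows which alternative is active, its copy constructor calls the right member copy constructor and the reference count stays correct.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <variant>

// Assumed stand-in for a reference-counted string handle.
using SharedText = std::shared_ptr<const std::string>;
using Fundamental = std::variant<std::int64_t, double, bool, SharedText>;

// Returns the reference count seen through the variant,
// or 0 when the variant does not hold a string.
inline long sharedCount(const Fundamental& value)
{
    if (const SharedText* text = std::get_if<SharedText>(&value))
        return text->use_count();
    return 0;
}
```

    Copying a Fundamental that holds a string raises the count from 1 to 2, and destroying the copy lowers it again, with no manual type tracking in user code.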


  • @VRonin

    1. I am not sure who's not following whom here. Why wouldn't it work? The QStringList is only used in special cases where the string to be parsed should contain a numeric tuple such as 2D or 3D coordinates (e.g. "3, 4, 5"). Why would I use QStringList's copy constructor for this when I want to split the contents of the QString returned by oldData.string() in order to parse the contained values into a tuple of integer or float values?
    2. Except that QVariant takes twice as much memory to do the same job and I only need this for the aforementioned use case. I will not be holding anything more complex than a pointer or a number (or the QString data pointer as suggested by Asperamanca). My application will be doing a lot of script and XML parsing so I want to keep things as efficient and as compartmentalized as possible.

    @Asperamanca said in Copy constructor question: memcpy, QString/QByteArray:

    If you want to go that low-level, you probably need to write your own PODs to hold a byte array or string, or find them in another library.

    Actually I have considered this option (for a few reasons) and am on the fence. The problem is that I am using a LOT of functionality from the Qt library and having to constantly convert between Qt and another library (even if it is my own) would end up being rather inefficient.


  • Moderators

    @primem0ver said in Copy constructor question: memcpy, QString/QByteArray:

    1. Except that QVariant takes twice as much memory to do the same job and I only need this for the aforementioned use case. I will not be holding anything more complex than a pointer or a number (or the QString data pointer as suggested by Asperamanca). My application will be doing a lot of script and XML parsing so I want to keep things as efficient and as compartmentalized as possible.

    What @VRonin meant is not QVariant but std::variant

    Those two are very different, and the std version is a lot more lightweight than what Qt does.



  • @J.Hilk
    Ooops... my bad. I should have paid attention to the std::



    It looks like std::variant may be a workable solution. I will see how it works. Thanks VRonin. I haven't really looked at the new stuff in C++17 yet. I am still curious about your QStringList comment, for the purpose of clarity.



  • @primem0ver said in Copy constructor question: memcpy, QString/QByteArray:

    I am not sure who's not following who here. Why wouldn't it work?

    &oldData.string().split(','); returns the address of a temporary object. That temporary is destroyed at the end of the statement, so the pointer is already dangling on the following line. Your debugger might keep it alive so you don't notice it, but it will come back to bite you in the backside. Only const references extend the lifetime of temporaries. Change it to QStringList valueStrings; and then valueStrings = oldData.string().split(','); That moves the returned list into valueStrings, which is just 2 pointer assignments.
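
    A minimal sketch of the by-value pattern, written in standard C++ so it stands alone: splitOnComma is a hypothetical stand-in for QString::split(','), and sumOfPair is a made-up consumer. The point is only where the returned list lives.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for QString::split(','): returns its result by value.
inline std::vector<std::string> splitOnComma(const std::string& text)
{
    std::vector<std::string> parts;
    std::istringstream stream(text);
    std::string part;
    while (std::getline(stream, part, ','))
        parts.push_back(part);
    return parts;
}

// Parses "a,b" and returns a + b, keeping the split result alive by value.
inline int sumOfPair(const std::string& text)
{
    // Storing &splitOnComma(text) instead would leave a pointer to a
    // temporary that is destroyed at the end of that statement.
    const std::vector<std::string> parts = splitOnComma(text);  // owned by this scope
    assert(parts.size() == 2);
    return std::stoi(parts.at(0)) + std::stoi(parts.at(1));
}
```

    Here parts lives until the end of sumOfPair, so every later at() call reads valid memory.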



  • @VRonin
    The reason I used the pointer is because I didn't want to cause an unnecessary call to the default constructor. I guess it isn't that big of a deal but doesn't declaring QStringList valueStrings a priori call the default constructor and then again if the copy constructor is used? I suppose if I kept the pointer and use the copy constructor on the returned value, it would still use two constructors though... so I guess the default is more efficient. Does the compiler know not to initialize an unused instance of a variable?



  • @primem0ver said in Copy constructor question: memcpy, QString/QByteArray:

    The reason I used the pointer is because I didn't want to cause an unnecessary call to the default constructor.

    Fair enough. Use QStringList* valueStrings; and then valueStrings = new QStringList(oldData.string().split(',')); and remember to call delete valueStrings; at the end, or use std::unique_ptr. I hope it's clear that you can never store the address of a temporary such as a return value.
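
    Another option that avoids both the up-front default construction and the heap allocation is to give each case its own scope and declare the list there. The sketch below uses standard C++ so it stands alone; Kind, parseInts, and compileSum are hypothetical names and not the thread's TypeTag or compile().

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

enum class Kind { Literal, Pair, Triple };  // hypothetical tag for the sketch

// Hypothetical helper: parses comma-separated integers, returned by value.
inline std::vector<int> parseInts(const std::string& text)
{
    std::vector<int> values;
    std::istringstream stream(text);
    std::string part;
    while (std::getline(stream, part, ','))
        values.push_back(std::stoi(part));
    return values;
}

// Sums the parsed values for tuple kinds; literals contribute nothing.
inline int compileSum(Kind kind, const std::string& text)
{
    switch (kind) {
    case Kind::Pair:
    case Kind::Triple: {
        // The braces give the case its own scope, so the list is created only
        // here, directly from the returned value, with no default
        // construction beforehand.
        const std::vector<int> values = parseInts(text);
        int sum = 0;
        for (int value : values)
            sum += value;
        return sum;
    }
    case Kind::Literal:
        break;
    }
    return 0;
}
```

    The literal cases never construct a list at all, and nothing needs to be deleted by hand.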



  • @VRonin: Yes. Thanks

    I have changed the status back to unsolved because the std::variant has the same problem as QVariant. It uses too much memory space for a large amount of simple numeric data.


  • Moderators

    @primem0ver Back to your original question:

    • You cannot put a QString/QByteArray in a union.
    • You can create a QString/QByteArray using new and store the pointer in a union, but you must then free the memory manually.
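
    As a side note on the first point: the language itself has permitted union members with non-trivial constructors since C++11, but the union then does none of the bookkeeping, which is the practical reason behind the advice. Whoever owns the union must construct, copy, and destroy the active member explicitly, and therefore must know which member is active. A rough sketch, with std::string standing in for QString and with Tag, Payload, copyPayload, and destroyPayload as made-up names:

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <string>

enum class Tag { Integer, Text };  // hypothetical; supplied from outside the union

using Text = std::string;          // stand-in for QString in this sketch

// Union with a non-trivial member (allowed since C++11).
union Payload
{
    std::int64_t integer;
    Text text;

    Payload() : integer(0) {}  // starts life holding the integer
    ~Payload() {}              // deliberately empty: the owner must destroy 'text'
};

// Copies 'from' into 'to'; 'to' is assumed to currently hold an integer.
inline void copyPayload(Payload& to, const Payload& from, Tag tag)
{
    if (tag == Tag::Text)
        new (&to.text) Text(from.text);  // real copy constructor, not memcpy
    else
        to.integer = from.integer;
}

// Ends the lifetime of the active member; must run before the union goes away.
inline void destroyPayload(Payload& payload, Tag tag)
{
    if (tag == Tag::Text)
        payload.text.~Text();
}
```

    Every operation takes the tag as an argument. Without it, neither the copy nor the cleanup can be written correctly, which is the same conclusion reached earlier in the thread.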


  • @JKSH said in Copy constructor question: memcpy, QString/QByteArray:

    but you must then free the memory manually

    Which is normally a problem when using unions

    It uses too much memory space for a large amount of simple numeric data

    Can you elaborate on the order of magnitude?



  • @VRonin

    It uses too much memory space for a large amount of simple numeric data

    Can you elaborate on the order of magnitude?

    https://stackoverflow.com/questions/45575892/why-is-sizeofstdvariant-the-same-size-as-a-struct-with-the-same-members

    In practice, the OP's "simple numeric data" will be 8 bytes, plus an 8 byte overhead for the std::variant, == 16 bytes. Thus twice the size of the data he wishes to store. Whether he regards double this amount of storage as "too much memory space" is unknown. Assuming in some shape or form he wants a union plus a "flag" for the content type, he will be hard-pressed to reduce this....
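
    The size difference is easy to measure. In this sketch (C++17; Numeric and RawNumeric are names made up for the comparison) the only guarantee asserted is that the variant is larger than the bare union. On a typical 64-bit toolchain the two values come out as 8 and 16, but that is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <variant>

// The three 8-byte alternatives, once as a variant and once as a bare union.
using Numeric = std::variant<std::int64_t, double, void*>;

union RawNumeric
{
    std::int64_t i;
    double d;
    void* p;
};

constexpr std::size_t kRawBytes = sizeof(RawNumeric);
constexpr std::size_t kVariantBytes = sizeof(Numeric);

// The variant has to store which alternative is active, and alignment
// usually rounds that extra index up to a full 8 bytes.
static_assert(kVariantBytes > kRawBytes, "variant carries a discriminator");
```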



    I am making use of CRTP, virtual methods, and static class data to make sure that the content type "flag" is not a necessary part of an instance of my variant ("union") class. This means that any class instance holding one data item will only require 8 bytes and still have the functionality of the other variant types that people have suggested. The only sacrifice is the time it takes to look up a virtual function which then calls a static function. Technically this allows me to create a custom "variant" class that only occupies 8 bytes, which is enough to hold a QString/QByteArray. Since QVariant and std::variant make use of what is essentially a union for their contents (and both allow QString/QByteArray), I am not certain why @JKSH says it cannot/should not be done.
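
    One thing that may be worth measuring with this design: on mainstream compilers, any class with a virtual function carries a hidden vtable pointer in every instance, so the type information moves into the object rather than disappearing. The sketch below only compares sizes; PlainValue, TaggedBase, and Int64Value are hypothetical names, and the exact numbers depend on the ABI (8 versus 16 bytes is typical on 64-bit).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 8 bytes of data and nothing else.
struct PlainValue
{
    std::int64_t data;
};

// Hypothetical polymorphic base, mirroring the thread's virtual tagType().
struct TaggedBase
{
    virtual ~TaggedBase() = default;
    virtual int tagType() const = 0;
};

// The same 8 bytes of data, but inside a polymorphic class.
struct Int64Value : TaggedBase
{
    std::int64_t data = 0;
    int tagType() const override { return 1; }
};

constexpr std::size_t kPlainBytes = sizeof(PlainValue);
constexpr std::size_t kVirtualBytes = sizeof(Int64Value);

// Holds on the common ABIs, where the vtable pointer is stored per instance.
static_assert(kVirtualBytes > kPlainBytes, "polymorphic object is larger than its payload");
```

    If the per-instance footprint really has to stay at 8 bytes, the type information has to come from somewhere outside the object, for example from the pool or container that holds it.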



  • I read through the responses in this thread and I don't quite understand why a class system could not be used to achieve the same thing.

    The base class could be virtual, contain no data, and only specify interaction methods with the data.
    Each derived class could contain a maximum of 8 bytes of storage for whatever data type is required. Each class could easily be verified to not exceed this constraint by a simple sizeof check. The inheritance would then be limited to a maximum of 1 depth.

    Types could be identified by an enum defined in the base class. Each class could have a method that is defined to return the enum value. You would just have to be diligent about not mixing up enum values between classes. Again these would be all defined in the base class so all classes would have access to the enums. Also, casting could be used to determine which class is being used as well.

    Each class would be responsible for defining derived methods that interact with the data. These methods could accept and return variants that are then interpreted in the class for storage of its unique data.

    Maybe some kind of explanation of the constraints might shed some light on better ways to approach this?



  • Just a note. Many containers have some overhead over and above the size of the data they store.

    sizeof(<datatype>) gives you the size of the container object itself. To get the size of the data stored in the container, you have to be more creative:

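    #include <vector>
    #include <QVector>
    #include <QDebug>

    std::vector<qint32> vecqint32;  // declaration assumed (not shown in the post)
    QVector<qint32> qvecqint32;     // a typical container holding qint32 values
    
    // put some values in there
    qint32 vals[] = {0,1,2,3,4,5,6,7};
    for (auto val : vals) {
        vecqint32.push_back(val);
        qvecqint32.push_back(val);
    }
    
    // sizeof measures the container object itself, not the elements it holds
    qInfo() << "vecqint32 sizeof:" << sizeof(qvecqint32);
    // element count times element size gives the bytes of stored data
    qInfo() << "vecqint32:" << vecqint32.size() * sizeof(decltype(vecqint32)::value_type);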
    
    

    The output of this is:

    vecqint32 sizeof: 4
    vecqint32: 32
    

    So the container object itself is 4 bytes with no data in it, and the data that was added is another 32 bytes, for 36 bytes in total. I have not tested, but I imagine QString and other containers may have some overhead as well.



  • @fcarney
    I don't think you have really followed the purpose here at all. Why are you bringing up QVector??? (That has nothing to do with this conversation).

    The idea is to have a basic container for basic (fundamental) data that does not exceed 8 bytes in size (the size it takes to store any of the types which have been discussed). QVariant has a size of 16 bytes; not 8. I already have a "working" class which will do this. My only concern is with the result of temporarily copying data in my custom variant class (whose goal is to only use enough space to contain the value data of a pointer, basic number, or a QString/QByteArray) in order to "compile" the data into a different format which is stored in the same space (turning the string "3, 4, 5" into a pointer to a 3D vector for example). I have given the code for that compile method in the 6th post of this thread (including the OP). Keep in mind that the only data that is passed on the stack (or as part of the "QString" class) is a pointer to the reference counted QString data.

    Since both QString and QByteArray (which in this case is being used as a container for an ANSI string) are reference counted variables, I was wondering how temporarily copying the data when it contains one of these types would affect the reference count. (Hence the title and content of the OP)

    @everyone:
    I think the bottom line is that the only way this would be an issue is if I changed the copy and then assigned it to the original. Since I am not doing that (the point is to change it into another data type entirely), I don't think the reference count will be incorrectly represented except possibly when the copy goes out of scope. Is this true?

    If it is, I can probably correct for it by copying the QString/QByteArray explicitly (since this is probably what happens in other variants when the copy constructor is used).



    I mentioned QVector because it is a container just like QString and QByteArray. There is overhead:

    QByteArray strdata; // empty string
    qInfo() << "sizeof:" << sizeof(strdata); // overhead of a container
    strdata = "01234567"; // 8 characters
    qInfo() << "sizeof:" << sizeof(strdata); // same overhead, but data is now stored by container
    qInfo() << "Actual data stored size:" << strdata.size() * sizeof(decltype(strdata)::value_type);
    qInfo() << "Total size (container + data):" << strdata.size() * sizeof(decltype(strdata)::value_type) + sizeof(strdata);
    

    Attempting to store 8 bytes in a QByteArray ends up storing 12 bytes. The QVariant would only be used to return a generic value from a base class, not to store data.


  • Moderators

    @primem0ver said in Copy constructor question: memcpy, QString/QByteArray:

    Since QVariant and std::variant make use of what is essentially a union for their contents (and both allow QString/QByteArray)...

    QVariant does not store QString/QByteArray in a union. It only stores numerics and pointers in a union: https://code.woboq.org/qt5/qtbase/src/corelib/kernel/qvariant.h.html#QVariant::Private::Data More specifically, QString gets stored in a QVariant::PrivateShared structure, and a pointer to this struct gets stored in the union.

    I was wondering how temporarily copying the data when it contains one of these types would affect the reference count. (Hence the title and content of the OP)

    memcpy()-ing a QString/QByteArray will increase the number of copies but won't increase the reference count.

    I don't think the reference count will be incorrectly represented except possibly when the copy goes out of scope. Is this true?

    (If there's only 1 copy) When the QString/QByteArray destructor is called on either the copy or the original, the shared string data will be deleted -- this means the other object will become something like a dangling pointer.

    If it is, I can probably correct for it by copying the QString/QByteArray explicitly (since this is probably what happens in other variants when the copy constructor is used).

    Always use the QString/QByteArray's copy constructor. Don't use memcpy().
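
    The difference can be seen with any reference-counted handle. The sketch below uses std::shared_ptr as an assumed stand-in for the shared data behind a QString/QByteArray (the name ownersAfterProperCopy is made up). A proper copy raises the count while the copy exists and lowers it again afterwards. A memcpy'd duplicate would leave the count at 1 with two handles alive, so the first destructor frees the data and the second one touches freed memory.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Assumed stand-in for the shared, reference-counted string data.
using SharedText = std::shared_ptr<const std::string>;

// Copies the handle the proper way and reports the owner count
// while both the original and the copy exist.
inline long ownersAfterProperCopy(const SharedText& original)
{
    SharedText copy = original;   // copy constructor: the count goes up by one
    return original.use_count();  // 'copy' is destroyed on return: count drops again
}
```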



  • @fcarney said in Copy constructor question: memcpy, QString/QByteArray:

    Attempting to store 8 bytes in a QByteArray ends up storing 12 bytes. The QVariant would only be used to return a generic value from a base class, not to store data.

    Right. This is all I am trying to do. I am not trying to store the actual array in the 8 bytes. For anything but a basic value that requires 8 bytes I am storing a pointer to the object. The ultimate point is to "compile" the basic data stored in a tree structure (parsed from a text file or script) to create either a "factory" template, information/data to be used in a "factory" template, or pseudocode from a script. The application I am building is a platform that will host lots of "user" created content (assuming the prototype performs as imagined). So I need to do a lot of compartmentalized generic coding in order to accomplish the tasks for the myriad of use cases that it will be applied to.

    @JKSH thanks for the clear explanation.

