
QByteArray and char type



  • I was struck by something when reading @stretchthebits comment at https://forum.qt.io/topic/118343/qbytearray-range-issue/8

    QByteArray seems to be just an array of char, which are signed values.

    In other words, the range is from -128 to 127.

    My immediate reaction was: this must be wrong, it will be an array of unsigned char. But he is correct, and it has methods like data() which return char *, and nothing native which deals with unsigned char *.

    Now, I understand this is convenient because it is used to interact fairly seamlessly with char arrays and QString types. However, that is not what "byte" means, and in all other languages/toolkits which use a "byte" type that is always an unsigned 8-bit char type.

    There may not be much to say about my observation, but I am surprised Qt has chosen to call a signed char type QByteArray. That seems to me a misnomer, and its usage is "unusual". It also means that I don't see any Qt type for natively handling byte/unsigned char type, you have to do casting? Which also surprises me among all the convenience types that Qt does provide.

    Any comments from the Qt experts?


  • Moderators

    @JonB answer from one of the maintainers:
    https://bugreports.qt.io/browse/QTBUG-64746



  • @J-Hilk
    OK, yep! Seems I am not the only one who has noticed this/commented that it is "strange". Of course I understand I can make everything work with casting/uchar/uint8_t etc., but that's not the point. Seems Qt would have been better naming it QCharArray, since that is more accurate than QByteArray, by normal naming conventions.

    I guess there's no more than that to say....

    I'll leave this open for a day in case anyone else wishes to comment.


  • Lifetime Qt Champion

    @JonB Better would be to comment on my bugreport.

    I still have hope that QByteArray one day will be ... an array of bytes.



  • @aha_1980
    I didn't know that was you! There's nothing to say, as they're clearly never going to change existing QByteArray behaviour from char to unsigned char given how long it has been that way!


  • Lifetime Qt Champion

    @JonB Indeed. Therefore I think the possible way would be to add functionality. Do you think that would be feasible?



  • @aha_1980
    Well, given that they are not going to want to change any existing behaviour: they could introduce, say, a byte() method to correspond to the current data() method, to return unsigned char * instead of char *. But there are a lot of existing methods which accept/return something in chars not unsigned chars. In particular operator overload char operator[](int i) const (and char at(int i) const) already returns char, so in practical/convenient terms the horse has already bolted....


  • Lifetime Qt Champion

    @JonB yeah, you would effectively double the API.

    Another idea that came to my mind was a compatible class so you can easily convert QByteArray to QDataArray and then work on uchar. I have not investigated that much, though.


  • Moderators

    @aha_1980 said in QByteArray and char type:

    @JonB yeah, you would effectively double the API.

    Another idea that came to my mind was a compatible class so you can easily convert QByteArray to QDataArray and then work on uchar. I have not investigated that much, though.

    you would still need to touch QByteArray and add constructor & operators that accept QDataArray, no?

    which would blow up the api as well



  • @J-Hilk , @aha_1980
    Playing with QByteArray, I now have a couple of observations/questions.

        QByteArray b;
        b.resize(1);
        b[0] = 128;
        if (b[0] >= 127)
            qDebug() << "Yes (1)";
        if (b.at(0) >= 127)
            qDebug() << "Yes (2)";
    

    As an observation: neither of these outputs "Yes". This (or similar code) is the danger of the existing implementation being used by someone, unaware that they will not produce what is (presumably) the "expected" result, given that he assumes he is dealing with "bytes".
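    A minimal sketch of one way around this, using a plain std::vector<char> as a stand-in for QByteArray so that it compiles without Qt. The helper names asByte and byteAtLeast are hypothetical, not part of any Qt API. The idea is simply to convert each stored char to unsigned char before comparing, so the result no longer depends on whether plain char is signed.

```cpp
#include <cstddef>
#include <vector>

// Reinterpret a stored char as an unsigned byte value in the range 0..255.
inline unsigned char asByte(char c)
{
    return static_cast<unsigned char>(c);
}

// Hypothetical helper: true when the byte at index i is at least `limit`.
// std::vector<char> stands in for QByteArray here.
inline bool byteAtLeast(const std::vector<char> &buf, std::size_t i, unsigned char limit)
{
    return asByte(buf.at(i)) >= limit;
}
```

    With a real QByteArray the same cast should apply to the value returned by at(), for example static_cast<unsigned char>(b.at(0)) >= 127.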

    This produces a warning (gcc, 9.3.0) on the if (b[0] >= 127) line only:

    ISO C++ says that these are ambiguous, even though the worst conversion for the first is better than the worst conversion for the second

    [Don't know who wrote that message, but it's cryptic in the extreme!]

    Question: why does [] produce this warning but at() does not?


  • Moderators

    @JonB clang is actually a bit more detailed

    main.cpp:88:18: error: use of overloaded operator '>=' is ambiguous (with operand types 'QByteRef' and 'int')
    qbytearray.h:554:17: note: candidate function
    main.cpp:88:18: note: built-in candidate operator>=(int, int)
    main.cpp:88:18: note: built-in candidate operator>=(float, int)
    main.cpp:88:18: note: built-in candidate operator>=(double, int)
    main.cpp:88:18: note: built-in candidate operator>=(long double, int)
    main.cpp:88:18: note: built-in candidate operator>=(int, float)
    main.cpp:88:18: note: built-in candidate operator>=(int, double)
    ...
    main.cpp:88:18: note: built-in candidate operator>=(unsigned __int128, unsigned long)
    main.cpp:88:18: note: built-in candidate operator>=(unsigned __int128, unsigned long long)
    main.cpp:88:18: note: built-in candidate operator>=(unsigned __int128, unsigned __int128)
    
    

    and it's true: the operator overload of [] for QByteArray returns either a QByteRef or a char, so it's ambiguous

    whereas at() is guaranteed to be of type char

    also, I get a warning for the implicit conversion, so 🤷‍♂️


    make this if (b[0] >= 127) explicitly a char not an implicit int, and the warning should go away

    if (b[0] >= char(127))
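
    To illustrate why the explicit char helps, here is a rough sketch with a hypothetical ByteRef class standing in for QByteRef. It assumes the proxy offers both an implicit conversion to char and a member operator>= taking char; the real class may differ in detail. Comparing against an int literal leaves the compiler torn between the member overload (which needs int to char on the right) and the built-in int comparison (which needs a user-defined conversion on the left). Passing a char makes the member overload an exact match.

```cpp
// Hypothetical stand-in for a proxy such as Qt 5's QByteRef; not the real class.
class ByteRef {
public:
    explicit ByteRef(char &c) : ref_(c) {}
    operator char() const { return ref_; }               // implicit conversion to char
    ByteRef &operator=(char c) { ref_ = c; return *this; }
    bool operator>=(char c) const { return ref_ >= c; }  // member comparison taking char
private:
    char &ref_;
};

// Writing `ref >= 127` with an int literal is the ambiguous case described above.
// With a char on the right-hand side the member overload matches exactly.
inline bool atLeast(const ByteRef &ref, char limit)
{
    return ref >= limit;
}
```

    Note that this only removes the ambiguity. The comparison is still done on plain char, so a stored value of 128 will not compare as >= 127 where char is signed.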
    


  • @J-Hilk
    Damn! A certain person (I'm looking at you, @mrjj :) ) told me to switch off Clang in Qt Creator and go for the editing experience without. (Partly, IIRC, because of Clang's ridiculous ordering of proposed completions for method names, which makes it awful to use.) So, like a lamb to the slaughter, I have followed his advice, and do not get that information about which overload of [] it was going for....

    So, off topic, but: in view of this, would you, @J-Hilk, or others, advise me to revert to the default of Clang being on? :)


  • Moderators

    @JonB said in QByteArray and char type:

    So, off topic, but: in view of this, would you, @J-Hilk, or others, advise me to revert to the default of Clang being on? :)

    IMHO the inclusion/support for clang has greatly improved since its first introduction.
    In the beginning I also turned it off, and for about a year or so it's been on by default (for me). I've had more positive experiences with it than (hardly any) negative ones.

    So, turn it on:D



  • @J-Hilk
    Thanks for advice. This is all @mrjj's fault ;-)

    Then I have a question (to which I suspect I already know the answer, unfortunately). I use method-name completion all the time. Without Clang on the suggestions are alphabetical, which is good to navigate. But with Clang (last time I looked, anyway) the order is "pseudo-random" ;-) Algorithm for ordering might make sense to a machine, but not to a human.... Do you not find this an issue?


  • Moderators

    @JonB it still does that, I'm not sure why and you could probably make your own plugin to sort it before showing if it really bothers you :D

    But usually I know the beginning of the method so I type the first 2 letters which usually is enough to narrow the selection down to a handful of options 😉


  • Moderators

    @JonB said in QByteArray and char type:

    I am surprised Qt has chosen to call a signed char type QByteArray.

    Be aware that char and signed char are different types in C++: https://stackoverflow.com/questions/436513/char-signed-char-char-unsigned-char . int is guaranteed to be signed but char is not!

    For ARM CPUs, char is unsigned by default: https://developer.arm.com/documentation/dui0491/i/C-and-C---Implementation-Details/Character-sets-and-identifiers -- "The ARM ABI defines char as an unsigned byte, and this is the interpretation used by the C++ libraries supplied with the ARM compilation tools"
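
    A small sketch of how this can be checked at compile time. The function name plainCharIsSigned is made up for illustration; everything else is standard library.

```cpp
#include <climits>
#include <limits>
#include <type_traits>

// char, signed char and unsigned char are three distinct types,
// whatever the signedness of plain char happens to be.
static_assert(!std::is_same<char, signed char>::value, "char is not signed char");
static_assert(!std::is_same<char, unsigned char>::value, "char is not unsigned char");

// Reports whether plain char is signed for the current compiler and target.
constexpr bool plainCharIsSigned()
{
    return std::numeric_limits<char>::is_signed;
}

// The same information is visible through CHAR_MIN: negative means signed.
static_assert(plainCharIsSigned() == (CHAR_MIN < 0), "is_signed agrees with CHAR_MIN");
```

    On a typical x86 desktop toolchain plainCharIsSigned() is expected to be true, and on the ARM toolchains quoted above it is expected to be false, so code that compares char values against numbers above 127 can behave differently between the two.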

    Seems Qt would have been better naming it QCharArray, since that is more accurate than QByteArray, by normal naming conventions.

    I'd say @aha_1980's "QDataArray" name would work better than "QCharArray". To me, a "char array" is more related to historical text strings than binary data... and QByteArray is intended to be a container of binary data (i.e. bytes). I have no problems with its current name; I just treat the char as an implementation detail (albeit a leaky one)

    This (or similar code) is the danger of the existing implementation being used by someone, unaware that they will not produce what is (presumably) the "expected" result, given that he assumes he is dealing with "bytes".

    What is the meaning of doing an inequality comparison between a byte and a number? 127 is not a byte.

    they could introduce, say, a byte() method to correspond to the current data() method, to return unsigned char * instead of char *.

    I don't see much point in switching from char to unsigned char. If we're to initiate a switch, let's do things properly and switch to std::byte.



  • @JKSH said in QByteArray and char type:
    I was not aware that char is no longer defined as signed (God bless C). Thank you for pointing that out.

    What is the meaning of doing an inequality comparison between a byte and a number? 127 is not a byte.

    Given the use of the word "byte" in QByteArray, I am (arrogantly) confident that since

        QByteArray b;
        b.resize(1);
        b[0] = 128;
        if (b.at(0) >= 127)
            qDebug() << "Yes";
    

    goes through gcc without warning and does not produce "Yes" it will catch people out, if I could look through a whole bunch of people's code.... It's an observation. In part inspired from the confusion shown in the https://forum.qt.io/topic/118343/qbytearray-range-issue thread.


  • Moderators

    @JKSH said in QByteArray and char type:

    I don't see much point in switching from char to unsigned char. If we're to initiate a switch, let's do things properly and switch to std::byte.

    if someone(tm) were to make the changes, like @aha_1980 suggested in this bug report: https://bugreports.qt.io/browse/QTBUG-64746

    you would prefer std::byte over unsigned char ?

    Because Thiago was against scope creep, and adding std::byte and unsigned char probably falls in that category


    that said, std::byte would make that Qt Version require c++17 or later. I'm not sure, that's ok, or not ?


  • Lifetime Qt Champion

    Good morning @J-Hilk,

    that said, std::byte would make that Qt Version require c++17 or later. I'm not sure, that's ok, or not ?

    That is no problem, as Qt 6 requires C++17.

    But as std::byte is also with limited scope (no arithmetic) I'm not sure it is a general solution...


  • Moderators

    @aha_1980 good morning to you too!

    That is no problem, as Qt 6 requires C++17.

    where did you get that from ? I spent like 30 min searching for any reference and didn't find anything :(


  • Lifetime Qt Champion


  • Moderators

    @J-Hilk said in QByteArray and char type:

    @JKSH said in QByteArray and char type:

    I don't see much point in switching from char to unsigned char. If we're to initiate a switch, let's do things properly and switch to std::byte.

    if someone(tm) were to make the changes, like @aha_1980 suggested in this bug report: https://bugreports.qt.io/browse/QTBUG-64746

    you would prefer std::byte over unsigned char ?

    Actually, I take that back. I just tried playing with std::byte and found that it's not easy to work with:

    std::byte b = 0xFF; // Error: cannot initialize a variable of type 'std::byte' with an rvalue of type 'int'
    
    auto x = std::byte{0xFF};
    auto y = uchar{0xFF};
    qDebug() << (x == y); // Error: Invalid operands to binary expression
    

    We also can't pass std::byte to a function that expects unsigned char without casting, so it isn't any more interoperable than the existing char.
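
    For completeness, a sketch of the explicit conversions that std::byte does allow (C++17). The helper names makeByte, valueOf and sameBits are hypothetical. It shows that comparing against an unsigned char is possible, but only by going through std::to_integer or an equivalent cast.

```cpp
#include <cstddef>  // std::byte, std::to_integer (C++17)

// Build a std::byte from an unsigned char; brace-initialisation is required.
inline std::byte makeByte(unsigned char value)
{
    return std::byte{value};
}

// Extract the numeric value again; there is no implicit conversion.
inline unsigned char valueOf(std::byte b)
{
    return std::to_integer<unsigned char>(b);
}

// Compare a std::byte with an unsigned char by converting explicitly.
inline bool sameBits(std::byte b, unsigned char value)
{
    return valueOf(b) == value;
}
```

    So the casting does not go away with std::byte; it just becomes mandatory and visible at every point where the content is inspected.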

    Because Thiago was against scope creep, and adding std::byte and unsigned char probably falls in that category

    I was originally thinking of adding functions that operate on std::byte and omitting functions that operate on unsigned char. I'm no longer convinced that's helpful.

    std::byte would make that Qt Version require c++17 or later. I'm not sure, that's ok, or not ?

    As @aha_1980 pointed out, this part isn't an issue.

    The bigger issue is reaching a consensus on how far we should go:

    @aha_1980 said in QByteArray and char type:

    Hi @J-Hilk ,

    we've missed you at 2019 Contributors summit ;)

    https://wiki.qt.io/Qt_Contributors_Summit_2019_Program#C.2B.2B17_language_and_std_library_features_for_Qt_6

    There's also the blog post at https://www.qt.io/blog/first-qt-6.0-snapshot-available

    P.S. Anyone signed up for the virtual Qt World Summit? :-D


  • Moderators

    @JKSH alright, lets see if that "someone(TM)" ends up being me. Have to contribute eventually 🤷‍♂️

    and yes I bought a ticket already ;)

    @aha_1980

    we've missed you at 2019 Contributors summit ;)

    I'm not a source code contributor (yet :) )



  • @JKSH said in QByteArray and char type:

    Actually, I take that back. I just tried playing with std::byte and found that it's not easy to work with:

    You guys know more about C++ than I, but my reading of std::byte() is that it is effectively just a representation of an 8-bit pattern. You are not supposed to do any arithmetic on it, or natively compare it to unsigned char etc. It's just a "blob" of data. ?


  • Moderators

    @JonB said in QByteArray and char type:

    my reading of std::byte() is that it is effectively just a representation of an 8-bit pattern.

    I agree.

    (Caveat: A byte is defined as the smallest accessible unit of data in memory. It's usually 8 bits in today's common architectures, but it doesn't actually have to be 8 bits)

    You are not supposed to do any arithmetic on it

    I agree. And I think programmers shouldn't normally try to do arithmetic on QByteArray elements either. (Exception: If you have a low-level efficiency hack in mind, you really know what you're doing, and you document it clearly, then go ahead)

    ...or natively compare it to unsigned char etc. It's just a "blob" of data. ?

    Wasn't your original point of this thread that a "blob" of data should be unsigned char?


  • Lifetime Qt Champion

    @J-Hilk said in QByteArray and char type:

    I'm not a source code contributor (yet :) )

    I don't think that's a precondition. You contribute on many other places.

    And you have a good knowledge about the library and a vision where it should go to.

    And that's what counts :)

    Regards


  • Moderators

    @aha_1980 said in QByteArray and char type:

    @J-Hilk said in QByteArray and char type:

    I'm not a source code contributor (yet :) )

    I don't think that's a precondition. You contribute on many other places.

    And you have a good knowledge about the library and a vision where it should go to.

    And that's what counts :)

    +1 @J-Hilk is definitely a Contributor to the Qt community.



  • @JKSH said in QByteArray and char type:

    I agree. And I think programmers shouldn't try to do arithmetic on QByteArray elements either.

    That's fine if I receive some QByteArray data and just want to store it/forward it onto something else. It's not fine if I need to look at its content and act on it for some purpose. Then I may need to, say, see if it's greater than 200 or whatever. At which point I think I need to cast away from std::byte() to achieve that.

    Wasn't your original point of this thread that a "blob" of data should be unsigned char?

    I was not the person who introduced the discussion about representing it via std::byte, for good or for bad! I want to be able to examine the bytes and do, for example, greater-than operations on them. For that, my original point was that I did not expect something referring to "bytes" (going by how I have found that word used in other languages, viz. an unsigned quantity in the range 0--255) to have an interface only offering (signed) chars. I expected unsigned chars to be available. Else one must be careful about comparison code, for instance.


  • Moderators

    Let me bring even more confusion into this and point to Timur Doumler's excellent talk at CppCon 2019 about type punning, where he outlines that this:

    void printBitRepresentation(float f)
    {
        auto* buf = reinterpret_cast<unsigned char*>(&f);
        for (std::size_t i = 0; i < sizeof(float); i++) {
            std::cout << buf[i];  // indexing through buf is the questionable step
        }
    }
    
    

    is actually undefined behavior.
    https://youtu.be/_qzMpk-22cc?t=2626

    @JKSH thanks :D



  • @J-Hilk
    I did have a look at that (frightening) discussion. I was "perturbed" by the answer that you have to rely on what he said was a "magic" implementation of memcpy(), which you can't know anything about, to achieve it! And didn't really understand how that resolves whatever the issue is anyway.


  • Moderators

    @JonB well it's a wording defect, I'm pretty sure all compilers behave the same here. It's just not explicitly defined 🤷‍♂️



  • @J-Hilk
    I still didn't understand how using memcpy() between addresses (void * received by memcpy()) resolved the problem, as opposed to just moving it elsewhere. Perhaps I would have had to read the whitepaper he showed if I wanted to understand. Unless you feel like explaining why memcpy() from one address to another, and then back in code accessing the destination address as an unsigned char * but not so for the source address, would make it "work correctly"...?


  • Moderators

    @JonB well as I understand it:

    reinterpret_cast does not change the pointer. You previously pointed to the float object, and after the reinterpret_cast you still do. Now you want to do pointer arithmetic on that object, and that is undefined behavior.

    With memcpy you actually copy the bytes from one pointer to another. How that's done, only the compiler vendor knows :D but after the copy you have defined behavior, because the char array is actually there!

    But it makes no difference
    take a look at this compiler explorer output

    https://gcc.godbolt.org/z/7673av

    the 2 functions produce identical assembler code
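
    A minimal sketch of the memcpy variant described above, assuming nothing beyond the standard library. The names bytesOf and floatFrom are hypothetical. The bytes are copied into a real array of unsigned char, so indexing that array afterwards is ordinary array access.

```cpp
#include <array>
#include <cstring>

// Copy the object representation of a float into an actual unsigned char array.
inline std::array<unsigned char, sizeof(float)> bytesOf(float f)
{
    std::array<unsigned char, sizeof(float)> bytes{};
    std::memcpy(bytes.data(), &f, sizeof(float));
    return bytes;
}

// Copy the bytes back into a float object.
inline float floatFrom(const std::array<unsigned char, sizeof(float)> &bytes)
{
    float f;
    std::memcpy(&f, bytes.data(), sizeof(float));
    return f;
}
```

    Calling bytesOf(1.5f) and looping over the result replaces the loop over the reinterpret_cast pointer, and floatFrom recovers the original value from the copy.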



  • @J-Hilk
    I do realize in practice the code generation is OK. Not my point.

    memcpy() takes void *src and a void *dest. It doesn't know what they point to. It copies a number of bytes from one area to the other. Now afterward, back in your code, you are allowed to access/index the bytes at dest *, yet not at src *. Makes no sense to me....


  • Qt Champions 2017

    @JonB said in QByteArray and char type:

    I still didn't understand how using memcpy() between addresses (void * received by memcpy()) resolved the problem, as opposed to just moving it elsewhere.

    Technically it does because black magic™. You have that kind of nonsense sprinkled all around the standard, just doesn't get too much exposure. To give you an example through a simple question:

    What's the actual type of a lambda function?

    Or to expand:
    That is, how does one define that a function is going to take a lambda as parameter?

    Conventional wisdom is to use the STL (std::function). The ideological problem is that the latter is a template which needs to have a specified type as a template parameter, however a lambda has an undefined type, so the instantiation happens with the magic ClosureType, which is implementation defined.

    Here's how the Callable magic works:
    https://en.cppreference.com/w/cpp/named_req/Callable
    Basically you define a Callable as anything that can be used through the STL's related types, but then the STL types require the template argument to be callable to make the instantiation - so it boils down to compiler incantations. (I'm not talking about the way compilers implement this though, just the ideas and the wording).

    PS. As a side note the lambdas are inlined extremely aggressively by the compiler. In release you don't get even a notion of such a construct.
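
    As a concrete illustration, here is a sketch of the two usual ways a function can accept a lambda. Each lambda expression has its own unnamed closure type, so the function either takes it as a deduced template parameter or wraps it in std::function, whose template argument is a call signature such as int(int) rather than the closure type itself. The function names are made up for the example.

```cpp
#include <functional>

// Option 1: deduce the closure type as a template parameter.
template <typename F>
int applyTwice(F f, int x)
{
    return f(f(x));
}

// Option 2: type-erase the callable behind std::function with a call signature.
inline int applyTwiceErased(const std::function<int(int)> &f, int x)
{
    return f(f(x));
}
```

    Both can be called with the same lambda, for example auto inc = [](int v) { return v + 1; }; the template version is typically the one that inlines well, while the std::function version trades that for a fixed, nameable parameter type.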


  • Moderators

    @JonB said in QByteArray and char type:

    It's not fine if I need to look at its content and act on it for some purpose. Then I may need to, say, see if it's greater than 200 or whatever.

    ...

    I want to be able to examine the bytes and do, for example, greater-than operations on them

    ...

    I expected unsigned chars to be available. Else one must be careful about comparison code, for instance.

    I think we have divergent ideas on what a byte is and what we expect of them. May I ask,

    1. What is your detailed definition of a byte?
    2. Can you provide a concrete example where you'd want to check that a byte is greater than 200 or whatever? (And I mean a byte, not a number, not an ASCII character)
    3. Does unsigned char fit your definition in #1?
    4. Does std::byte fit your definition in #1?

  • Moderators

    @J-Hilk said in QByteArray and char type:

    Let me bring even more confusion into this and point to Timur Doumler's excellent talk at CppCon 2019 about type punning, where he outlines that this:

    void printBitRepresentation(float f)
    {
        auto* buf = reinterpret_cast<unsigned char*>(&f);
        for (std::size_t i = 0; i < sizeof(float); i++) {
            std::cout << buf[i];  // indexing through buf is the questionable step
        }
    }
    
    

    is actually undefined behavior.
    https://youtu.be/_qzMpk-22cc?t=2626

    Wow, that's wild.

    The same kind of thing happens in law -- hence why lawyers have job security!



  • @JKSH
    We'll have to be careful. I realize this discussion will get out of hand, you know more than I do about correct definitions.

    What is your detailed definition of a byte?

    About twice a "nibble" ;-) Also, if I get a mosquito nibble it doesn't hurt so much, but if I get a mosquito byte it really itches.

    In a nutshell, I see for example in Python

    Return a new "bytes" object, which is an immutable sequence of small integers in the range 0 <= x < 256

    Wikipedia:

    The modern de facto standard of eight bits, as documented in ISO/IEC 2382-1:1993, is a convenient power of two permitting the binary-encoded values 0 through 255 for one byte

    Assuming 8-bits to keep it simple, I have always taken "byte" as meaning an unsigned quantity 0--255, as opposed to a signed one, -128--127. That is the nub. It's just that's how I see it used elsewhere.

    Can you provide a concrete example where you'd want to check that a byte is greater than 200 or whatever? (And I mean a byte, not a number, not an ASCII character)

    Nope, nothing practical :) I have an imaginary piece of hardware sending me a stream of byte values. For whatever reason (the joystick is faulty in one direction), I wish to ignore the ones larger than 200. I don't want to worry about casting/sign extension. QByteArray b; if (b.at(0) > 200) ....

    Does unsigned char fit your definition in #1?

    Yep. And I don't have to worry about sign!

    Does std::byte fit your definition in #1?

    It does when I don't look at the content. It's a bit useless when I do want to look at it (as I have to cast all over the place). So all in all it turns out it's a bit like a quantum object :)

    Do you think in common parlance that a "byte" implies to you a value between 0--255 (just assume 8-bit). Perhaps it just as much suggests -128--127 to you?


  • Qt Champions 2017

    @JonB said in QByteArray and char type:

    Do you think in common parlance that a "byte" implies to you a value between 0--255 (just assume 8-bit). Perhaps it just as much suggests -128--127 to you?

    Byte doesn't imply a value per se, it's a storage unit. Same if you talk about a Word: depending on your architecture a word may be of a different size (usually one defines the word through the register's width). The punchline is that we've used these terms so interchangeably through the years for integers of specific width that it became ubiquitous to equate them, hence they defined the qubit (albeit it's still a regular bit) for the quantum bit.

    PS. If you're wondering: from information theory a bit is the atom (in the sense of being the smallest distinguishable indivisible piece) of information.



  • I might be old, but I don't understand how an 8 bit char is not a byte. The definition of byte is that it is 8 bits. Is there some new definition of byte that somehow excludes char or signed 8 bits?

