QByteArray and char type

JonB

I was struck by something when reading @stretchthebits comment at https://forum.qt.io/topic/118343/qbytearray-range-issue/8

QByteArray seems to be just an array of char, which are signed values.

In other words, the range is from -128 to 127.

My immediate reaction was: this must be wrong, it will be an array of unsigned char. But he is correct, and it has methods like data() which return char *, and nothing native which deals with unsigned char *.

Now, I understand this is convenient because it is used to interact fairly seamlessly with char arrays and QString types. However, that is not what "byte" means, and in all other languages/toolkits which use a "byte" type that is always an unsigned 8-bit char type.

There may not be much to say about my observation, but I am surprised Qt has chosen to call a signed char type QByteArray. That seems to me a misnomer, and its usage is "unusual". It also means that I don't see any Qt type for natively handling byte/unsigned char type, you have to do casting? Which also surprises me among all the convenience types that Qt does provide.

Any comments from the Qt experts?

J.Hilk

@JonB answer from one of the maintainers:
https://bugreports.qt.io/browse/QTBUG-64746

JonB

@J-Hilk
OK, yep! Seems I am not the only one who has noticed this/commented that it is "strange". Of course I understand I can make everything work with casting/uchar/uint8_t etc., but that's not the point. Seems Qt would have been better naming it QCharArray, since that is more accurate than QByteArray, by normal naming conventions.

I guess there's no more than that to say....

I'll leave this open for a day in case anyone else wishes to comment.

aha_1980

@JonB Better would be to comment on my bugreport.

I still have hope that QByteArray one day will be ... an array of bytes.

JonB

@aha_1980
I didn't know that was you! There's nothing to say, as they're clearly never going to change existing QByteArray behaviour from char to unsigned char given how long it has been that way!

aha_1980

@JonB Indeed. Therefore I think the possible way would be to add functionality. Do you think that would be feasible?

JonB

@aha_1980
Well, given that they are not going to want to change any existing behaviour: they could introduce, say, a byte() method to correspond to the current data() method, to return unsigned char * instead of char *. But there are a lot of existing methods which accept/return something in chars not unsigned chars. In particular operator overload char operator[](int i) const (and char at(int i) const) already returns char, so in practical/convenient terms the horse has already bolted....
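The unsigned view described above can also be had without touching the class, through a free helper. The following is only a sketch under stated assumptions: as_unsigned and byte_sum are hypothetical names, and a plain std::string stands in for QByteArray so that the example needs nothing beyond the standard library. With Qt, the same cast would presumably be applied to the pointer returned by data() or constData().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt): view a char buffer as unsigned bytes.
// Reading any object through unsigned char* is allowed by the aliasing rules.
inline const unsigned char *as_unsigned(const char *p)
{
    return reinterpret_cast<const unsigned char *>(p);
}

// Sum the buffer as values 0..255, whether or not plain char is signed.
inline unsigned long byte_sum(const std::string &buf)
{
    const unsigned char *bytes = as_unsigned(buf.data());
    unsigned long sum = 0;
    for (std::size_t i = 0; i < buf.size(); ++i)
        sum += bytes[i];
    return sum;
}

// Small self-check: two bytes, 0x80 and 0xFF.
inline void check_byte_sum()
{
    const std::string buf("\x80\xFF", 2);
    assert(byte_sum(buf) == 128u + 255u);
}
```

This keeps the cast in one place, but it does nothing for operator[] and at(), which still hand back char.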

aha_1980

@JonB yeah, you would effectively double the API.

Another idea that came to my mind was a compatible class so you can easily convert QByteArray to QDataArray and then work on uchar. I have not investigated that much, though.

J.Hilk

@aha_1980 said in QByteArray and char type:

@JonB yeah, you would effectively double the API.

Another idea that came to my mind was a compatible class so you can easily convert QByteArray to QDataArray and then work on uchar. I have not investigated that much, though.

you would still need to touch QByteArray and add constructor & operators that accept QDataArray, no?

which would blow up the api as well

JonB

@J-Hilk , @aha_1980
Playing with QByteArray, I now have a couple of observations/questions.

    QByteArray b;
    b.resize(1);
    b[0] = 128;
    if (b[0] >= 127)
        qDebug() << "Yes (1)";
    if (b.at(0) >= 127)
        qDebug() << "Yes (2)";

As an observation: neither of these outputs "Yes". This (or similar code) is the danger of the existing implementation being used by someone, unaware that they will not produce what is (presumably) the "expected" result, given that he assumes he is dealing with "bytes".
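To make the pitfall concrete without needing Qt, here is a minimal sketch in plain C++. It assumes a platform where the stored char is signed (signed char is used explicitly to emulate that); to_byte and at_least are hypothetical helper names, not Qt API.

```cpp
#include <cassert>

// Hypothetical helper: reinterpret a stored char as a value in 0..255.
inline unsigned char to_byte(char c)
{
    return static_cast<unsigned char>(c);
}

// Compare a stored char against a threshold using byte (0..255) semantics.
inline bool at_least(char stored, unsigned char threshold)
{
    return to_byte(stored) >= threshold;
}

inline void check_signed_pitfall()
{
    // Storing 128 in a signed 8-bit char wraps to -128 on two's-complement
    // targets (guaranteed by the standard since C++20).
    const signed char s = static_cast<signed char>(128);
    assert(!(s >= 127));                            // the surprising result
    assert(at_least(static_cast<char>(128), 127));  // the byte-wise result
}
```

With a QByteArray, the equivalent would be to cast the result of at() to unsigned char before comparing.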

This produces a warning (gcc, 9.3.0) on the if (b[0] >= 127) line only:

ISO C++ says that these are ambiguous, even though the worst conversion for the first is better than the worst conversion for the second

[Don't know who wrote that message, but it's cryptic in the extreme!]

Question: why does [] produce this warning but at() does not?

J.Hilk

@JonB clang is actually a bit more detailed

main.cpp:88:18: error: use of overloaded operator '>=' is ambiguous (with operand types 'QByteRef' and 'int')
qbytearray.h:554:17: note: candidate function
main.cpp:88:18: note: built-in candidate operator>=(int, int)
main.cpp:88:18: note: built-in candidate operator>=(float, int)
main.cpp:88:18: note: built-in candidate operator>=(double, int)
main.cpp:88:18: note: built-in candidate operator>=(long double, int)
main.cpp:88:18: note: built-in candidate operator>=(int, float)
main.cpp:88:18: note: built-in candidate operator>=(int, double)
...
main.cpp:88:18: note: built-in candidate operator>=(unsigned __int128, unsigned long)
main.cpp:88:18: note: built-in candidate operator>=(unsigned __int128, unsigned long long)
main.cpp:88:18: note: built-in candidate operator>=(unsigned __int128, unsigned __int128)

and it's true: the operator overload of [] for QByteArray returns either a QByteRef or a char, so it's ambiguous

whereas at() is guaranteed to be of the type char

also, I get a warning for the implicit conversion so 🤷‍♂️

make the 127 in if (b[0] >= 127) explicitly a char rather than an implicit int, and the warning should go away

if (b[0] >= char(127))
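Why the cast helps can be sketched with a stripped-down proxy type. This is an assumption-based stand-in, not Qt's actual QByteRef: ByteRefSketch is a hypothetical class that only has the two members relevant here, an implicit conversion to char and a member operator>=(char), which is what the clang notes above suggest.

```cpp
#include <cassert>

// Simplified stand-in for a proxy reference type; NOT Qt's actual QByteRef.
class ByteRefSketch
{
public:
    explicit ByteRefSketch(char &target) : c(target) {}
    operator char() const { return c; }                      // implicit read as char
    bool operator>=(char other) const { return c >= other; } // member comparison
private:
    char &c;
};

inline bool compare_demo()
{
    char storage = 'z'; // 122 in ASCII
    ByteRefSketch ref(storage);
    // ref >= 100;   // ambiguous: the member operator needs int -> char for the
    //               // right operand, while the built-in >=(int, int) needs the
    //               // user-defined conversion for the left operand
    return ref >= char(100); // exact match for the member operator
}
```

at() avoids the question entirely because it returns a plain char, so only the built-in comparison applies.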

JonB

@J-Hilk
Damn! A certain person (I'm looking at you, @mrjj :) ) told me to switch off Clang in Qt Creator and go for the editing experience without. (Partly, IIRC, because of Clang's ridiculous ordering of proposed completions for method names, which makes it awful to use.) So, like a lamb to the slaughter, I have followed his advice, and do not get that information about which overload of [] it was going for....

So, off topic, but: in view of this, would you, @J-Hilk, or others, advise me to revert to the default of Clang being on? :)

J.Hilk

@JonB said in QByteArray and char type:

So, off topic, but: in view of this, would you, @J-Hilk, or others, advise me to revert to the default of Clang being on? :)

IMHO the inclusion/support for clang has greatly improved since its first introduction.
In the beginning, I also turned it off, and for about a year or so it's been on by default (for me). And for me, I have had more positive experiences with it than (hardly any) negative ones.

So, turn it on:D

JonB

@J-Hilk
Thanks for advice. This is all @mrjj's fault ;-)

Then I have a question (to which I suspect I already know the answer, unfortunately). I use method-name completion all the time. Without Clang on the suggestions are alphabetical, which is good to navigate. But with Clang (last time I looked, anyway) the order is "pseudo-random" ;-) Algorithm for ordering might make sense to a machine, but not to a human.... Do you not find this an issue?

J.Hilk

@JonB it still does that, I'm not sure why and you could probably make your own plugin to sort it before showing if it really bothers you :D

But usually I know the beginning of the method, so I type the first 2 letters, which is usually enough to narrow the selection down to a handful of options 😉

JKSH

@JonB said in QByteArray and char type:

I am surprised Qt has chosen to call a signed char type QByteArray.

Be aware that char and signed char are different types in C++: https://stackoverflow.com/questions/436513/char-signed-char-char-unsigned-char . int is guaranteed to be signed but char is not!

For ARM CPUs, char is unsigned by default: https://developer.arm.com/documentation/dui0491/i/C-and-C---Implementation-Details/Character-sets-and-identifiers -- "The ARM ABI defines char as an unsigned byte, and this is the interpretation used by the C++ libraries supplied with the ARM compilation tools"
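Both points can be checked at compile time and run time with the standard library alone. A small sketch (plain_char_is_signed and check_char_signedness are hypothetical names for illustration):

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// char, signed char and unsigned char are three distinct types.
static_assert(!std::is_same<char, signed char>::value, "char is its own type");
static_assert(!std::is_same<char, unsigned char>::value, "char is its own type");

// Whether plain char is signed depends on the platform and compiler settings.
inline bool plain_char_is_signed()
{
    return std::numeric_limits<char>::is_signed;
}

inline void check_char_signedness()
{
    // The bit pattern 0x80 reads as negative only where plain char is signed.
    const char c = static_cast<char>(0x80);
    assert((c < 0) == plain_char_is_signed());
}
```

On a typical x86 desktop build plain_char_is_signed() is expected to return true, and on an ARM toolchain following the ABI quoted above it is expected to return false.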

Seems Qt would have been better naming it QCharArray, since that is more accurate than QByteArray, by normal naming conventions.

I'd say @aha_1980's "QDataArray" name would work better than "QCharArray". To me, a "char array" is more related to historical text strings than binary data... and QByteArray is intended to be a container of binary data (i.e. bytes). I have no problems with its current name; I just treat the char as an implementation detail (albeit a leaky one)

This (or similar code) is the danger of the existing implementation being used by someone, unaware that they will not produce what is (presumably) the "expected" result, given that he assumes he is dealing with "bytes".

What is the meaning of doing an inequality comparison between a byte and a number? 127 is not a byte.

they could introduce, say, a byte() method to correspond to the current data() method, to return unsigned char * instead of char *.

I don't see much point in switching from char to unsigned char. If we're to initiate a switch, let's do things properly and switch to std::byte.

JonB

@JKSH said in QByteArray and char type:
I was not aware that char is no longer defined as signed (God bless C). Thank you for pointing that out.

What is the meaning of doing an inequality comparison between a byte and a number? 127 is not a byte.

Given the use of the word "byte" in QByteArray, I am (arrogantly) confident that since

    QByteArray b;
    b.resize(1);
    b[0] = 128;
    if (b.at(0) >= 127)
        qDebug() << "Yes";

goes through gcc without warning and does not produce "Yes" it will catch people out, if I could look through a whole bunch of people's code.... It's an observation. In part inspired from the confusion shown in the https://forum.qt.io/topic/118343/qbytearray-range-issue thread.

J.Hilk

@JKSH said in QByteArray and char type:

I don't see much point in switching from char to unsigned char. If we're to initiate a switch, let's do things properly and switch to std::byte.

if someone(tm) were to make the changes, like @aha_1980 suggested in this bug report: https://bugreports.qt.io/browse/QTBUG-64746

you would prefer std::byte over unsigned char ?

Because Thiago was against scope creep, and adding std::byte and unsigned char probably falls in that category

that said, std::byte would make that Qt Version require c++17 or later. I'm not sure, that's ok, or not ?

aha_1980

Good morning @J-Hilk,

that said, std::byte would make that Qt Version require c++17 or later. I'm not sure, that's ok, or not ?

That is no problem, as Qt 6 requires C++17.

But as std::byte is also with limited scope (no arithmetic) I'm not sure it is a general solution...
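The limitation can be seen in a few lines of standard C++17. This is only a sketch; byte_at_least and high_nibble are hypothetical helper names, and nothing here is Qt API.

```cpp
#include <cassert>
#include <cstddef>

// std::byte (C++17) offers bitwise operators only. Comparing with a number
// needs an explicit conversion through std::to_integer.
inline bool byte_at_least(std::byte b, unsigned threshold)
{
    return std::to_integer<unsigned>(b) >= threshold;
}

// Bit manipulation works directly on std::byte.
inline std::byte high_nibble(std::byte b)
{
    return (b >> 4) & std::byte{0x0F};
}

inline void check_std_byte()
{
    const std::byte b{128};
    assert(byte_at_least(b, 127));
    assert(std::to_integer<int>(high_nibble(b)) == 8);
    // b + std::byte{1};   // does not compile: no arithmetic on std::byte
    // b >= 127;           // does not compile: no comparison with integers
}
```

So the signedness surprise goes away, at the price of an explicit conversion wherever a byte is treated as a number.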

J.Hilk

@aha_1980 good morning to you too!

That is no problem, as Qt 6 requires C++17.

where did you get that from? I spent like 30 min searching for any reference and didn't find anything :(
