QByteArray and char type

JKSH

@aha_1980 said in QByteArray and char type:

@J-Hilk said in QByteArray and char type:

I'm not a source code contributor (yet :) )

I don't think that's a precondition. You contribute in many other places.

And you have a good knowledge about the library and a vision where it should go to.

And that's what counts :)

+1 @J-Hilk is definitely a Contributor to the Qt community.

JonB

@JKSH said in QByteArray and char type:

I agree. And I think programmers shouldn't try to do arithmetic on QByteArray elements either.

That's fine if I receive some QByteArray data and just want to store it/forward it onto something else. It's not fine if I need to look at its content and act on it for some purpose. Then I may need to, say, see if it's greater than 200 or whatever. At which point I think I need to cast away from std::byte() to achieve that.

Wasn't your original point of this thread that a "blob" of data should be unsigned char?

I was not the person who introduced the discussion about representing it via std::byte, for good or for bad! I want to be able to examine the bytes and do, for example, greater-than operations on them. For that, my original point was that I did not expect something referring to "bytes" (going by how I have seen that word used in other languages, viz. an unsigned quantity in the range 0--255) to have an interface offering only (signed) chars. I expected unsigned chars to be available. Otherwise one must be careful about comparison code, for instance.

J.Hilk

Let me bring even more confusion into this and point to Timur Doumler's excellent talk at CppCon 2019 about type punning, where he outlines that this:

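#include <cstddef>
#include <iostream>

void printBitRepresentation(float f)
{
    auto* buf = reinterpret_cast<unsigned char*>(&f);
    for (std::size_t i = 0; i < sizeof(float); ++i) {
        std::cout << buf[i];   // pointer arithmetic on buf is the step the talk calls undefined
    }
}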

is actually undefined behavior.
https://youtu.be/_qzMpk-22cc?t=2626

@JKSH thanks :D

JonB

@J-Hilk
I did have a look at that (frightening) discussion. I was "perturbed" by the answer that you have to rely on what he said was a "magic" implementation of memcpy(), which you can't know anything about, to achieve it! And didn't really understand how that resolves whatever the issue is anyway.

J.Hilk

@JonB well, it's a wording defect; I'm pretty sure all compilers behave the same here. It's just not explicitly defined 🤷‍♂️

JonB

@J-Hilk
I still didn't understand how using memcpy() between addresses (void * received by memcpy()) resolved the problem, as opposed to just moving it elsewhere. Perhaps I would have had to read the whitepaper he showed if I wanted to understand. Unless you feel like explaining why memcpy() from one address to another, and then back in code accessing the destination address as an unsigned char * but not so for the source address, would make it "work correctly"...?

J.Hilk

@JonB well as I understand it:

reinterpret_cast does not change the pointer. You previously pointed to the float object, and after the reinterpret_cast you still do. And now you want to do pointer arithmetic on that object, which is undefined behavior.

Now with memcpy you actually copy the bytes from one pointer to another. How that's done, only the compiler vendor knows :D but after the copy you have defined behavior, because the char array is actually there!

But it makes no difference
take a look at this compiler explorer output

https://gcc.godbolt.org/z/7673av

the 2 functions produce identical assembler code
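For reference, here is a minimal sketch of the memcpy variant described above. The helper name byteRepresentation is made up for illustration, and it returns hex text instead of printing raw characters so the result is easy to inspect. It assumes nothing beyond the standard library.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: returns the object representation of a float as hex,
// in memory order (so the digits depend on the platform's endianness).
std::string byteRepresentation(float f)
{
    // Copy the bytes into a real unsigned char array first; indexing this
    // array is well defined because the array object actually exists.
    std::array<unsigned char, sizeof(float)> buf{};
    std::memcpy(buf.data(), &f, sizeof(float));

    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < buf.size(); ++i) {
        // Convert to unsigned so the stream prints a number, not a character.
        out << std::setw(2) << static_cast<unsigned>(buf[i]);
    }
    return out.str();
}
```

The only difference from the reinterpret_cast version is that the loop indexes an array that really is an array of unsigned char.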

JonB

@J-Hilk
I do realize in practice the code generation is OK. Not my point.

memcpy() takes a void *src and a void *dest. It doesn't know what they point to. It copies a number of bytes from one area to the other. Now afterward, back in your code, you are allowed to access/index the bytes at dest, yet not at src. Makes no sense to me....

kshegunov

@JonB said in QByteArray and char type:

I still didn't understand how using memcpy() between addresses (void * received by memcpy()) resolved the problem, as opposed to just moving it elsewhere.

Technically it does because black magic™. You have that kind of nonsense sprinkled all around the standard, just doesn't get too much exposure. To give you an example through a simple question:

What's the actual type of a lambda function?

Or to expand:
That is, how does one declare that a function is going to take a lambda as a parameter?

Conventional wisdom is to use the STL (std::function). The ideological problem is that the latter is a template which needs to have a specified type as a template parameter, however a lambda has an undefined type, so the instantiation happens with the magic ClosureType, which is implementation defined.

Here's how the Callable magic works:
https://en.cppreference.com/w/cpp/named_req/Callable
Basically you define a Callable anything that can be used through the STL's related types, but then the STL types require the template argument to be callable to make the instantiation - so it boils down to compiler incantations. (I'm not talking about the way compilers implement this though, just the ideas and the wording).

PS. As a side note the lambdas are inlined extremely aggressively by the compiler. In release you don't get even a notion of such a construct.
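To make the question concrete, here is a small sketch of the two usual ways to accept a lambda as a parameter. The function names are invented for the example. As far as I understand it, each lambda expression has its own unnamed class type, so you either let a template deduce it or wrap it in std::function, which is instantiated on the call signature rather than on the closure type.

```cpp
#include <functional>

// Option 1: a template parameter deduces the unnamed closure type directly.
template <typename Callable>
int applyTwice(Callable f, int x)
{
    return f(f(x));
}

// Option 2: std::function type-erases any callable with a matching signature.
int applyTwiceErased(const std::function<int(int)>& f, int x)
{
    return f(f(x));
}

// Hypothetical usage: the same lambda works with both forms.
int demo()
{
    auto addThree = [](int v) { return v + 3; };
    return applyTwice(addThree, 1) + applyTwiceErased(addThree, 1);
}
```

The template form keeps the concrete type visible to the optimizer, which fits the remark about aggressive inlining; the std::function form trades that for a single nameable parameter type.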

JKSH

@JonB said in QByteArray and char type:

It's not fine if I need to look at its content and act on it for some purpose. Then I may need to, say, see if it's greater than 200 or whatever.

...

I want to be able to examine the bytes and do, for example, greater-than operations on them

...

I expected unsigned chars to be available. Else one must be careful about comparison code, for instance.

I think we have divergent ideas on what a byte is and what we expect of them. May I ask,

What is your detailed definition of a byte?
Can you provide a concrete example where you'd want to check that a byte is greater than 200 or whatever? (And I mean a byte, not a number, not an ASCII character)
Does unsigned char fit your definition in #1?
Does std::byte fit your definition in #1?

JKSH

@J-Hilk said in QByteArray and char type:

Let me bring even more confusion into this and point to Timur Doumler's excellent talk at CppCon 2019 about type punning, where he outlines that this:
void printBitRepresentation(float f)
{
    auto* buf = reinterpret_cast<unsigned char*>(&f);
    for (std::size_t i = 0; i < sizeof(float); ++i) {
        std::cout << buf[i];
    }
}
is actually undefined behavior.
https://youtu.be/_qzMpk-22cc?t=2626

Wow, that's wild.

The same kind of thing happens in law -- hence why lawyers have job security!

JonB

@JKSH
We'll have to be careful. I realize this discussion will get out of hand, you know more than I do about correct definitions.

What is your detailed definition of a byte?

About twice a "nibble" ;-) Also, if I get a mosquito nibble it doesn't hurt so much, but if I get a mosquito byte it really itches.

In a nutshell, I see for example in Python

Return a new "bytes" object, which is an immutable sequence of small integers in the range 0 <= x < 256

Wikipedia:

The modern de facto standard of eight bits, as documented in ISO/IEC 2382-1:1993, is a convenient power of two permitting the binary-encoded values 0 through 255 for one byte

Assuming 8-bits to keep it simple, I have always taken "byte" as meaning an unsigned quantity 0--255, as opposed to a signed one, -128--127. That is the nub. It's just that's how I see it used elsewhere.

Can you provide a concrete example where you'd want to check that a byte is greater than 200 or whatever? (And I mean a byte, not a number, not an ASCII character)

Nope, nothing practical :) I have an imaginary piece of hardware sending me a stream of byte values. For whatever reason (the joystick is faulty in one direction), I wish to ignore the ones larger than 200. I don't want to worry about casting/sign extension. QByteArray b; if (b.at(0) > 200) ....

Does unsigned char fit your definition in #1?

Yep. And I don't have to worry about sign!

Does std::byte fit your definition in #1?

It does when I don't look at the content. It's a bit useless when I do want to look at it (as I have to cast all over the place). So all in all it turns out it's a bit like a quantum object :)
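A minimal sketch of what "cast all over the place" looks like with std::byte (C++17). The helper name exceeds is made up for the example; std::to_integer is the standard way to get a number back out.

```cpp
#include <cstddef>

// Hypothetical helper: true if the byte, read as an unsigned value 0..255,
// is greater than the given limit.
bool exceeds(std::byte b, unsigned limit)
{
    // std::byte has no relational operators against integers, so convert first.
    return std::to_integer<unsigned>(b) > limit;
}
```

Writing b > 200 directly on a std::byte does not compile, which is exactly the friction being described.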

Do you think that in common parlance a "byte" implies a value between 0--255 (just assume 8-bit)? Or perhaps it just as much suggests -128--127 to you?

kshegunov

@JonB said in QByteArray and char type:

Do you think that in common parlance a "byte" implies a value between 0--255 (just assume 8-bit)? Or perhaps it just as much suggests -128--127 to you?

Byte doesn't imply a value per se, it's a storage unit. Same if you talk about a Word: depending on your architecture a word may be of a different size (usually one defines the word through the register's width). The punchline is that we've used these terms so interchangeably through the years for integers of specific width that it became ubiquitous to equate them, hence they defined the qubit (albeit it's still a regular bit) for the quantum bit.

PS. If you're wondering: from information theory a bit is the atom (in the sense of being the smallest distinguishable indivisible piece) of information.

fcarney

I might be old, but I don't understand how an 8 bit char is not a byte. The definition of byte is that it is 8 bits. Is there some new definition of byte that somehow excludes char or signed 8 bits?

stretchthebits

@fcarney
Yes, a byte = 8 bit
The problem is, are you going to treat that as unsigned char or signed char? Because if you are going to be performing mathematical operations on them, the sign matters. If it is just text, it does not matter.

KroMignon

@JonB said in QByteArray and char type:

Nope, nothing practical :) I have an imaginary piece of hardware sending me a stream of byte values. For whatever reason (the joystick is faulty in one direction), I wish to ignore the ones larger than 200. I don't want to worry about casting/sign extension. QByteArray b; if (b.at(0) > 200) ....

This is wrong (QByteArray::at() returns a char, which is signed on most platforms):

QByteArray b = <something>;
if (b.at(0) > 200) ....

This is the right way to do it:

QByteArray b = <something>;
if (quint8(b.at(0)) > 200) ....
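
To show the difference without pulling in Qt, here is a small sketch that uses a plain signed char as a stand-in for what QByteArray::at() hands back where char is signed. The function names are invented, and std::uint8_t plays the role of quint8.

```cpp
#include <cstdint>

// Stand-in for the value QByteArray::at() yields where plain char is signed.
bool naiveAbove200(signed char c)
{
    // c is promoted to an int in the range -128..127, so this is never true.
    return c > 200;
}

bool castAbove200(signed char c)
{
    // Reinterpret the same bits as 0..255 before comparing.
    return static_cast<std::uint8_t>(c) > 200;
}
```

A byte holding 0xFA reads as -6 through the signed type and as 250 after the cast, so only the second function reports it as above 200.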

Just my 2 cts,

kshegunov

@stretchthebits said in QByteArray and char type:

The problem is, are you going to treat that as unsigned char or signed char? Because if you are going to be performing mathematical operations on them, the sign matters. If it is just text, it does not matter.

Or as I'd said:

Byte doesn't imply a value per se, it's a storage unit.

Take 4 consecutive bytes in memory: does that imply a value between -2^31 and 2^31 - 1? Surely not. There are at least several separate interpretations I can think of off the top of my head (packed struct assumed):
int, unsigned int, struct { short a, b; }, char x[4] and so on. All of this is four bytes and it's the same for the single byte, the interpretation is not tied to actual storage, strictly speaking.
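
A small sketch of that point: the same four bytes copied into three different types. The struct and function names are made up for the example, and it assumes the usual 32-bit int and 16-bit short via the fixed-width types.

```cpp
#include <cstdint>
#include <cstring>

struct TwoShorts { std::int16_t a, b; };
static_assert(sizeof(TwoShorts) == 4, "example assumes no padding");

// Hypothetical holder for three readings of the same storage.
struct Views {
    std::int32_t  asSigned;
    std::uint32_t asUnsigned;
    TwoShorts     asShorts;
};

Views interpret(const unsigned char (&bytes)[4])
{
    Views v{};
    // Each memcpy reads the identical four bytes; only the target type differs.
    std::memcpy(&v.asSigned,   bytes, 4);
    std::memcpy(&v.asUnsigned, bytes, 4);
    std::memcpy(&v.asShorts,   bytes, 4);
    return v;
}
```

With all four bytes set to 0xFF, the signed view is -1, the unsigned view is 4294967295, and both shorts are -1: one piece of storage, three values.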

@KroMignon said in QByteArray and char type:

This is the right way to do it:
QByteArray b = <something>;
if (quint8(b.at(0)) > 200) ....

I suggest:

if (quint8(b.at(0)) > quint8(200))

so you don't get the value promoted to int for no good reason.

KroMignon

@kshegunov said in QByteArray and char type:

I suggest:
if (quint8(b.at(0)) > quint8(200))

so you don't get the value promoted to int for no good reason.

I don't see an issue with if (quint8(b.at(0)) > 200), but if (b.at(0) > 200) is wrong and will never work.

fcarney

@stretchthebits said in QByteArray and char type:

The problem is, are you going to treat that as unsigned char or signed char.

I am going to treat it as whatever storage type I need. I will cast it to what is needed for that particular piece of code. Is this discussion about having to cast the pointer? I do casting all the time from base objects to derived types. How is this any different? I am not even promoting the type. Just saying it's unsigned char* now. Why is this an issue?

kshegunov

@KroMignon said in QByteArray and char type:

I don't see an issue with if (quint8(b.at(0)) > 200), but if (b.at(0) > 200) is wrong and will never work.

It will work, of course, and the compiler is smart enough to optimize it out, it appears. In C/C++ this return value should've been promoted to int as 200 is an int literal, but I didn't take into account that the ax registers are already integers, so this is going to be pruned when optimizing. Note the finer details here: https://godbolt.org/z/6hb8bv
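
As a side check on the promotion question, here is a minimal sketch using std::uint8_t in place of quint8 (the function name is made up). As far as the language rules go, operands narrower than int are promoted to int for arithmetic and comparisons, so wrapping the literal in an 8-bit type should not change how the comparison is evaluated.

```cpp
#include <cstdint>
#include <type_traits>

// Arithmetic on two 8-bit operands is carried out in int.
static_assert(std::is_same_v<decltype(std::uint8_t{1} + std::uint8_t{1}), int>,
              "uint8_t operands are promoted to int");

bool above200(char c)
{
    // Both operands are promoted to int here, just as with a plain 200 literal.
    return static_cast<std::uint8_t>(c) > std::uint8_t{200};
}
```

So the choice between quint8(200) and 200 comes down to readability rather than generated code.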