readRawData(char *s, int len) built-in function doesn't fill char *buf with len bytes and doesn't return the number of bytes read
-
Hi, my question concerns the use of the readRawData( ... ) function. I tried to read 3 files of different sizes ( 11 kB, 5 MB and 2 GB ) with readRawData( char *buffer, int size ).
size was set to 460 bytes and memory for buffer was allocated with calloc ( I understand that it is better to avoid mixing C into C++ code; it will be changed a bit later ). The 5 MB and 2 GB files were filled with random values, while the 11 kB file was a normal text file.
While reading the 11 kB file, the buffer was filled with an amount of data equal to the size value. When reading the 5 MB and 2 GB files, the buffer was filled with a varying amount of data that was rarely equal to the size value. Moreover, in all cases readRawData returned 0 ( not the number of bytes read ), although after calling readRawData the buffer was filled with appropriate data ( I could only check this for the 11 kB file ). The code I am using to read the file:

```cpp
size_t chunk_size = confFile->getDatagramSize() - DATAGRAM_HEADER;
char *temp;    // chunk without header and pos in chunk sequence
int size = chunk_size;
QDataStream in( &file );
while( !file.atEnd() )
{
    if ( msgN <= file.size() / chunk_size )
    {
        temp = ( char * )calloc( chunk_size + 1, sizeof( char ) );
        size = chunk_size;
    }
    else
    {
        temp = ( char * )calloc( ( file.size() % chunk_size ) + 1, sizeof( char ) );
        size = file.size() % chunk_size;
    }
    int dataRead = 0;
    if ( dataRead = in.readRawData( temp, size ) < 0 )
        qDebug() << "No datum were read";
    temp[ size + 1 ] = '\0';
    filMsgToSnd( fileName, msgN );
    setMsg( QString( temp ) );
    processData->setSndMsgsSq( getMsg() );
    ++msgN;    // represent chunk pos in seq
    free( temp );
    temp = NULL;
    msg.clear();
}
```
Why could this happen?
Would appreciate any help or advice.
Thank you in advance. -
@ainu
```cpp
if ( dataRead = in.readRawData( temp, size ) < 0 )
```

Oh dear, oh dear! Don't you mean:

```cpp
if ( ( dataRead = in.readRawData( temp, size ) ) < 0 )
```

Not to mention that you'll want to actually reference `dataRead` after you've got it.... ( e.g. `temp[ dataRead + 1 ] = '\0';` )? Plus I think conceptually you are assuming `dataRead == size` after the read, which I imagine will not be guaranteed at all.

> Files of size 5 MB and 2 GB were filled with random values
As in random bytes, or text only?
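Putting the two points above together ( the parentheses and then actually using `dataRead` ), a minimal, untested sketch with your own variable names might look like:

```cpp
int dataRead = 0;
// Parenthesize the assignment so it completes before the < comparison,
// then use dataRead (not size) for whatever follows the read
if ( ( dataRead = in.readRawData( temp, size ) ) < 0 )
    qDebug() << "No data were read";
else
    temp[ dataRead ] = '\0';   // temp was calloc'd with size + 1 bytes, so index dataRead is in range
```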
-
@JNBarchan Your answer led me to a depressed mood as I realised I had forgotten C++ operator precedence. Thank you for the detailed answer.
As for referencing dataRead: no, I didn't intend to use it further. I added this variable just for debugging, to check that the amount of data read by readRawData() equals the number of bytes I intended to read with it ( i.e. the "size" variable ). The thing is that when I use readRawData() it fills temp with a varying amount of data that rarely matches the size parameter. That was the actual reason to write here and the main question ( dataRead was added, as I wrote above, only for debugging, regretfully ). -
@JNBarchan
> Files of size 5 MB and 2 GB were filled with random values
>
> As in random bytes, or text only?

Yeah, random bytes. Do you think this is the reason? I also thought that could be the point, but since readRawData( ... ) is designed for reading raw binary data I tried to think about other reasons. What do you think?
-
Please don't be "depressed" :) Just be careful if you use the C-style assign-and-test pattern ( `( var = func() ) < value` ): the assignment segment must be parenthesized. You are not the first person to omit the parentheses, with a tale of woe as the result!

For the number of bytes, you must be careful to respect the documentation for `readRawData()`. As for most read-type functions, it says:

> int QDataStream::readRawData(char *s, int len)
>
> Reads **at most** len bytes from the stream into s and returns the number of bytes read. If an error occurs, this function returns -1.

Note what I have bolded. You must not assume it will read all of the `len` bytes you have asked for, only that it will read up to that number, and it will return the number it actually read. So you must use the return result ( your `dataRead`, not your `size` ) to determine how many bytes are actually valid in your buffer.

There is no problem with `readRawData()` reading arbitrary bytes rather than text. However, when you then proceed to go:

```cpp
temp[ dataRead + 1 ] = '\0';
setMsg( QString( temp ) );
```

for "random bytes" the data read into your buffer may contain `\0` bytes, in which case your idea of plonking a `\0` on the end and treating it as a string is flawed, as it may be "cut short" by an embedded `\0`. Just be aware of that.
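Just to illustrate that last point, here is a small sketch ( assuming you can carry the chunk around as bytes rather than as text ) that avoids the embedded-`\0` problem by using `QByteArray` with the count actually read:

```cpp
int dataRead = in.readRawData( temp, size );
if ( dataRead < 0 ) {
    qDebug() << "Read error";
} else {
    // QByteArray stores the exact byte count, so embedded '\0' bytes survive;
    // QString( const char * ) would stop at the first '\0'
    QByteArray chunk( temp, dataRead );
    // ... pass 'chunk' on to whatever builds your datagram ...
}
```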
Other than that, if you adjust your code to use `dataRead` correctly as I have indicated, it looks to me as though it will actually work, so you can press on.... -
@JNBarchan thank you for your detailed answer. It really helps me a lot, and I followed ( of course ) your advice on using the return result of readRawData() instead of the buffer size I used before.
But I still wonder why this function behaves differently with different files: when I pass a small file ( a few kB ) it fills the buffer with an amount of data equal to the buffer size, but with large files it fills the buffer with a varying amount of data. The files differ not just in size but also in content: the small file contains normal text, while the large files contain random data ( this line was used: head -c 2G </dev/urandom >file ). Do you think it could also be because the random data contains "\0"? It would be interesting to learn, but of course maybe it is better to check the source code. -
@ainu
You keep using the word "file". However, is your stream really an actual physical, already-created file on a hard disk, or something else? Because your code:

```cpp
size_t chunk_size = confFile->getDatagramSize() - DATAGRAM_HEADER;
```

makes it look like it's something to do with a Datagram?
-
@JNBarchan yeah, I use files ( for testing ) from the computer to feed to the program. The actual idea of the program is that it will read files from a particular dir ( a stream might be added later ) and split them into chunks of a predefined size; afterwards headers will be added. I am trying to implement a kind of my own UDP protocol, I mean UDP datagrams but with my own headers.
-
@ainu
Then in answer to your question:

> But I still wonder why this function behaves differently with different files: when I pass a small file ( a few kB ) it fills the buffer with an amount of data equal to the buffer size, but with large files it fills the buffer with a varying amount of data.
The underlying file access routines will have some kind of buffer size somewhere ( though I don't see it exposed by Qt ); it's often a small multiple of 4K, e.g. 4K or 8K. If the whole of the file fits into that buffer you'll (probably) get the whole of it back on the first call; otherwise you'll get it in chunks. I would not expect larger files to fill with a "random amount of data", but rather the buffer size; in any case you must accept whatever it says it has read.
Another possibility is that it is affected by the way you have opened the file. You do not show that code, and yours might be incorrect. Do you have something like:
```cpp
file.open(QIODevice::ReadOnly | QIODevice::Text)
```

If you do and are using `QIODevice::Text`, that would be wrong for your "random bytes" binary file, and might affect the chunks returned by looking for `\r\n` sequences ( though the code might not do that under Linux, I don't know ). I have already explained that if the file contains `\0` bytes these will read correctly, but your `QString()` debugging code will be "wrong".
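For reference, a minimal sketch of opening the file for raw binary reading ( assuming `file` is a `QFile` and `fileName` is your path ):

```cpp
QFile file( fileName );
if ( !file.open( QIODevice::ReadOnly ) )      // no QIODevice::Text for binary data
    qDebug() << "Cannot open" << fileName << ":" << file.errorString();
QDataStream in( &file );
```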
It's also possible that the size you pass to `readRawData()` could affect how chunks are returned. Optimal will be some multiple ( like 1x ) of the underlying buffer size. Yours is `confFile->getDatagramSize() - DATAGRAM_HEADER`, which will not be such a multiple, and changing that might give a "smoother" number of chunks/sizes.

If you really need to end up with "split them in chunks of a predefined size" to pass on, you really should implement that logic in your own code rather than relying on `readRawData()` to definitely return a particular, full chunk size that you can use directly, even though it's annoying to have to implement extra code and `readRawData()` might end up returning the desired chunk size anyway.
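If it helps, here is a rough, untested sketch of that "own logic" ( reusing `in`, `file` and `chunk_size` from your code ) which keeps calling `readRawData()` until a full chunk has been accumulated or the file ends:

```cpp
// Accumulate bytes until we have a full chunk, the file ends, or an error occurs,
// instead of assuming a single readRawData() call returns exactly chunk_size bytes
QByteArray chunk;
chunk.reserve( int( chunk_size ) );
while ( chunk.size() < int( chunk_size ) && !file.atEnd() )
{
    char buf[ 4096 ];
    int remaining = int( chunk_size ) - int( chunk.size() );
    int want = qMin( int( sizeof( buf ) ), remaining );
    int got = in.readRawData( buf, want );
    if ( got < 0 ) {              // -1 signals an error
        qDebug() << "Read error";
        break;
    }
    if ( got == 0 )               // nothing more to read
        break;
    chunk.append( buf, got );
}
// 'chunk' now holds up to chunk_size bytes, ready for your datagram header
```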