Solved Reading line by line of QTextStream into QByteArray with zero padding
-
Hello, I am trying to read every line of an encoded file, so I use
QFile file(fileName);
if (file.open(QIODevice::ReadOnly)) {
    QTextStream in(&file);
    while (!in.atEnd()) {
        QString line = in.readLine();
        QByteArray test = line.toLocal8Bit(); // also .toUtf8() or .toLatin1()
    }
}
The reason I use QByteArray is that the file contains raw data written in bytes. I found readLine() really useful because it is what I needed. However, when I convert the QString into a QByteArray, there is weird behavior with bytes greater than 0x80. Since QString stores the characters in 16 bits, readLine does some padding that ruins the coding.
Example:
What is in the file:
{ 0x0c, 0xa6, 0xd5, 0x00, 0x02, 0x08 }
What it reads:
line = { 0x000c, 0xfffd, 0xfffd, 0x0000, 0x0002, 0x0008 }
What it gets converted to:
test = { 0x0c, 0xef, 0xbf, 0xbd, 0xef, 0xbf, 0xbd, 0x02, 0x08 }
Is there a way to solve this reading problem?
-
@diego-qt said in Reading line by line of QTextStream into QByteArray with zero padding:
The reason why I use QByteArray is because the file contains raw data written in bytes.
Then you shouldn't be using QTextStream; you should use QFile::readLine directly. QTextStream is used to read encoded text.
-
@VRonin I hadn't thought of that. Thank you, it did the trick ;)
The final code:
QFile file(fileName);
if (file.open(QIODevice::ReadOnly)) {
    QByteArray line;
    while (!file.atEnd()) {
        line = file.readLine();
        //...
    }
}
-
Just for information, the "weird behavior" is UTF-8 decoding/encoding.
These bytes
{ 0x0c, 0xa6, 0xd5, 0x00, 0x02, 0x08 }
do not represent a valid UTF-8 encoded string. When readLine() attempts to decode the bytes as a UTF-8 string, it reaches the invalid portions, specifically the bytes 0xa6, 0xd5, and decodes each of them as the Unicode Replacement Character 0xfffd. This flags the invalidity without throwing an exception and makes the resulting string safe to handle.
If you then explicitly UTF-8 encode the string containing U+FFFD characters, each one is correctly encoded as the UTF-8 bytes 0xef, 0xbf, 0xbd.
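The decode/encode round trip described above can be reproduced without Qt. The following is a minimal sketch, assuming a simplified lenient decoder that substitutes U+FFFD for any byte that does not start a well-formed sequence; it is not Qt's actual decoder, it skips checks for overlong forms and surrogates, and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr char32_t kReplacement = 0xFFFD; // Unicode Replacement Character

// Encode one code point as UTF-8 (1 to 4 bytes).
std::vector<std::uint8_t> encodeUtf8(char32_t cp) {
    assert(cp <= 0x10FFFF);
    auto byte = [](char32_t v) { return static_cast<std::uint8_t>(v); };
    if (cp < 0x80)    return {byte(cp)};
    if (cp < 0x800)   return {byte(0xC0 | (cp >> 6)), byte(0x80 | (cp & 0x3F))};
    if (cp < 0x10000) return {byte(0xE0 | (cp >> 12)),
                              byte(0x80 | ((cp >> 6) & 0x3F)),
                              byte(0x80 | (cp & 0x3F))};
    return {byte(0xF0 | (cp >> 18)), byte(0x80 | ((cp >> 12) & 0x3F)),
            byte(0x80 | ((cp >> 6) & 0x3F)), byte(0x80 | (cp & 0x3F))};
}

// Simplified lenient decoder: an invalid lead byte, a stray continuation
// byte, or a truncated sequence yields U+FFFD and consumes one byte.
std::vector<char32_t> decodeUtf8Lenient(const std::vector<std::uint8_t>& in) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const std::uint8_t b = in[i];
        std::size_t extra = 0; // number of continuation bytes expected
        char32_t cp = 0;
        if (b < 0x80)                { cp = b; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }
        else { out.push_back(kReplacement); ++i; continue; }

        bool ok = (i + extra < in.size());
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            if ((in[i + k] & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (in[i + k] & 0x3F);
        }
        if (!ok) { out.push_back(kReplacement); ++i; continue; }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Decode leniently, then re-encode: what a text stream followed by a
// UTF-8 conversion effectively does to raw binary data.
std::vector<std::uint8_t> decodeThenEncode(const std::vector<std::uint8_t>& raw) {
    std::vector<std::uint8_t> out;
    for (char32_t cp : decodeUtf8Lenient(raw)) {
        const auto bytes = encodeUtf8(cp);
        out.insert(out.end(), bytes.begin(), bytes.end());
    }
    return out;
}
```

Feeding the six bytes from the question through decodeThenEncode turns each of 0xa6 and 0xd5 into the three bytes 0xef, 0xbf, 0xbd, so the original values cannot be recovered afterwards. That is why the raw QFile::readLine approach is the right fit for binary data.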