Reading line by line of QTextStream into QByteArray with zero padding
-
Hello, I am trying to read every line of an encoded file, so I use

    QFile file(fileName);
    if (file.open(QIODevice::ReadOnly)) {
        QTextStream in(&file);
        while (!in.atEnd()) {
            QString line = in.readLine();
            QByteArray test = line.toLocal8Bit(); // also .toUtf8() or .toLatin1()
        }
    }

The reason why I use QByteArray is because the file contains raw data written in bytes. I found readLine() really useful because that is what I needed. However, when I convert the QString into a QByteArray, there is weird behavior with bytes greater than 0x80. Since QString stores characters in 16 bits, readLine does some padding that ruins the coding.

Example:
What is in the file:
{ 0x0c, 0xa6, 0xd5, 0x00, 0x02, 0x08 }
What it reads:
line = { 0x000c, 0xfffd, 0xfffd, 0x0000, 0x0002, 0x0008 }
What it gets converted to:
test = { 0x0c, 0xef, 0xbf, 0xbd, 0xef, 0xbf, 0xbd, 0x02, 0x08 }

Is there a way to solve this reading problem?
-
@diego-qt said in Reading line by line of QTextStream into QByteArray with zero padding:
The reason why I use QByteArray is because the file contains raw data written in bytes.
Then you shouldn't be using QTextStream; you should use QFile::readLine directly. QTextStream is used to read encoded text.
-
Just for information, the "weird behavior" is UTF-8 decoding/encoding.

These bytes { 0x0c, 0xa6, 0xd5, 0x00, 0x02, 0x08 } do not represent a valid UTF-8 encoded string. When readLine() attempts to decode the bytes as a UTF-8 string, it gets to the invalid portions, specifically the bytes 0xa6 and 0xd5, and decodes each of them as the Unicode Replacement Character, U+FFFD (0xfffd). This flags the invalidity without throwing an exception, and makes the resulting string safe to handle.

If you then explicitly UTF-8 encode the string containing U+FFFD characters, they get correctly encoded as the UTF-8 bytes 0xef, 0xbf, 0xbd.
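The round trip described above can be reproduced without Qt. The following is only an illustrative sketch in standard C++: decodeUtf8Lossy and encodeUtf8 are hypothetical names, and the decoder is deliberately simplified (it does not reject overlong forms or surrogates), so it is not a stand-in for Qt's actual decoder.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified lossy UTF-8 decoder (illustration only): a byte that cannot start
// a valid sequence, or whose continuation bytes are missing, becomes U+FFFD.
std::u32string decodeUtf8Lossy(const std::vector<std::uint8_t> &bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const std::uint8_t b = bytes[i];
        std::size_t extra = 0; // number of continuation bytes expected
        char32_t cp = 0;
        bool ok = true;
        if (b < 0x80)                { cp = b; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else                         { ok = false; } // stray continuation or invalid lead byte

        if (ok && i + extra >= bytes.size())
            ok = false; // sequence is cut off by the end of the input
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const std::uint8_t c = bytes[i + k];
            if ((c & 0xC0) != 0x80)
                ok = false; // not a continuation byte
            else
                cp = (cp << 6) | (c & 0x3F);
        }

        if (ok) {
            out.push_back(cp);
            i += extra + 1;
        } else {
            out.push_back(U'\xFFFD'); // replacement character
            i += 1;                   // resynchronise on the next byte
        }
    }
    return out;
}

// Plain UTF-8 encoder for code points up to U+10FFFF.
std::vector<std::uint8_t> encodeUtf8(const std::u32string &text)
{
    std::vector<std::uint8_t> out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out.push_back(static_cast<std::uint8_t>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Feeding { 0x0c, 0xa6, 0xd5, 0x00, 0x02, 0x08 } through decodeUtf8Lossy and then encodeUtf8 turns each of 0xa6 and 0xd5 into the three bytes 0xef, 0xbf, 0xbd. The original byte values cannot be recovered afterwards, which is why the conversion through a string is lossy for raw data.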