
Reading line by line of QTextStream into QByteArray with zero padding



  • Hello, I am trying to read every line of an encoded file, so I use:

    QFile file(fileName);
    if(file.open(QIODevice::ReadOnly))
    {
        QTextStream in(&file);
        while(!in.atEnd())
        {
            QString line = in.readLine();
            QByteArray test = line.toLocal8Bit(); // also .toUtf8() or .toLatin1()
        }
    }
    

    I use QByteArray because the file contains raw binary data. readLine() is exactly what I need, but when I convert the QString to a QByteArray, bytes of 0x80 and above come out wrong. Since QString stores characters as 16-bit values, readLine() appears to do some padding that corrupts the data.

    Example:
    What is in the file:
    { 0x0c, 0xa6, 0xd5, 0x00, 0x02, 0x08 }
    What is read:
    line = { 0x000c, 0xfffd, 0xfffd, 0x0000, 0x0002, 0x0008 }
    What it gets converted to:
    test = { 0x0c, 0xef, 0xbf, 0xbd, 0xef, 0xbf, 0xbd, 0x02, 0x08 }

    Is there a way to solve this reading problem?



  • @diego-qt said in Reading line by line of QTextStream into QByteArray with zero padding:

    The reason why I use QByteArray is because the file contains raw data written in bytes.

    Then you shouldn't be using QTextStream; call QFile::readLine() directly. QTextStream is meant for reading encoded text, so it decodes the bytes into Unicode characters instead of handing them back unchanged.
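
    The same idea can be sketched outside Qt in standard C++ (this is an illustration, not code from this thread). The helper names readRawLines and readRawLinesFromBuffer are hypothetical. The sketch splits a byte stream on '\n' with std::getline and never decodes it, so bytes of 0x80 and above, and embedded 0x00 bytes, pass through unchanged. One difference to keep in mind: std::getline drops the trailing '\n', whereas QFile::readLine() keeps it in the returned QByteArray.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a byte stream on '\n' without any text decoding.
// std::getline copies the bytes verbatim and discards the '\n' delimiter.
std::vector<std::string> readRawLines(std::istream &in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}

// Hypothetical convenience wrapper for an in-memory buffer. For a real file,
// pass a std::ifstream opened with std::ios::binary to readRawLines instead.
std::vector<std::string> readRawLinesFromBuffer(const std::string &bytes)
{
    std::istringstream in(bytes);
    return readRawLines(in);
}
```

    Note that any line-based approach treats a 0x0a byte inside the binary data as a line break, so this only fits formats where 0x0a really is a record separator.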



  • @VRonin I hadn't thought of that. Thank you, it did the trick ;)
    The final code:

    QFile file(fileName);
    if(file.open(QIODevice::ReadOnly))
    {
        while(!file.atEnd())
        {
            // Raw bytes with no text decoding; the trailing '\n', if any, is kept
            QByteArray line = file.readLine();
            //...
        }
    }
    


  • Just for information, the "weird behavior" is UTF-8 decoding followed by UTF-8 encoding.

    The bytes { 0x0c, 0xa6, 0xd5, 0x00, 0x02, 0x08 } are not a valid UTF-8 encoded string. When readLine() tries to decode them as UTF-8, it reaches the invalid bytes 0xa6 and 0xd5 and decodes each one as the Unicode replacement character, U+FFFD. This flags the invalid input without throwing an exception and leaves the resulting string safe to handle.

    If you then explicitly encode the string containing U+FFFD characters as UTF-8, each one is correctly encoded as the three bytes 0xef, 0xbf, 0xbd.
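
    To make that round trip concrete, here is a minimal sketch in standard C++, without Qt. The functions decodeUtf8Lossy and encodeUtf8 are hypothetical names, and the decoder is deliberately simplified: it checks only lead and continuation bytes and does not reject overlong forms or surrogates, so it is not how Qt implements decoding. It is enough to show why the original bytes cannot be recovered once they have been decoded.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified lossy decoder (assumption: only lead/continuation structure is
// checked). Any byte that cannot start a valid sequence becomes U+FFFD.
std::vector<char32_t> decodeUtf8Lossy(const std::string &bytes)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        int extra = 0;       // continuation bytes expected after the lead byte
        char32_t cp = 0;
        if (b < 0x80)                { extra = 0; cp = b; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray continuation or invalid lead

        std::size_t j = i + 1;
        bool ok = true;
        for (int k = 0; k < extra; ++k, ++j) {
            if (j >= bytes.size() ||
                (static_cast<unsigned char>(bytes[j]) & 0xC0) != 0x80) {
                ok = false;  // sequence is truncated or broken
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[j]) & 0x3F);
        }
        if (ok) { out.push_back(cp); i = j; }
        else    { out.push_back(0xFFFD); ++i; }  // replace only the lead byte
    }
    return out;
}

// Standard UTF-8 encoding of code points (no validation of the input).
std::string encodeUtf8(const std::vector<char32_t> &cps)
{
    std::string out;
    for (char32_t cp : cps) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

    Feeding the six bytes from the question through decodeUtf8Lossy gives U+FFFD in positions 1 and 2, and encodeUtf8 then writes each of those as 0xef, 0xbf, 0xbd. The information about which byte was originally there (0xa6 or 0xd5) is gone, which is why the raw data has to be read without decoding in the first place.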

