[Solved] How to know if QTextStream could not decode data it reads?

  • In a simple editor app I am using QFile and QTextStream to read a text file with the codec set to "UTF-8". If the user mistakenly gives me the path to a file encoded in Latin-1, some of the characters are automatically replaced with \uFFFD, the "replacement character". This is bad: if it goes unnoticed and the user saves the file, she now has a UTF-8 file with some characters lost.

    I would like to somehow detect if the QTextStream -- or the QFile or the QTextCodec -- was not able to properly decode the input file and was forced to use replacement characters. Then I could either warn the user, or close it and re-open with a different codec.

    However, I do not see a signal that would be raised. I do not see a property to set to say "stop on encoding error". I hoped that the QTextStream/QFile status would show the ReadCorruptData value in this case, but the status is 0.

    The only way I see at the moment is to use readAll() and scan the possibly very large string returned for \uFFFD before putting that string into the editor.

    Edit: I realized that I don't need to access the file content as a separate string and search it. I am loading it into an editor with QPlainTextEdit::document()->setPlainText(), and the editor is best positioned to do the search. I simply call the document's find() method with a string of \uFFFD; if the returned cursor isNull(), it was not found and the file was properly decoded. Otherwise I can warn the user that there is at least one badly decoded character at the position() of that cursor.
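
    The scan described above boils down to a single search over the decoded text. Here is a minimal Qt-free sketch of it, assuming the text is held as UTF-16 code units (which is how QString stores it); firstReplacementChar is a hypothetical name, not a Qt function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: index of the first U+FFFD code unit in decoded
// UTF-16 text, or std::u16string::npos when the text contains none.
std::size_t firstReplacementChar(const std::u16string &text)
{
    return text.find(u'\uFFFD');
}
```

    One caveat with this approach: a file that legitimately contains U+FFFD would also be reported, so a hit is probably better treated as a reason to warn the user than as proof of a decoding failure.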

  • @dcortesi said:

    Have you tried this solution?

    QTextCodec::ConverterState state;
    QTextCodec *codec = QTextCodec::codecForName("UTF-8");
    const QByteArray data(readSource());  // raw, undecoded bytes of the file
    const QString text = codec->toUnicode(data.constData(), data.size(), &state);
    if (state.invalidChars > 0) {
        // Not a UTF-8 text - using system default locale
        QTextCodec *localeCodec = QTextCodec::codecForLocale();
        if (!localeCodec) {
            // ... (the posted snippet is cut off here)
        }
    }

    Best, Steven

  • Thank you. It is useful that you point to QTextCodec::toUnicode() and its requirement of a separate ConverterState object to hold the result. However, there is a conceptual gap between that and my problem.
    At least, I am not clear on how to get from a file path string to bytes in a QByteArray. Certainly not by using a QTextStream object, because its use implies decoded characters, not raw bytes. To load the "data" array in your code I would use -- QIODevice?
    Since QTextStream has to perform a toUnicode() call as part of implementing read() or readAll(), it would be most convenient if it kept a ConverterState and made it accessible as a property. But apparently it does not.
    Thanks again.
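
    On the raw-bytes question: in Qt the bytes would most likely come from QFile::readAll(), which returns a QByteArray with no decoding applied. The check itself can also be done without any codec, by testing whether the bytes form well-formed UTF-8 before they are handed to a decoder. Below is a minimal sketch in plain standard C++, so it does not depend on Qt. The names firstInvalidUtf8 and readRawBytes are hypothetical, and this is a substitute technique, not what QTextCodec does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: byte offset of the first ill-formed UTF-8 sequence
// in `bytes`, or std::string::npos if the whole buffer is well formed.
std::size_t firstInvalidUtf8(const std::string &bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 <= 0x7F) { ++i; continue; }  // plain ASCII
        std::size_t len = 0;                // total length of this sequence
        unsigned char lo = 0x80, hi = 0xBF; // allowed range of the 2nd byte
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;      // rejects overlong encodings
            else if (b0 == 0xED) hi = 0x9F; // rejects UTF-16 surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;      // rejects overlong encodings
            else if (b0 == 0xF4) hi = 0x8F; // rejects values above U+10FFFF
        } else {
            return i;  // stray continuation byte or invalid lead byte
        }
        if (i + len > n) return i;          // sequence truncated at the end
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            const unsigned char min = (k == 1) ? lo : 0x80;
            const unsigned char max = (k == 1) ? hi : 0xBF;
            if (b < min || b > max) return i;
        }
        i += len;
    }
    return std::string::npos;
}

// Hypothetical helper: reads a whole file as raw bytes, with no decoding.
// Returns an empty string if the file cannot be opened.
std::string readRawBytes(const std::string &path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

    With something like this, firstInvalidUtf8(readRawBytes(path)) returning anything other than std::string::npos would be the cue to warn the user or retry with another codec. A Latin-1 file containing accented letters will almost always fail the check, because bytes 0x80 to 0xFF rarely happen to line up as valid UTF-8 sequences.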
