How do I read in a text file line by line that uses Hex0D as a line terminator?



  • I have a number of scripts that appear to use Hex0D as a line terminator (I checked using an on-line hexdump utility).

    NOTE: I'm running Qt on Windows.

    The following code reads the whole file in one go:
    @
    QFile file(fileName);
    if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
        return;
    QTextStream in(&file);
    while (!in.atEnd())
    {
        QString line = in.readLine();
        // (do stuff)
    }
    @

    ie it ignores the Hex0D line terminator.

    I tried using read() to read character by character. I replaced the last line with
    @QString character = in.read(1);@
    However this just seems to ignore the Hex0D characters altogether.

    Thanks for any help.



  • It's a bit odd that you have files delimiting lines with \r, but there you go. QTextStream doesn't seem to provide a configuration for this, and it seems to have been a problem for years. In this case I'd read the contents of the entire file into a QString and split it:
    @QStringList lines = contents.split("\r");@



  • Wouldn't it be a solution to read the file stream in a nested while loop, ending each line when the desired character is reached? It's a method borrowed from serial-interface code, but it can be useful if the file is very big.



  • I would read char by char.
    Maybe QTextStream automatically trims \r.

    Try something like this:

    @
    QFile file(fileName);
    file.open(QIODevice::ReadOnly);
    while (!file.atEnd())
    {
        QByteArray line;
        // Collect bytes up to the next \r (or the end of the file).
        while (!file.atEnd()) {
            QByteArray readChar = file.read(1);
            if (readChar == "\r") {
                break;
            } else {
                line.append(readChar);
            }
        }
        // (do stuff with line)
    }
    @

    WARNING: I didn't test this code.. I'm in a hurry. Hope it helps, cya!



  • Jonathan,

    it's a solution similar to one I adopted in the past (which, obviously, I can't find now to post here...)

    One piece of advice: take care with that "forever" loop. If your file is only one line (or only one character) without any \r character, your loop never exits. Simply add a check inside the forever loop: if file.read(1) doesn't return a character, you have reached EOF, so exit the forever loop anyway.



  • Using
    @ if (readChar == "\r") @
    works provided that I also drop QIODevice::Text, ie use
    @file.open(QIODevice::ReadOnly)@



  • [quote author="Alicemirror" date="1299684492"]Jonathan,

    it's a solution similar to one I adopted in the past (which, obviously, I can't find now to post here...)

    One piece of advice: take care with that "forever" loop. If your file is only one line (or only one character) without any \r character, your loop never exits. Simply add a check inside the forever loop: if file.read(1) doesn't return a character, you have reached EOF, so exit the forever loop anyway.[/quote]

    You're right, instead of
    @forever@
    it should be
    @while (!file.atEnd())@

    I edited the first post.



  • [quote author="Jonathan" date="1299684717"]Using
    @ if (readChar == "\r") @
    works provided that I also drop QIODevice::Text, ie use
    @file.open(QIODevice::ReadOnly)@[/quote]

    Indeed, and reading the documentation you'll see that QIODevice::Text was the root of your problem :)



  • Actually, the root of the problem is that Qt doesn't accept \r as a valid EOL.



  • Franzk, with the solution suggested by Ivan it doesn't matter.
    What you do is read from the file one character at a time until you reach the end. For every character, you check whether its value corresponds to what you consider the EOL; if it does, the program closes the current text line and moves on.
    Indeed, with this method your EOL character can be anything you want: it is the program that creates lines, based on the character you decide on. When the line is complete, I suppose that whatever builds the line string automatically sets the EOL with the right character. A similar approach is used to handle Linux-encoded text files in Windows programs and vice versa.
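
For comparison, the "pick your own EOL character" idea described above can be sketched without Qt, using only the C++ standard library. This is a minimal illustration, not Qt API: the helper names `readLines` and `splitText` are hypothetical, and `std::getline` with an explicit delimiter stands in for the hand-written character loop. When reading from a real file rather than a string, it is probably safest to open the stream with `std::ios::binary` so the platform does not translate line endings first.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every record from `in`, treating `eol` as the
// only line terminator. std::getline consumes the delimiter and does not
// store it in the returned line.
std::vector<std::string> readLines(std::istream& in, char eol)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line, eol)) {
        lines.push_back(line);
    }
    return lines;
}

// Hypothetical convenience wrapper for text that is already in memory.
std::vector<std::string> splitText(const std::string& text, char eol)
{
    std::istringstream in(text);
    return readLines(in, eol);
}
```

Calling `splitText("one\rtwo\rthree", '\r')` yields three lines, and a trailing terminator does not produce an extra empty line.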



  • [quote author="Franzk" date="1299693469"]Actually, the root of the problem is that Qt doesn't accept \r as a valid EOL.[/quote]
    Why should it? No platform supported by Qt uses that control character.



  • Why complicate a simple thing? There are tons of cases where a file is non-standard. When reading character by character, the problem does not exist.

    Or does it?



  • It would be nice to be able to control it though. If you have to produce files that have to be processed by other tools, control is needed.



  • Andre, what do you mean by "control"? Setting a control like those explained above, or something else?



  • [quote author="Alicemirror" date="1299694794"]Andre, what do you mean by "control"? Setting a control like those explained above, or something else?[/quote]

    I mean that I, as a programmer, can choose what QTextStream and QIODevice consider the line separator. Sure, you can output your data character by character, but I did not choose Qt to do everything by hand...



  • [quote author="Franzk" date="1299693469"]Actually, the root of the problem is that Qt doesn't accept \r as a valid EOL.[/quote]

    Well, yeah.. That's the actual root of the problem :/

    I've been reading too many Qt Labs Blog posts.. Sadly, it seems I'm becoming one of those qt-is-perfect-no-need-to-change-anything guys >.<

    Jonathan, you should file a bug report http://bugreports.qt.nokia.com/



  • [quote author="Andre" date="1299694989"][quote author="Alicemirror" date="1299694794"]Andre, what do you mean by "control"? Setting a control like those explained above, or something else?[/quote]

    I mean that I, as a programmer, can choose what QTextStream and QIODevice consider the line separator. Sure, you can output your data character by character, but I did not choose Qt to do everything by hand...[/quote]

    I agree with you. The best solution is indeed that.
    QTextStream and QIODevice already have the subroutines to read a file up to some hardcoded characters. It would be wise to add some "control" over what these classes consider an EOL.

    Then, the same routine could be used to read any character-separated file.



  • [quote author="peppe" date="1299694316"][quote author="Franzk" date="1299693469"]Actually, the root of the problem is that Qt doesn't accept \r as a valid EOL.[/quote]
    Why should it? No platform supported by Qt uses that control character. [/quote]

    Mac is no longer supported?

    Even if it is true, \r is still a valid line ending and should therefore be supported, even if only by configuration.



  • [quote author="Franzk" date="1299695807"][quote author="peppe" date="1299694316"][quote author="Franzk" date="1299693469"]Actually, the root of the problem is that Qt doesn't accept \r as a valid EOL.[/quote]
    Why should it? No platform supported by Qt uses that control character. [/quote]

    Mac is no longer supported?

    Even if it is true, \r is still a valid line ending and should therefore be supported, even if only by configuration.[/quote]

    Just to add to your argument, a quote from Wikipedia:

    http://en.wikipedia.org/wiki/Newline :
    "Systems based on ASCII or a compatible character set use either LF (Line feed, '\n', 0x0A, 10 in decimal) or CR (Carriage return, '\r', 0x0D, 13 in decimal) individually, or CR followed by LF (CR+LF, '\r\n', 0x0D 0x0A). "

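The three conventions named in the Wikipedia quote (LF, CR, and CR+LF) can be handled by one splitter. The following is a minimal Qt-free sketch in standard C++, assuming the text is already in memory as a byte string; the function name `splitAnyNewline` is hypothetical and this is not how QTextStream is implemented.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` into lines, accepting LF, CR, or CR+LF
// as a terminator. A final line without a terminator is kept; a trailing
// terminator does not produce an extra empty line.
std::vector<std::string> splitAnyNewline(const std::string& text)
{
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r' || c == '\n') {
            // Treat CR immediately followed by LF as a single terminator.
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
                ++i;
            }
            lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) {
        lines.push_back(current);
    }
    return lines;
}
```

One design consequence worth noting: because every CR or LF ends a line, two consecutive CRs produce an empty line between them, which matches how a CR-terminated file with a blank line would look.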


  • AH :)

    Andre, you are right! As a matter of fact, I wrote all my answers thinking this was a file with a non-standard line terminator, while (silly of me, really!) \r is the old, famous carriage return...



  • As suggested, I've filed bug report QTBUG-18038.



  • [quote author="Franzk" date="1299695807"][quote author="peppe" date="1299694316"][quote author="Franzk" date="1299693469"]Actually, the root of the problem is that Qt doesn't accept \r as a valid EOL.[/quote]
    Why should it? No platform supported by Qt uses that control character. [/quote]

    Mac is no longer supported?

    Even if it is true, \r is still a valid line ending and should therefore be supported, even if only by configuration.[/quote]

    Therefore I'm allowed to argue that ASCII 0x07 (BEL) is a valid line ending in my wonderful system, therefore QTextStream should support it? :)

    Come on, stick to reality: if you need custom line endings handle the line splitting yourself. It's easy and always works.

    (BTW: where do those files come from? Mac OS 9?)

    If you want, you can suggest an API extension to allow for custom line endings in QTextStream, and/or provide the implementation yourself (quite easy) and submit a merge request.



  • My code would have been much tidier if I could have specified the end of line character.

    But to be general, it would have to be a string. Didn't some systems use "x0Ax0D"?
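
The point that a general terminator would have to be a string rather than a single character can be sketched as follows. This is a Qt-free illustration in standard C++; `splitOn` is a hypothetical name, and the choice to return the whole text unsplit for an empty terminator is an assumption made here, not something the thread specifies.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on every occurrence of the
// multi-character terminator `eol`. A trailing terminator does not produce
// an extra empty line.
std::vector<std::string> splitOn(const std::string& text, const std::string& eol)
{
    std::vector<std::string> lines;
    if (eol.empty()) {
        lines.push_back(text);  // assumption: nothing to split on, keep text whole
        return lines;
    }
    std::size_t start = 0;
    std::size_t pos = 0;
    while ((pos = text.find(eol, start)) != std::string::npos) {
        lines.push_back(text.substr(start, pos - start));
        start = pos + eol.size();
    }
    if (start < text.size()) {
        lines.push_back(text.substr(start));
    }
    return lines;
}
```

With this shape, `splitOn(contents, "\r")` covers the CR-only scripts from the original question, and `splitOn(contents, "\r\n")` covers a two-byte terminator with the same code.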



  • [quote author="Franzk" date="1299695807"][quote author="peppe" date="1299694316"][quote author="Franzk" date="1299693469"]Actually, the root of the problem is that Qt doesn't accept \r as a valid EOL.[/quote]
    Why should it? No platform supported by Qt uses that control character. [/quote]

    Mac is no longer supported?

    Even if it is true, \r is still a valid line ending and should therefore be supported, even if only by configuration.[/quote]

    Mac OS 9 was never supported by Qt 3 or Qt 4. Mac OS X uses Unix newlines ('\n').



  • [quote author="Jonathan" date="1299701652"]My code would have been much tidier if I could have specified the end of line character.

    But to be general, it would have to be a string. Didn't some systems use "x0Ax0D"?
    [/quote]

    Yes, f.i. Windows.



  • [quote author="peppe" date="1299701793"]
    [quote author="Jonathan" date="1299701652"]My code would have been much tidier if I could have specified the end of line character.

    But to be general, it would have to be a string. Didn't some systems use "x0Ax0D"?
    [/quote]

    Yes, f.i. Windows.[/quote]

    That has always been \r\n, not \n\r as stated above.



  • [quote author="peppe" date="1299701466"]Therefore I'm allowed to argue that ASCII 0x07 (BEL) is a valid line ending in my wonderful system, therefore QTextStream should support it? :)[/quote]

    That might be taking it a bit far, but if you insist, I'm sure the implementer could take into account that you wish to view BEL as an EOL as well ;).

    Edit: I just added a note to the issue referencing "the unicode standard on newlines":http://www.unicode.org/standard/reports/tr13/tr13-5.html.



  • Hi Franzk,

    thank you for your note, but the newline character I have always known is NL, not NEL. Or is this another definition that I don't know?

    Does Qt fully respect the Unicode specification for this character? Regardless of this specific case, differences in line termination become very important in a Qt development environment that works across desktop platforms (Linux, Mac and Windows), where the same sources are saved with different line-termination characters: if you open source code created with Qt on Linux under Windows (with Notepad), it is unreadable, while Qt on Windows interprets it correctly.



  • I usually interpret NL as newline, which is equivalent to \n which is actually LF (LineFeed). NEL is NExtLine. Why there is such a difference I don't know.

    Notepad has a habit of only accepting \r\n as line termination. Use proper editors ;).



  • [quote author="Franzk" date="1299736994"][quote author="peppe" date="1299701793"]
    [quote author="Jonathan" date="1299701652"]My code would have been much tidier if I could have specified the end of line character.

    But to be general, it would have to be a string. Didn't some systems use "x0Ax0D"?
    [/quote]

    Yes, f.i. Windows.[/quote]

    That has always been \r\n, not \n\r as stated above.[/quote]

    Oops, you're right :-) I switched the bytes in my mind.

