Strange behavior when reading text file line by line with QTextStream

General and Desktop, 10 posts, 5 posters, 6.3k views
  • utcenter wrote (#1):

    I have this basic program:

    @ QFile file("h:/test.txt");
    file.open(QFile::ReadOnly | QFile::Text);
    QTextStream in(&file);

    bool found = false;
    uint pos = 0;
    
    do {
        QString temp = in.readLine();
        int p = temp.indexOf("something");
        if (p < 0) {
            pos += temp.length() + 1;
        } else {
            pos += p;
            found = true;
        }
    } while (!found && !in.atEnd());
    
    in.seek(0);
    QString text = in.read(pos);
    cout << text.toStdString() << endl;@
    

    The input file looks like this:

    bq. this is line one, the first line
    this is line two, it is second
    this is the third line
    and this is line 4
    line 5 goes here
    and finally, there is line number 6

    The idea is, of course, to find the first occurrence of a string and load the text file from the start up to that location. Searching for strings that are on the first 5 lines produces the expected output:

    with indexOf("first") output is:

    bq. this is line one, the

    with "cond":

    bq. this is line one, the first line
    this is line two, it is se

    with "here":

    bq. this is line one, the first line
    this is line two, it is second
    this is the third line
    and this is line 4
    line 5 goes

    However, if I pass "num" that is on the last line I get an unexpected result:

    bq. this is line one, the first line
    this is line two, it is second
    this is the third line
    and this is line 4
    line 5 goes here
    and finally, there is

    Five characters are missing from line 6. If the last line were line 7, six characters would be missing, and so on. Every line but the last behaves normally; on the last line the output is cut short by lineNumber - 1 characters.

    Maybe it's because it's 5 AM, but I've been staring at this for like 30 minutes and cannot figure out why... so humiliating...

    • prady_80 wrote (#2):

      This seems to be a bug...
      I tried putting in debug statements.
      Your code works if there is a newline character at the end of the last line.
      Strangely, if there is no newline at the end, the in.readAll() call returns
      "line number 6"

      and if there is one, it returns
      "number 6"

      In both cases the pos value stays the same.
      So, as a possible workaround, you should probably append a newline at the end of the file.

      Another strange observation from an experiment:
      I took the file without a newline at the end of the last line and called in.readAll().size(). It returned 163, which is correct.
      Then I added a newline at the end of the last line and did the same thing. It returned 159, which is very strange; it should have returned 165. So it clearly is a bug. You should log one.

      • utcenter wrote (#3):

        Any other ideas?

        1 Reply Last reply
        0
        • C Offline
          C Offline
          ChrisW67
          wrote on last edited by
          #4

          Running this on a Windows machine by any chance? Your logic allows for the length of each unmatched line plus one byte. On Windows the end-of-line marker is two bytes. So, for each unmatched line you read, your pos value is incremented by one byte less than it should be. When you slurp pos bytes at the end, you are slurping fewer bytes than you should be.
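The arithmetic behind this hypothesis can be tried out without Qt. The following is a minimal sketch in plain standard C++; the helper names `offsetByLineLengths` and `offsetInRawBytes` are made up for illustration and do not come from the thread. It only models "length + 1 per unmatched line" against the raw byte offset, and makes no claim about what QTextStream does internally.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: mimics the loop from the first post on an in-memory
// buffer. Lines are split on '\n', a trailing '\r' is dropped (roughly what a
// text-mode reader would hand back), and each unmatched line counts as
// length + 1. Returns std::string::npos if needle is not found.
std::size_t offsetByLineLengths(const std::string& raw, const std::string& needle)
{
    std::istringstream in(raw);
    std::string line;
    std::size_t pos = 0;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();               // strip the CR of a CR/LF pair
        const std::size_t p = line.find(needle);
        if (p != std::string::npos)
            return pos + p;
        pos += line.size() + 1;            // assumes a one-byte line ending
    }
    return std::string::npos;
}

// Reference value: the byte offset of needle in the untouched buffer.
std::size_t offsetInRawBytes(const std::string& raw, const std::string& needle)
{
    return raw.find(needle);
}
```

With LF line endings both functions agree. With CR/LF endings the line-based count falls behind the raw offset by one byte per preceding line, which only matters if the offset is later applied to untranslated bytes.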

          I'd use a different approach that is not fussed by line endings. If the files are typically small then something like:
          @
          // needs <QFile>, <QByteArray> and <QString>
          const QString lookingFor("blah");
          QByteArray data; // declared here so it is still in scope after the if
          QFile file("h:/test.txt");
          if (file.open(QFile::ReadOnly)) { // line ending conversion not wanted
              data = file.readAll();
              const int pos = data.indexOf(lookingFor.toUtf8());
              // must allow for encoding differences  ^^^^^^^^
              if (pos >= 0)
                  data.truncate(pos); // keep only the bytes before the match
          }
          QString result = QString::fromUtf8(data);
          @
          I assume the file is UTF-8 encoded; you might need to adjust.

          • utcenter wrote (#5):

            If EOL is two bytes, then why do I get the expected result for all lines except the last? I should be losing one character for every line, but that is not the case. Compensating with two bytes for each line doesn't produce the expected behavior either.

            • MuldeR wrote (#6):

              There is no guarantee line endings are consistent within one file.
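One way to check which conventions a given file actually uses is to count them in the raw bytes. This is a small sketch in plain standard C++, assuming the file content has already been read in binary mode into a `std::string`; `EolCounts` and `countLineEndings` are illustrative names, not part of Qt.

```cpp
#include <cstddef>
#include <string>

// Tally of the line-ending styles found in a buffer.
struct EolCounts {
    std::size_t crlf = 0;   // "\r\n" pairs
    std::size_t lf = 0;     // bare "\n"
    std::size_t cr = 0;     // bare "\r"
};

// Hypothetical helper: counts each kind of line ending in raw, treating
// a CR immediately followed by LF as a single CR/LF marker.
EolCounts countLineEndings(const std::string& raw)
{
    EolCounts c;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r') {
            if (i + 1 < raw.size() && raw[i + 1] == '\n') {
                ++c.crlf;
                ++i;            // skip the LF that belongs to this pair
            } else {
                ++c.cr;
            }
        } else if (raw[i] == '\n') {
            ++c.lf;
        }
    }
    return c;
}
```

If more than one of the three counters is non-zero, the file mixes conventions. The bytes have to come from a binary-mode read (for example a `std::ifstream` opened with `std::ios::binary`), otherwise the platform may already have translated the line endings.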

              I agree with ChrisW67 that you shouldn't write code that depends on a specific EOL convention.

              If the file is too big to read all into memory, I would do something like:

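              @const qint64 BUFFSIZE = 100*1024; //100 KB
              const QByteArray lookingFor = QString("blah").toUtf8();
              QByteArray data;
              QFile file("h:/test.txt");
              qint64 pos = -1; // stays -1 if the string is not found
              if(file.open(QFile::ReadOnly))
              {
                  while(!file.atEnd()) // read the file chunk by chunk
                  {
                      data.append(file.read(BUFFSIZE));
                      const int index = data.indexOf(lookingFor);
                      if(index >= 0)
                      {
                          // data ends at file.pos(), so it starts at file.pos() - data.length()
                          pos = file.pos() - data.length() + index;
                          break;
                      }
                      // keep the tail so a match spanning two chunks is still found
                      data = data.right(lookingFor.length() - 1);
                  }
              }@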

              Note: We need some overlap to handle the case where the string is on the boundary between two buffers.
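The overlap idea can be exercised without Qt on any `std::istream`. The sketch below is a standard C++ analogue, not the code from this post; `findInChunks` and `findInBuffer` are hypothetical names. It carries the last `needle.size() - 1` bytes of each round into the next one, so a match that straddles a chunk boundary is still found.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the byte offset of the first occurrence of
// needle in the stream, or -1 if it is absent. Reads chunkSize bytes at a
// time and keeps a short tail of the previous round as overlap.
long long findInChunks(std::istream& in, const std::string& needle, std::size_t chunkSize)
{
    assert(!needle.empty() && chunkSize > 0);
    std::string window;            // carried-over tail plus current chunk
    long long windowStart = 0;     // stream offset of window[0]
    std::vector<char> chunk(chunkSize);
    while (in) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::streamsize got = in.gcount();
        if (got <= 0)
            break;                 // nothing left to read
        window.append(chunk.data(), static_cast<std::size_t>(got));
        const std::size_t index = window.find(needle);
        if (index != std::string::npos)
            return windowStart + static_cast<long long>(index);
        // Keep only the bytes that could still start a match.
        const std::size_t keep = std::min(window.size(), needle.size() - 1);
        windowStart += static_cast<long long>(window.size() - keep);
        window.erase(0, window.size() - keep);
    }
    return -1;
}

// Convenience wrapper for in-memory data, handy for trying the function out.
long long findInBuffer(const std::string& data, const std::string& needle, std::size_t chunkSize)
{
    std::istringstream in(data);
    return findInChunks(in, needle, chunkSize);
}
```

For example, `findInBuffer("abcdefghij", "def", 4)` splits the data into "abcd", "efgh" and "ij"; the match starts in the first chunk and ends in the second, and is found only because "cd" is carried over.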


              • utcenter wrote (#7):

                Yes, we all agree the solution is not ideal, but the thread is not about a better solution; it is about the strange behavior this one produces.

                What puzzles me is the inconsistency. If the problem is the EOL marker being 2 bytes, then I should be losing a character for each line. But no characters are lost except on the last line. That is what I fail to understand and would like to know.

                • ChrisW67 wrote (#8):

                  I was going off a common off-by-one issue with text files on Windows.

                  I cannot reproduce any issue with the first solution: whether the code is run on Windows, Linux, with either line ending, with or without a trailing EOL marker on the last line. I don't see prady_80's "bug" or your inconsistent behaviour.

                  Edit: Damn... I wasn't seeing the obvious. I'll look into it

                  • panosk wrote (#9):

                    Changing the file's EOL from CR/LF to LF/CR fixes the problem. Can't provide more insight though :)

                    • utcenter wrote (#10):

                      According to the size of my input file, the EOL marker is 1 byte. The size of the file is exactly the number of characters plus the number of newlines.

                      Something weird is happening on the last line, and so far it has me completely puzzled. I even started a "bounty at SO":http://stackoverflow.com/questions/15850133/qtextstream-behavior-searching-for-a-string-not-as-expected; hopefully someone will shed light on this issue. Not that there aren't workarounds, but it has piqued my curiosity.
