Take empty lines as separators with the QString::split() function?

14 Posts 4 Posters 4.4k Views
    clemmy9 wrote (#1):

    Hello,

    I'm trying to split a text using blank lines as separators. My input looks like this:

    1 the DT
    2 cat NN
    3 is VBZ
    4 eating VBG
    5 the DT
    6 mouse NN
    7 . P

    1 my DT
    2 dog NN
    3 is VBZ
    4 hungry JJ
    5 . P

    ...

    I want to get each sentence of the text, so I put the whole text in a QString and call split() on it with the following QRegExp pattern:

    ^$

    (I've also tried "^\n"), but neither pattern matches at all. When I apply the same regex to the same input with the egrep command in my shell, it works well...

    My code is as follows:

    @ QFile file("/home/clemence/textes_test/jamaica_out.conll");
    if (!file.open(QIODevice::ReadOnly))
        LERROR << "cannot open file" << endl;
    while (!file.atEnd()) {
        QByteArray text = file.readAll();
        QString textString = QString(text);
        QRegExp sentenceSeparator("^\n");
        QStringList sent = textString.split(sentenceSeparator, QString::KeepEmptyParts);
        LDEBUG << " There is " << sent.size() << "sentences " << LENDL;
    }@

    The output is "There is 1 sentences", meaning the whole text was not split at all...
    Does anyone have an idea of what's wrong?

      msue wrote (#2):

      There are probably some escape characters missing for the "\" in "^\n".

        clemmy9 wrote (#3):

        Well, I've just tried... adding a "\" before "\n" does not change anything...

          msue wrote (#4):

          Don't give up so easily :-) and try up to four "\". Also check what the regexp documentation says about it: you can click on QRegExp in your code above.

            clemmy9 wrote (#5):

            Thanks for encouraging me...
            What is actually not recognized is the "^" symbol: I tried to match it on its own and nothing was found...

              SGaist (Lifetime Qt Champion) wrote (#6):

              Hi,

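              ^ in a regexp is an anchor, not a character: it matches a position. egrep works line by line, so there it means the start of each line. QRegExp has no multi-line mode, so ^ matches only at the very start of the whole string, which is why "^\n" and "^$" never match in the middle of your text.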
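
One way around the anchor problem is to describe the separator itself (a line break followed by at least one empty line) instead of relying on ^ or $. The sketch below shows the idea with std::regex from the standard library rather than QRegExp, so that it runs without Qt; the helper name splitOnBlankLines is made up for this example and is not part of any library.

```cpp
#include <cassert>   // only needed for the usage check shown after the code
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): split `text` into blocks
// separated by one or more blank lines. The separator is written without
// any ^ or $ anchor, so it does not depend on multi-line anchor semantics.
std::vector<std::string> splitOnBlankLines(const std::string& text)
{
    // A line break, followed by at least one line holding only spaces/tabs.
    static const std::regex separator(R"(\r?\n(?:[ \t]*\r?\n)+)");

    std::vector<std::string> blocks;
    // Submatch index -1 makes the iterator yield the text between matches.
    std::sregex_token_iterator it(text.begin(), text.end(), separator, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        if (it->length() > 0)        // skip empty pieces at the edges
            blocks.push_back(it->str());
    }
    return blocks;
}
```

For example, `assert(splitOnBlankLines("1 the DT\n2 cat NN\n\n1 my DT\n").size() == 2);` holds. With Qt, the same idea should be expressible by passing something like QRegExp("\\n\\s*\\n") to QString::split(), but that variant is an assumption and has not been tested here.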

              Interested in AI ? www.idiap.ch
              Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

                clemmy9 wrote (#7):

                Well, I changed my code a bit so that the file is read as UTF-8:

                @ QFile file("/home/clemence/textes_test/jamaica_out.conll");
                if (!file.open(QIODevice::ReadOnly))
                    LERROR << "cannot open file" << endl;
                QTextStream in(&file);
                in.setCodec("UTF-8");
                while (!in.atEnd()) {
                    // QTextStream::readAll() already returns a decoded QString.
                    QString textString = in.readAll();
                    QRegExp sentenceSeparator("^\n");
                    QStringList sent = textString.split(sentenceSeparator, QString::KeepEmptyParts);
                    LDEBUG << " There is " << sent.size() << "sentences " << LENDL;
                }
                @

                but unless I'm doing it wrong, the encoding is not the issue...

                  nnead wrote (#8):

                  Can't you just check whether each line is empty?
                  I have done something similar with std::string:

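                  @ std::vector<std::string> data;               // lines of the current block
                  std::vector<std::vector<std::string> > raw;  // one entry per block
                  std::string line;

                  std::ifstream myfile("file.txt");
                  if (myfile.is_open())
                  {
                      while (std::getline(myfile, line))
                      {
                          if (line == "")
                          {
                              // An empty line closes the current block.
                              raw.push_back(data);
                              data.clear();
                          }
                          else
                          {
                              data.push_back(line);
                          }
                      }
                      myfile.close();
                      raw.push_back(data);   // store the last block
                      data.clear();
                  }@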
                  

                  In my case every empty line starts a new entry in a vector of vectors.

                    clemmy9 wrote (#9):

                    I have; my QString contains my text as expected...

                      nnead wrote (#10):

                      I modified your code snippet:

                      @ int sent = 1;
                      QFile file("D:\\database.txt");   // backslash escaped; "\d" is not a valid escape
                      if (!file.open(QIODevice::ReadOnly))
                          qDebug() << "cannot open file" << endl;
                      while (!file.atEnd()) {
                          QByteArray text = file.readLine();
                          QString textString = QString(text);

                          // A blank line holds only "\n" or "\r\n", i.e. fewer than 3 characters.
                          if (textString.size() < 3) { sent++; }
                      }
                      qDebug() << " There is " << sent << "sentences ";@
                      

                      It counts the right number of lines for me.
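
The counter above starts at 1 and adds one for every line shorter than three characters, so two blank lines in a row, a blank line at the end of the file, or a very short data line would each inflate the result. A slightly more defensive variant, sketched here in plain C++ under the assumption that a sentence is any run of non-blank lines, stores the lines of each sentence instead of only counting them. The names isBlank, readSentences and Sentence are made up for this example.

```cpp
#include <cassert>   // only needed for the usage check shown after the code
#include <istream>
#include <sstream>
#include <string>
#include <vector>

using Sentence = std::vector<std::string>;

// Hypothetical helper: true when the line holds nothing but whitespace,
// including the '\r' that getline leaves behind on Windows-style files.
bool isBlank(const std::string& line)
{
    return line.find_first_not_of(" \t\r") == std::string::npos;
}

// Group the lines of `input` into sentences. Any run of blank lines ends
// the current sentence, and empty sentences are never stored.
std::vector<Sentence> readSentences(std::istream& input)
{
    std::vector<Sentence> sentences;
    Sentence current;
    std::string line;
    while (std::getline(input, line)) {
        if (isBlank(line)) {
            if (!current.empty()) {
                sentences.push_back(current);
                current.clear();
            }
        } else {
            if (line.back() == '\r')   // line is non-empty in this branch
                line.pop_back();
            current.push_back(line);
        }
    }
    if (!current.empty())              // the file may not end with a blank line
        sentences.push_back(current);
    return sentences;
}
```

As a quick check, feeding it `std::istringstream in("1 the DT\n2 cat NN\n\n\n1 my DT\n\n");` gives two sentences, the first with two lines. The number of sentences is then simply the size of the returned vector.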

                        clemmy9 wrote (#11):

                        Thanks, but I don't want to count lines, I want to count sentences. If you look at my input file at the beginning of this topic, you can see that one sentence spans several lines. That's why I used the readAll function and the "^$" QRegExp.

                          nnead wrote (#12):

                          Hi, by lines I meant sentences. You can just copy and paste the snippet and see whether the number matches your data. For the given example it would compute: "There is 2 sentences".

                            clemmy9 wrote (#13):

                            It actually computes the number of lines, which is not the number of sentences.

                              clemmy9 wrote (#14):

                              Okay, I've just understood what you meant, nnead. I had tested your code on data with longer lines, so it couldn't work... Thank you :)
