Take empty lines as separators with the QString::split() function?



  • Hello,

    I'm trying to split a text using blank lines as separators. My input looks like this:

    1 the DT
    2 cat NN
    3 is VBZ
    4 eating VBG
    5 the DT
    6 mouse NN
    7 . P

    1 my DT
    2 dog NN
    3 is VBZ
    4 hungry JJ
    5 . P

    ...

    I want to get each sentence of the text. So I put the whole text in a QString and apply the split function to it with the following QRegExp argument:

    ^$

    (I've also tried "^\n"). But that pattern does not match at all. When I try to apply the same regex to the same input with the egrep command in my shell, it works well...

    My code is as follows:

    @ QFile file("/home/clemence/textes_test/jamaica_out.conll");
    if (!file.open(QIODevice::ReadOnly))
        LERROR << "cannot open file" << endl;
    while (!file.atEnd()) {
        QByteArray text = file.readAll();
        QString textString = QString(text);
        QRegExp sentenceSeparator("^\n");
        QStringList sent = textString.split(sentenceSeparator, QString::KeepEmptyParts);
        LDEBUG << " There is " << sent.size() << "sentences " << LENDL;
    }@

    The output is "There is 1 sentences", meaning the whole text was not split at all...
    Does anyone have an idea of what's wrong?



  • There are probably some escape characters missing in "^\n" for the "\".



  • Well, I've just tried... adding a "\" before "\n" does not change anything...



  • Don't give up easily :-) and try up to four "\". And compare with the QRegExp documentation about it. You can click on QRegExp in your code above.



  • Thanks for encouraging me...
    What is not recognized is actually the "^" symbol: I tried to match it alone and no result was found...


  • Lifetime Qt Champion

    Hi,

    ^ in regexp means start of the line
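
One caveat worth checking here: the QRegExp documentation describes the caret as marking the beginning of the string rather than of each line, which would explain why "^\n" never matches inside the text. A blank line then shows up as two consecutive newline characters, so a plain "\n\n" separator may be enough. The following is only a minimal sketch in standard C++ (no Qt); `splitBlocks` is a hypothetical helper name, and it assumes Unix line endings and exactly one blank line between sentences.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: split `text` wherever `separator` occurs and
// drop empty pieces (similar in spirit to QString::SkipEmptyParts).
// Assumption: sentences are separated by exactly one blank line ("\n\n").
std::vector<std::string> splitBlocks(const std::string& text,
                                     const std::string& separator = "\n\n")
{
    assert(!separator.empty());  // an empty separator would never advance

    std::vector<std::string> blocks;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type end = text.find(separator, start);
        // Take everything up to the separator, or the rest of the text.
        const std::string piece = text.substr(
            start, end == std::string::npos ? std::string::npos : end - start);
        if (!piece.empty())
            blocks.push_back(piece);
        if (end == std::string::npos)
            break;
        start = end + separator.size();
    }
    return blocks;
}
```

With Qt, the analogous call should be something like `textString.split("\n\n", QString::SkipEmptyParts)`, which avoids the regular expression altogether; this has not been tested against the file from the question.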



  • Well, I changed my code a bit so that the file is read as UTF-8:

    @ QFile file("/home/clemence/textes_test/jamaica_out.conll");
    if (!file.open(QIODevice::ReadOnly))
        LERROR << "cannot open file" << endl;
    QTextStream in(&file);
    in.setCodec("UTF-8");
    while (!in.atEnd()) {
        // QTextStream::readAll() already returns a decoded QString
        QString textString = in.readAll();
        QRegExp sentenceSeparator("^\n");
        QStringList sent = textString.split(sentenceSeparator, QString::KeepEmptyParts);
        LDEBUG << " There is " << sent.size() << "sentences " << LENDL;
    }
    @

    but unless I'm doing it wrong, that's not the issue...



  • Can't you just check if the QString is empty?
    I have done something similar with std::string

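    @ std::ifstream myfile("file.txt");
    std::string line;
    std::vector<std::string> data;              // lines of the current block
    std::vector<std::vector<std::string>> raw;  // one entry per block
    if (myfile.is_open())
    {
        while (getline(myfile, line))
        {
            if (line == "")
            {
                // an empty line closes the current block
                raw.push_back(data);
                data.clear();
            }
            else
            {
                data.push_back(line);
            }
        }
        myfile.close();
        // store the last block, which is not followed by an empty line
        raw.push_back(data);
        data.clear();
    }@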
    

    In my case every empty line creates a new entry in a vector of vectors.



  • I have; my QString contains my text as expected...



  • I modified your code snippet:

    @ int sent = 1;
    QFile file("D:\\database.txt");
    if (!file.open(QIODevice::ReadOnly))
        qDebug() << "cannot open file" << endl;
    while (!file.atEnd()) {
        QByteArray text = file.readLine();
        QString textString = QString(text);

        // a line holding only "\n" or "\r\n" is treated as a sentence separator
        if (textString.size() < 3) { sent++; }
    }
    qDebug() << " There is " << sent << "sentences ";@
    

    It counts the right number of lines for me.
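
Note that the snippet above starts its counter at 1 and adds one for every short line, so consecutive or trailing blank lines would inflate the result, and a non-blank line shorter than three characters would be taken for a separator. As a rough alternative, here is a minimal sketch in standard C++ (no Qt) that counts each run of non-blank lines once; `countSentences` is a hypothetical name, and treating a lone '\r' as blank is an assumption meant to cover Windows line endings.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: count blocks of consecutive non-blank lines.
// A line counts as blank if it is empty or holds only '\r'
// (what std::getline leaves behind for a "\r\n" line ending).
int countSentences(std::istream& in)
{
    int count = 0;
    bool insideSentence = false;
    std::string line;
    while (std::getline(in, line)) {
        const bool blank = line.empty() || line == "\r";
        if (blank) {
            insideSentence = false;   // the current sentence, if any, ends here
        } else if (!insideSentence) {
            insideSentence = true;    // first line of a new sentence
            ++count;
        }
    }
    return count;
}
```

Passing a `std::ifstream` opened on the input file should work the same way as the string streams used for testing, since both are `std::istream` objects.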



  • Thanks but I don't want to count lines but sentences. If you look at my input file at the beginning of this topic page, you can see that there are several lines for 1 sentence. That's why I used the readAll function and the "^$" QRegExp.



  • Hi, by lines I meant sentences. You can just copy and paste the snippet and see if the number matches your data. For the given example it would compute: "There is 2 sentences".



  • It actually computes the number of lines, which is not the number of sentences.



  • Okay, I've just understood what you meant, nnead. I tested your code on data with longer lines, so it couldn't work... Thank you :)

