Take empty lines as separators with the QString::split() function?
-
Hello,
I'm trying to split a text, using every blank line as a separator. My input looks like this:
1 the DT
2 cat NN
3 is VBZ
4 eating VBG
5 the DT
6 mouse NN
7 . P

1 my DT
2 dog NN
3 is VBZ
4 hungry JJ
5 . P...
I want to get each sentence of the text. So I put the whole text in a QString and apply the split function to it with the following QRegExp argument:
^$
(I've also tried "^\n"). But that pattern does not match at all. When I try to apply the same regex to the same input with the egrep command in my shell, it works well...
My code is as follows:
@
QFile file("/home/clemence/textes_test/jamaica_out.conll");
if (!file.open(QIODevice::ReadOnly))
    LERROR << "cannot open file" << endl;
while (!file.atEnd()) {
    QByteArray text = file.readAll();
    QString textString = QString(text);
    QRegExp sentenceSeparator("^\n");
    QStringList sent = textString.split(sentenceSeparator, QString::KeepEmptyParts);
    LDEBUG << " There is " << sent.size() << "sentences " << LENDL;
}
@
The output is "There is 1 sentences", i.e. the whole text is not split at all...
Does anyone have an idea of what's wrong?
-
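Hi,
^ in a regexp means start of the line, but QRegExp has no multi-line mode: there, ^ only matches at the very start of the string. So "^\n" can match at most once, at position 0, and the blank lines in the middle of your text are never found. egrep works because it tests each line of the input separately.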
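One way around the anchor is to split on the separator text itself, i.e. two consecutive newlines. Here is a minimal sketch of that idea in plain standard C++ (not Qt, so it compiles without any extra library). `splitOnBlankLines` is a hypothetical helper name, and the sketch assumes "\n" line endings with exactly one empty line between sentences.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `text` wherever an empty line ("\n\n") occurs.
// Assumes "\n" line endings and exactly one empty line between blocks.
std::vector<std::string> splitOnBlankLines(const std::string& text)
{
    const std::string separator = "\n\n";
    std::vector<std::string> blocks;
    std::string::size_type start = 0;
    std::string::size_type pos;
    // Cut the text at every occurrence of the separator.
    while ((pos = text.find(separator, start)) != std::string::npos) {
        blocks.push_back(text.substr(start, pos - start));
        start = pos + separator.size();
    }
    // Whatever follows the last separator is the final block.
    blocks.push_back(text.substr(start));
    return blocks;
}
```

Calling `splitOnBlankLines(text).size()` then gives the number of blocks. In Qt terms the same idea would presumably be to pass the plain string "\n\n" to QString::split() instead of an anchored QRegExp, but I have not tested that against your file.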
-
Well, I changed my code a bit to read the file as UTF-8:
@
QFile file("/home/clemence/textes_test/jamaica_out.conll");
if (!file.open(QIODevice::ReadOnly))
    LERROR << "cannot open file" << endl;
QTextStream in(&file);
in.setCodec("UTF-8");
while (!file.atEnd()) {
    // QTextStream::readAll() already returns a QString
    QString textString = in.readAll();
    QRegExp sentenceSeparator("^\n");
    QStringList sent = textString.split(sentenceSeparator, QString::KeepEmptyParts);
    LDEBUG << " There is " << sent.size() << "sentences " << LENDL;
}
@
but unless I'm doing it wrong, the encoding is not the problem...
-
Can't you just check if the QString is empty?
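I have done something similar with std::string:
@
// declarations assumed here; needs <fstream>, <string>, <vector>
std::vector<std::vector<std::string> > raw;  // one entry per block of lines
std::vector<std::string> data;               // lines of the current block
std::string line;

std::ifstream myfile("file.txt");
if (myfile.is_open())
{
    while (getline(myfile, line))
    {
        if (line == "") {
            // an empty line closes the current block
            raw.push_back(data);
            data.clear();
        } else {
            data.push_back(line);
        }
    }
    myfile.close();
    raw.push_back(data);  // the last block has no empty line after it
    data.clear();
}
@
In my case, every empty line starts a new entry in a vector of vectors.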
-
I modified your code snippet:
@
int sent = 1;
QFile file("D:/database.txt");  // forward slash, since "\d" is not a valid escape sequence
if (!file.open(QIODevice::ReadOnly))
    qDebug() << "cannot open file" << endl;
while (!file.atEnd()) {
    QByteArray text = file.readLine();
    QString textString = QString(text);
    // a line shorter than 3 characters ("\n" or "\r\n") is treated as empty
    if (textString.size() < 3) {
        sent++;
    }
}
qDebug() << " There is " << sent << "sentences ";
@
It counts the right number of sentences for me.
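Note that a length test like `size() < 3` also treats a one-character line as empty and counts every additional blank line as one more sentence. A slightly more defensive variant could look like the sketch below. It is written in plain standard C++ rather than Qt so that it is self-contained, and `isBlank` and `groupSentences` are hypothetical helper names, not library functions. It assumes that a line containing only whitespace (including a leftover '\r' from Windows line endings) should count as blank.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: true if the line holds only whitespace
// (this also covers a stray '\r' left over from Windows line endings).
bool isBlank(const std::string& line)
{
    for (char c : line) {
        if (!std::isspace(static_cast<unsigned char>(c)))
            return false;
    }
    return true;
}

// Hypothetical helper: read `in` line by line and group the lines into
// sentences, closing the current sentence at each run of blank lines.
std::vector<std::vector<std::string>> groupSentences(std::istream& in)
{
    std::vector<std::vector<std::string>> sentences;
    std::vector<std::string> current;
    std::string line;
    while (std::getline(in, line)) {
        if (isBlank(line)) {
            if (!current.empty()) {   // repeated blank lines add nothing
                sentences.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(line);
        }
    }
    if (!current.empty())             // last sentence without a blank line after it
        sentences.push_back(current);
    return sentences;
}
```

To try it, wrap some text in a `std::istringstream` (or open a `std::ifstream`) and pass it to `groupSentences`; the size of the result is the sentence count.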