Regular expression help needed

aurora

I have a collection of files; the contents of all of them have the following format:

@-- File name

-- listOne (L1)
-- listTwo (L2)
-- listThree (L3)
-- HeaderLine (HE)
-- listFour (L6)
-- listFive (L2)
-- listSix (L9)
-- listSeven (L0)
-- someline (SL)
-- listeight (LL)

--
REMAINING CONTENTS OF THE LINE

some more contents

@
Here I want to store only L1, L2, L3 etc. in a list, excluding HE, SL and the remaining lines of the file.
How can I do that?
Please help me. I went through the QRegExp class definition as well and wrote some code, but it is very long and it inserts some blank strings into the stored list:

@
while(!f.atEnd() && (!line.contains("------------------------------------------")))
{
    if(!line.contains("-- "))
    {
        flag=1;
        QRegExp rx("[(]([a-z]|[0-9]|[_]|[A-Z])+[)]");
        rx.indexIn(line);
        QRegExp rx1("([a-z]|[0-9]|[_]|[A-Z])+");
        rx1.indexIn(rx.cap(0));
        captured.append(rx1.cap(0));
        line=f.readLine();
    }
    else if(flag==1)
    {
        flag++;
        captured.pop_back();
        QRegExp rx("[(]([a-z]|[0-9]|[_]|[A-Z])+[)]");
        rx.indexIn(line);
        QRegExp rx1("([a-z]|[0-9]|[_]|[A-Z])+");
        rx1.indexIn(rx.cap(0));
        captured.append(rx1.cap(0));
        line=f.readLine();
    }
    else if(flag>0)
    {
        flag++;
        QRegExp rx("[(]([a-z]|[0-9]|[_]|[A-Z])+[)]");
        rx.indexIn(line);
        QRegExp rx1("([a-z]|[0-9]|[_]|[A-Z])+");
        rx1.indexIn(rx.cap(0));
        captured.append(rx1.cap(0));
        line=f.readLine();
    }
}

@

Please help me solve this problem

sierdzio

All the regexps seem to be the same, so you can move this part of the code into a function. It would save you LOC and make maintenance easier.

Also, if I get it right, all you need to do is store all whole lines containing "(XY)", except those with "(HE)" and "(SL)"? Then why not do it like this:
@
if (line.contains(QRegExp("[(]\\w\\w[)]"))) { // Get all lines with "(XY)"
    if (line.contains("(HE)") || line.contains("(SL)")) { // Throw away those with "(HE)" or "(SL)"
        continue;
    }
    // do your code here
}
@
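As a rough illustration of the "move the regex into a function" advice, here is a minimal sketch. It uses `std::regex` instead of `QRegExp` so that it compiles without Qt, and the names `extractCode` and `collectCodes` as well as the `excluded` parameter are made up for this example; they are not part of the code posted in this thread.

```cpp
#include <algorithm>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: writes the text inside a trailing "(...)" into `code`
// and returns true, or returns false when the line has no such suffix.
bool extractCode(const std::string &line, std::string &code)
{
    // \( and \) are literal parentheses; \w+ matches letters, digits and '_'.
    static const std::regex re(R"(\((\w+)\)\s*$)");
    std::smatch m;
    if (!std::regex_search(line, m, re))
        return false;
    code = m[1].str();
    return true;
}

// Collects the codes of all lines, skipping the codes listed in `excluded`.
std::vector<std::string> collectCodes(const std::vector<std::string> &lines,
                                      const std::vector<std::string> &excluded)
{
    std::vector<std::string> result;
    for (const std::string &line : lines) {
        std::string code;
        if (!extractCode(line, code))
            continue; // no "(XY)" suffix: nothing is appended for this line
        if (std::find(excluded.begin(), excluded.end(), code) != excluded.end())
            continue; // explicitly excluded code
        result.push_back(code);
    }
    return result;
}
```

Because a line without a match is skipped instead of appended, this approach should not produce blank entries in the result list.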

sierdzio

Regexp might be wrong, but I'm in a hurry now and don't have time to think it through. But you'll probably get the idea.

aurora

Thank you, but you misunderstood. Maybe I explained it badly. That is just the format; the words are not always the same.
I don't want to store the lines that have sublines.
eg:
@
-- someline(kk)
-- main line(mm)
-- this is subline(ab)
-- this is another subline(hh)
@
In such a case I want only the sublines.

[quote author="sierdzio" date="1326455688"]All the regexps seem to be the same, so you can move this part of the code into a function. It would save you LOC and make maintenance easier.

Also, if I get it right, all you need to do is store all whole lines containing "(XY)", except those with "(HE)" and "(SL)"? Then why not do it like this:
@
if (line.contains(QRegExp("[(]\\w\\w[)]"))) { // Get all lines with "(XY)"
    if (line.contains("(HE)") || line.contains("(SL)")) { // Throw away those with "(HE)" or "(SL)"
        continue;
    }
    // do your code here
}
@[/quote]

goetz

The best way to describe your goal would be to show the input list and the result that you expect.

aurora

[quote author="Volker" date="1326491850"]Best way to describe your goal would be to show the input list and the result that you expect.[/quote]

OK. My input is the file shown above,
and the regular expression must capture
only L1,L2,L3,L6,L2,L9,L0,LL

It should not capture a line that has sublines, that's all.

goetz

The following snippet should show you the basic principle:

@
QStringList l;
l << "listOne (L1)";
l << "listTwo (L2)";
l << "listThree (L3)";
l << "HeaderLine (HE)";
l << "listFour (L6)";
l << "listFive (L2)";
l << "listSix (L9)";
l << "listSeven (L0)";
l << "someline (SL)";
l << "listeight (LL)";

QRegExp re("^.+\\s+\\((L[0-9L])\\)$");
foreach(const QString &s, l) {
    qDebug() << "check string" << s;
    if(re.exactMatch(s)) {
        QString code = re.cap(1);
        qDebug() << " found match" << code;
    } else {
        qDebug() << " no match";
    }
}
@

Short explanation of the regex:

^.+
matches one or more characters, starting at the beginning of the string
\\s+
followed by one or more whitespace characters (space, tab, newline)
\\(
followed by a literal opening parenthesis. In the regex itself it is \(, but the backslash has to be doubled inside a C++ string literal
(
starts a capture group
L[0-9L]
a literal L followed by exactly one of 0, 1, 2... 9 or L
)
ends the capture group
\\)
followed by a literal closing parenthesis
$
at the end of the string

The capture group contains what was matched in between, which will be one of L0, L1, L2... L9, LL.
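For readers without Qt at hand, the same pattern can be tried with the standard library. The sketch below swaps `QRegExp::exactMatch` for `std::regex_match`, which likewise requires the whole string to match; the function name `matchListCode` is made up for this example.

```cpp
#include <regex>
#include <string>

// Returns the captured code (for example "L1"), or an empty string when the
// line does not end in "(L<digit>)" or "(LL)" preceded by whitespace.
std::string matchListCode(const std::string &s)
{
    // Same pattern as above; backslashes are doubled for the string literal.
    static const std::regex re("^.+\\s+\\((L[0-9L])\\)$");
    std::smatch m;
    if (std::regex_match(s, m, re))
        return m[1].str(); // group 1 is the text between the parentheses
    return std::string();
}
```

With the sample list, `matchListCode("listOne (L1)")` returns `"L1"`, while `"HeaderLine (HE)"` and `"someline (SL)"` return an empty string because the captured text has to start with `L`.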

aurora

Sorry Volker, not like that....

I need all the text inside the round brackets at the end of each line.
And the regular expression should not capture a line that has sublines.
Example input:
@
-- afgh hkjhkh(gk_6)
-- its main line (aa) <<--except this line capture remaining, as this has subline
-- its sub line(bb) <<----subline
-- its another subline(cc) <<-----subline
-- something(dd09)
-- this is also(tr_8787)
@

And the output should be: gk_6,bb,cc,dd09,tr_8787

aureshinite

Learn about regular expressions. Period.

goetz

It is up to you to detect what a "subline" is and to skip the regex for the parent line altogether.

I recommend studying the [[Doc:QString]] documentation. It has various helpful methods. Read through the method list and descriptions.
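One possible way to do that detection is sketched below. It rests on an assumption the thread never confirms: that sublines are indented deeper than the line they belong to (the indentation is not visible in the posted samples). Under that assumption, a line "has sublines" when the next line is indented further, and such a line is skipped. The names `indentOf` and `captureLeafCodes` are made up for this example, and `std::regex` stands in for `QRegExp`.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Number of leading spaces/tabs, used as the (assumed) nesting depth.
static std::size_t indentOf(const std::string &line)
{
    const std::size_t pos = line.find_first_not_of(" \t");
    return pos == std::string::npos ? line.size() : pos;
}

// Captures the trailing "(code)" of every line that is NOT followed by a
// deeper-indented line, i.e. of every line without sublines.
std::vector<std::string> captureLeafCodes(const std::vector<std::string> &lines)
{
    static const std::regex re(R"(\((\w+)\)\s*$)");
    std::vector<std::string> codes;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        const bool hasSubline = i + 1 < lines.size()
                                && indentOf(lines[i + 1]) > indentOf(lines[i]);
        if (hasSubline)
            continue; // assumed main line: skip the regex altogether
        std::smatch m;
        if (std::regex_search(lines[i], m, re))
            codes.push_back(m[1].str());
    }
    return codes;
}
```

If the example input is indented so that the two sublines sit deeper than "its main line (aa)", this yields gk_6, bb, cc, dd09, tr_8787. If the real files mark sublines in some other way, only the `hasSubline` test would need to change.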
