Regular expression help needed



  • I have a collection of files; the contents of every file have the following format:

    @
    -- File name

    -- listOne (L1)
    -- listTwo (L2)
    -- listThree (L3)
    -- HeaderLine (HE)
    -- listFour (L6)
    -- listFive (L2)
    -- listSix (L9)
    -- listSeven (L0)
    -- someline (SL)
    -- listeight (LL)

    --
    REMAINING CONTENTS OF THE LINE

    some more contents

    @
    Here I want to store only L1, L2, L3, etc. in a list, skipping HE, SL and the remaining lines of the file.
    How can I do that?
    Please help me. I went through the QRegExp class definition and wrote some code, but it is very long and inserts some blank strings into the stored list.

    @
    while(!f.atEnd() && (!line.contains("------------------------------------------")))
    {
        if(!line.contains("-- "))
        {
            flag=1;
            QRegExp rx("[(]([a-z]|[0-9]|[_]|[A-Z])+[)]");
            rx.indexIn(line);
            QRegExp rx1("([a-z]|[0-9]|[_]|[A-Z])+");
            rx1.indexIn(rx.cap(0));
            captured.append(rx1.cap(0));
            line=f.readLine();
        }
        else if(flag==1)
        {
            flag++;
            captured.pop_back();
            QRegExp rx("[(]([a-z]|[0-9]|[_]|[A-Z])+[)]");
            rx.indexIn(line);
            QRegExp rx1("([a-z]|[0-9]|[_]|[A-Z])+");
            rx1.indexIn(rx.cap(0));
            captured.append(rx1.cap(0));
            line=f.readLine();
        }
        else if(flag>0)
        {
            flag++;
            QRegExp rx("[(]([a-z]|[0-9]|[_]|[A-Z])+[)]");
            rx.indexIn(line);
            QRegExp rx1("([a-z]|[0-9]|[_]|[A-Z])+");
            rx1.indexIn(rx.cap(0));
            captured.append(rx1.cap(0));
            line=f.readLine();
        }
    }
    

    @

    Please help me solve this problem.


  • Moderators

    All regexps seem to be the same, you can move this part of the code into a function, it would save you LOC and make maintenance easier.

    Also, if I get it right, all you need to do is store every line containing "(XY)", except those with "HE" and "SL"? Then why not do it like this:
    @
    if (line.contains(QRegExp("[(]\\w\\w[)]"))) { // Get all lines with "(XY)"
        if (line.contains("HE") || line.contains("SL")) { // Throw away those with "HE" or "SL"
            continue;
        }
        // do your code here
    }
    @
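The suggestion to move the repeated regex code into one function could look roughly like the sketch below. It uses `std::regex` from the C++ standard library instead of `QRegExp` so that it compiles without Qt; the function name `extractCode` is hypothetical, and the pattern assumes the code always sits in parentheses at the very end of the line.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the thread): returns the text inside the
// parentheses at the end of a line, or an empty string if there is none.
std::string extractCode(const std::string &line)
{
    // "(" + one or more word characters (letters, digits, underscore) + ")",
    // optionally followed by trailing whitespace, anchored at the end.
    static const std::regex pattern(R"(\((\w+)\)\s*$)");
    std::smatch match;
    if (std::regex_search(line, match, pattern)) {
        return match[1].str();
    }
    return std::string();
}
```

Because lines without a parenthesised suffix yield an empty string, the caller can test for that before appending, which avoids storing blank entries in the list.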


  • Moderators

    Regexp might be wrong, but I'm in a hurry now and don't have time to think it through. But you'll probably get the idea.



  • Thank you, but you misunderstood. Maybe I explained it badly. It is just a format; the words are not always the same.
    I don't want to store those lines which have sublines.
    E.g.:
    @
    -- someline(kk)
    -- main line(mm)
    -- this is subline(ab)
    -- this is another subline(hh)
    @
    In such a case I want only the sublines.




  • Best way to describe your goal would be to show the input list and the result that you expect.



  • [quote author="Volker" date="1326491850"]Best way to describe your goal would be to show the input list and the result that you expect.[/quote]

    OK. My input is the file shown above,
    and the regular expression must capture
    only L1, L2, L3, L6, L2, L9, L0, LL.

    It should not capture a line which has sublines, that's all.



  • The following snippet should show you the basic principle:

    @
    QStringList l;
    l << "listOne (L1)";
    l << "listTwo (L2)";
    l << "listThree (L3)";
    l << "HeaderLine (HE)";
    l << "listFour (L6)";
    l << "listFive (L2)";
    l << "listSix (L9)";
    l << "listSeven (L0)";
    l << "someline (SL)";
    l << "listeight (LL)";

    QRegExp re("^.+\\s+\\((L[0-9L])\\)$");
    foreach(const QString &s, l) {
        qDebug() << "check string" << s;
        if(re.exactMatch(s)) {
            QString code = re.cap(1);
            qDebug() << " found match" << code;
        } else {
            qDebug() << " no match";
        }
    }
    @

    Short explanation of the regex:

    • ^.+
      matches one or more arbitrary characters from the start of the string
    • \\s+
      followed by one or more whitespace characters (space, tab, newline)
    • \\(
      followed by a literal opening parenthesis. The regex itself is \(, but the backslash must be doubled inside a C++ string literal
    • (
      start a capture group
    • L[0-9L]
      followed by a literal L and exactly one of 0, 1, 2... 9 or L
    • )
      end the capture group
    • \\)
      followed by a literal closing parenthesis
    • $
      at the end of the string

    The capture group contains what was matched in between, which will be one of L0, L1, L2... L9, LL.
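For readers without Qt at hand, the same pattern can be tried with `std::regex`. This is only a sketch: the function name `listCode` is an assumption, and `std::regex_match` stands in for `QRegExp::exactMatch`. A raw string literal is used so the regex appears exactly as explained, without doubled backslashes.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns "L0".."L9" or "LL" when the whole line matches
// the pattern explained above, and an empty string otherwise.
std::string listCode(const std::string &line)
{
    // Raw string literal: no C++ escaping is needed for \s, \( and \).
    static const std::regex pattern(R"(^.+\s+\((L[0-9L])\)$)");
    std::smatch match;
    // regex_match requires the entire line to match, like exactMatch().
    if (std::regex_match(line, match, pattern)) {
        return match[1].str();
    }
    return std::string();
}
```

With the sample list, `listCode("listOne (L1)")` yields `L1`, while `listCode("HeaderLine (HE)")` and `listCode("someline (SL)")` yield an empty string because `HE` and `SL` do not fit `L[0-9L]`.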



  • Sorry Volker, not like that.

    I want all the text inside the round brackets at the end of each line.
    And the regular expression should not capture a line which has sublines.
    Example input:
    @
    -- afgh hkjhkh(gk_6)
    -- its main line (aa) <<--except this line capture remaining, as this has subline
    -- its sub line(bb) <<----subline
    -- its another subline(cc) <<-----subline
    -- something(dd09)
    -- this is also(tr_8787)
    @

    And output should be: gk_6,aa,bb,cc,dd09,tr_8787



  • Learn about regular expressions. Period.



  • It is up to you to detect what a "subline" is and skip the regex on those lines altogether.

    I recommend studying the [[Doc:QString]] documentation. It has various helpful methods. Read through the method list and descriptions.
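One way to separate "detect the structure" from "run the regex" is sketched below. The thread never states how a subline is marked, so this sketch ASSUMES a subline is indented deeper than the line above it; that rule, and the names `indentOf` and `collectCodes`, are hypothetical and would need to be replaced by whatever actually distinguishes sublines in the real files. It uses `std::regex` rather than `QRegExp` so that it compiles without Qt.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Number of leading spaces/tabs; a stand-in for "nesting depth" (assumption).
// Blank lines count as depth 0.
static std::size_t indentOf(const std::string &line)
{
    const std::size_t pos = line.find_first_not_of(" \t");
    return pos == std::string::npos ? 0 : pos;
}

// Collects the parenthesised code at the end of every line, skipping lines
// that are directly followed by a deeper-indented line.
std::vector<std::string> collectCodes(const std::vector<std::string> &lines)
{
    static const std::regex pattern(R"(\((\w+)\)\s*$)");
    std::vector<std::string> codes;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        // ASSUMPTION: a line "has sublines" if the next line is indented deeper.
        const bool hasSublines =
            i + 1 < lines.size() && indentOf(lines[i + 1]) > indentOf(lines[i]);
        if (hasSublines) {
            continue; // structure check first; the regex is not run at all
        }
        std::smatch match;
        if (std::regex_search(lines[i], match, pattern)) {
            codes.push_back(match[1].str());
        }
    }
    return codes;
}
```

Only lines that actually end in a parenthesised code are appended, so no blank strings end up in the result.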

