QRegularExpression: need help with a pattern



  • Hello all,

    I just can't get it right. Maybe someone can help me with it?

    I have a QByteArray containing a list of elements that looks like this:

    @
    <!--story_23015_start--> test test 234 2tsdtsdf sf sdfds <!--story_23015_end-->
    ......
    <!--story_2301275_start--> tsdtsdf sf sdfds <!--story_2301275_end-->
    @

    How can I extract the number between "<!--story_" and "_start-->", and the text in the middle, from each item using QRegularExpression?

    I will really appreciate your help!



  • The following pattern works (RegExp Example)

    @<!--story_(\d+)_start-->(.+)<!--story.+-->$@



  • Thank you very much mcosta

    The problem is that items are not one per line, and they can have text or any other characters before or after them, like:
    @
    asdmpowq d34 23 <!--story_23015_start--> test test 234 2tsdtsdf sf sdfds <!--story_23015_end--> html tags etc
    <!--story_2301275_start--> tsdtsdf sf sdfds <!--story_2301275_end-->
    @

    I tried it this way, but I just get the whole input as one match instead of a list of items.

    @
    // Backslashes must be doubled inside a C++ string literal.
    QRegExp rx("(.+)<!--story_(\\d+)_start-->(.+)<!--story.+-->(.+)");
    int pos = 0;
    while ((pos = rx.indexIn(data, pos)) != -1)
    {
        qDebug() << rx.cap(2) << pos;
        pos += rx.matchedLength();
    }
    @



  • Hi,

    you could try to do this:

    • remove all newlines
    • extract each single tag
    • apply the regular expression
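The steps above could be sketched roughly as follows. This is only an illustration: it uses std::regex from the C++ standard library instead of the Qt classes so that it compiles without Qt, and `storyTexts` is a made-up helper name.

```cpp
#include <algorithm>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: flattens the input, then collects the text of each item.
std::vector<std::string> storyTexts(std::string data)
{
    // Step 1: remove all newline characters so that '.' can span an item.
    data.erase(std::remove_if(data.begin(), data.end(),
                              [](char c) { return c == '\n' || c == '\r'; }),
               data.end());

    // Steps 2 and 3: match one start/end pair at a time; the lazy .+?
    // stops at the first end tag instead of the last one.
    static const std::regex rx(R"(<!--story_\d+_start-->(.+?)<!--story_\d+_end-->)");

    std::vector<std::string> texts;
    for (std::sregex_iterator it(data.cbegin(), data.cend(), rx), end; it != end; ++it)
        texts.push_back((*it)[1].str());
    return texts;
}
```

Note that removing newlines also joins lines inside an item, so this only fits if the line breaks in the text are not needed.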


  • Hi,

    You can use an expression as follows:

    @
    <!--story_(\d+)_start-->(.+)(?=<!--story_(\d+)_end-->)
    @

    I am not sure whether the positive lookahead (?=) works in QRegExp, but according to the docs it should.

    Also according to the docs, the . (dot) should match newlines.

    Hint: I normally use the following website to test my regular expressions, it works quite well: http://www.regexr.com/
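A minimal sketch of the lookahead idea applied to the sample data from the question. Two assumptions to be aware of: it uses std::regex from the C++ standard library rather than QRegExp or QRegularExpression, so it compiles without Qt, and `extractStories` is a hypothetical helper name. The quantifier is made lazy (`+?`) so that each match stops at the first end tag rather than the last one, and `[\s\S]` is used because the dot in std::regex does not match newlines.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (story number, text) for every item in `data`.
std::vector<std::pair<std::string, std::string>> extractStories(const std::string &data)
{
    // Group 1: the number in the start tag. Group 2: the text up to, but not
    // including, the next end tag (the lookahead does not consume it).
    static const std::regex rx(
        R"(<!--story_(\d+)_start-->([\s\S]+?)(?=<!--story_\d+_end-->))");

    std::vector<std::pair<std::string, std::string>> result;
    for (std::sregex_iterator it(data.begin(), data.end(), rx), end; it != end; ++it)
        result.emplace_back((*it)[1].str(), (*it)[2].str());
    return result;
}
```

With Qt, the equivalent loop would presumably go through QRegularExpression::globalMatch(), or through QRegExp with setMinimal(true) to get the lazy behaviour. I have not tested that here.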



    Another interesting option is the following (sorry, you still need to escape it before using it in code):

    @
    <!--story_(\d+)_start-->([\s\S]+)<!--story_(\1)_end-->
    @

    This can be used to make sure that the story number matches in the start and end using a "backreference":http://qt-project.org/doc/qt-4.8/qregexp.html#backreferences to the number captured in the start.

    The [\s\S] is used in case the . (dot) does not match a newline.

    If you don't want to capture the number in the end tag, you can use (?:\1) to make that group non-capturing.

    Hope this helps a bit
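To illustrate what the backreference buys you, here is a small sketch that only accepts items whose start and end numbers agree. As an assumption for the sake of a self-contained example, it uses std::regex from the C++ standard library rather than the Qt classes, and `matchedStoryIds` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the story numbers whose start and end tags agree.
std::vector<std::string> matchedStoryIds(const std::string &data)
{
    // \1 must repeat exactly what (\d+) captured in the start tag;
    // [\s\S]+? lazily matches the body, newlines included.
    static const std::regex rx(
        R"(<!--story_(\d+)_start-->[\s\S]+?<!--story_\1_end-->)");

    std::vector<std::string> ids;
    for (std::sregex_iterator it(data.begin(), data.end(), rx), end; it != end; ++it)
        ids.push_back((*it)[1].str());
    return ids;
}
```

An item that opens as story_1 but closes as story_2 is skipped, because no end tag with the same number follows it.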

