How to design a regular expression?



  • Hi everyone,

    This is my first contact with the Qt framework (so please be gentle ;) ). I've got the source code of an existing application for the Nokia N9 and am trying to fix a problem with input strings.

    The main purpose of the application is to fetch the source text of an HTML page and extract all hyperlinks that begin with imgurl= and end with a ?, so I think the best way to handle this is a regular expression. But how do I design a regexp? Or is there a better solution?

    @
    QString line = in.readLine();
    QSet<QString> urls;

    // regExp is declared elsewhere in the application (not part of this excerpt)
    if (!regExp.cap(1).isNull()) {
        // Split the line at every "imgurl=" marker
        QStringList imglist = line.split("imgurl=", QString::SkipEmptyParts);
        for (int i = 0; i < imglist.size(); ++i) {
            // imglist.at(i).remove(...) // cut off everything from '?' on? With a regexp?
            urls.insert(imglist.at(i));
            qDebug() << "[IMG] - URL: " << imglist.at(i);
        }
    }

    @


  • Moderators

    Welcome to DevNet.

    Have you already looked at the "QRegExp":http://qt-project.org/doc/qt-4.8/qregexp.html documentation?
    It is the reference for using QRegExp and also includes some examples.



  • Yes, thank you for your help. The problem is that I have never worked with regular expressions before.
    I found a Python script (I think) that does the same thing:

    @
    imageMatches = @/(?i)/imgres?imgurl=(?<fullSize>[^&]+)&imgrefurl=(?<infoUri>[^&]+)[^>]+?&h=(?<height>\d+)&w=(?<width>\d+)[^>]+?&tbnid=(?<tbnid>[^&]+).+?</a><br>(?<title>.+?)<br>/.Matches(imagesHtml)
    @

    Does anyone know whether it can be ported to Qt?



    Your example script uses named groups, which I'm afraid QRegExp doesn't support.

    If I understand your problem correctly, you want to find all occurrences of the string 'imgurl=' and capture the data that follows (terminated by '?')? If so, then the following example should be your answer:

    @
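    #include <QtCore>
    #include <QDebug>

    int main(int argc, char *argv[])
    {
        QCoreApplication a(argc, argv);

        QString href("<a href='http://mysite/imgurl=xyz?a=1?b=2?c=3'></a>");

        // Capture everything after "imgurl=" up to (not including) the next '?'
        QRegExp re("imgurl=([^?]+)");
        QStringList list;
        int pos = 0;
        // indexIn() returns the position of the next match, or -1 if there is none
        while ((pos = re.indexIn(href, pos)) != -1) {
            list << re.cap(1);          // text matched by the first group
            pos += re.matchedLength();  // resume the search after this match
        }
        qDebug() << "Urls:" << list;

        return 0;
    }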
    @

    The output is: Urls: ("xyz"). Hope this helps ;o)



  • Thanks man! That works for me =)

    But I have one more small question.
    I have this line:
    @
    51D+f3KItsL.SS500.jpg&imgrefurl=http://www.amazon.com/So-sehr-dabei-Edit/dp/images/B0084CBTOC&usg=__4EDtwSR84BqXe1YbDA-i0FewmJI=&h=500&w=500&sz=46&hl=de&start=1&zoom=1&tbnid=IAQVgTqbQEVOWM:&tbnh=130&tbnw=130&ei=sghMUfKVCcThtQbFkIGoAw&prev=@

    With a simple sed command, @sed -r 's/([^.]*).*/\1/'@, I get the result I need:
    51D+f3KItsL

    But QRegExp is not sed, right? Is there any documentation on how to convert the regexp from a sed command for use with QRegExp?


  • Moderators

    [quote author="Lirion" date="1363939734"]
    But QRegExp is not sed, right? Is there any documentation on how to convert the regexp from a sed command for use with QRegExp?[/quote]

    Check out "this link.":http://qt-project.org/doc/qt-4.8/qregexp.html#introduction

