How to design a regular expression?
-
Hi everyone,
this is my first contact with the Qt framework (so please be gentle ;) ). I've got the source code of an existing application for the Nokia N9 and am trying to fix a problem with input strings.
The main purpose of the application is to get the text source of an HTML site and filter out all hyperlinks beginning with imgurl= and ending with ?, so I think the best way to handle this is a regular expression. But how do I design a regexp? Or maybe there is a better solution?
@
QString line = in.readLine();
QSet<QString> urls;
if (!regExp.cap(1).isNull()) {
    QStringList imglist = line.split("imgurl=", QString::SkipEmptyParts);
    for (int i = 0; i < imglist.size(); ++i) {
        //imglist.at(i).remove(!()) //Negotiate the RegExpression?
        urls.insert(imglist.at(i).toLatin1());
        qDebug() << "[IMG] - URL: " << imglist.at(i);
    }
}
@
-
Welcome to DevNet!
Have you already had a look at the documentation of "QRegExp":http://qt-project.org/doc/qt-4.8/qregexp.html?
It provides the reference for using QRegExp and also includes some examples.
-
Yes, thank you for your help. The problem is that I have never worked with regular expressions.
I found a Python script (I think) that does the same:
@
imageMatches = @/(?i)/imgres?imgurl=(?<fullSize>[^&]+)&imgrefurl=(?<infoUri>[^&]+)[^>]+?&h=(?<height>\d+)&w=(?<width>\d+)[^>]+?&tbnid=(?<tbnid>[^&]+).+?</a><br>(?<title>.+?)<br>/.Matches(imagesHtml)
@
Does anyone know if it can be ported to Qt?
-
Your example script uses named groups which I'm afraid aren't supported by QRegExp.
If I understand your problem correctly, you want to find all occurrences of the string 'imgurl=' and capture the data that follows (terminated by '?')? If so, then the following example should be your answer:
@
#include <QtCore>
#include <QDebug>

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);

    QString href("<a href='http://mysite/imgurl=xyz?a=1?b=2?c=3'></a>");
    QRegExp re("imgurl=([a-zA-Z0-9]+)");
    QStringList list;
    int pos = 0;
    while ((pos = re.indexIn(href, pos)) != -1) {
        list << re.cap(1);
        pos += re.matchedLength();
    }
    qDebug() << "Urls:" << list;
}
@
The output of which is: Urls: ("xyz"). Hope this helps ;o)
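A note on the character class: [a-zA-Z0-9]+ stops at the first character that is not a letter or digit, so a real URL containing '/', '.', ':' or '-' would be cut short. One possible variation is a negated class such as [^?&]+, which reads "everything up to the next '?' or '&'". The sketch below is only an illustration under assumptions: it uses the standard library's std::regex as a stand-in so that it builds without Qt, and extractImgUrls is a hypothetical helper name, not part of the application. For this pattern the same string should work with QRegExp, walked with indexIn() and cap(1) exactly as in the example above.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original application): collects every
// value that follows "imgurl=", stopping at the first '?' or '&'.
std::vector<std::string> extractImgUrls(const std::string &text)
{
    // [^?&]+ = one or more characters that are neither '?' nor '&'.
    static const std::regex re("imgurl=([^?&]+)");

    std::vector<std::string> urls;
    // Iterate over all non-overlapping matches in the input.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        urls.push_back((*it)[1].str());  // group 1 is the part after "imgurl="
    return urls;
}
```

If a value is not followed by '?' or '&', the capture runs on until the end of the input, so characters such as a closing quote may need to be added to the negated class.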
-
Thanks man! That works for me =)
But I have one little question.
I've got this line:
@
51D+f3KItsL.SS500.jpg&imgrefurl=http://www.amazon.com/So-sehr-dabei-Edit/dp/images/B0084CBTOC&usg=__4EDtwSR84BqXe1YbDA-i0FewmJI=&h=500&w=500&sz=46&hl=de&start=1&zoom=1&tbnid=IAQVgTqbQEVOWM:&tbnh=130&tbnw=130&ei=sghMUfKVCcThtQbFkIGoAw&prev=
@
With a simple sed command, @sed -r 's/([^.]*).*/\1/'@, I get the needed result:
51D+f3KItsL
But QRegExp is not sed, right? Are there any docs on how to extract a regexp from a sed command?
-
[quote author="Lirion" date="1363939734"]
But QRegExp is not sed, right? Are there any docs on how to extract a regexp from a sed command?[/quote]
Check out "this link":http://qt-project.org/doc/qt-4.8/qregexp.html#introduction
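For what it's worth, the sed command in the question is meant to keep only the text before the first '.', and that part carries over to a regex library almost unchanged: anchor at the start of the line and capture a run of non-dot characters. The following is a minimal sketch under assumptions, again using std::regex as a stand-in so it builds without Qt; beforeFirstDot is a hypothetical name chosen for illustration. With QRegExp the same pattern string could be passed to the constructor, followed by indexIn() and cap(1).

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the part of the line before the first '.',
// or the whole line if it contains no '.'.
std::string beforeFirstDot(const std::string &line)
{
    // ^        anchor at the start of the input
    // ([^.]*)  capture zero or more characters that are not a dot
    static const std::regex re("^([^.]*)");

    std::smatch m;
    if (std::regex_search(line, m, re))
        return m[1].str();
    return std::string();
}
```

Unlike sed's s command, nothing is substituted here; the wanted part is simply read from capture group 1.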