[SOLVED] QRegExp is too greedy...



  • Hello
    I have a QString, "data", that contains the source of a web page. It includes the following line:

    @<iframe width="640" height="360" src="http://www.mysite.com/embed/4Ieelwhw" frameborder="0" allowfullscreen></iframe>@

    I'm trying to extract the URL assigned to the src attribute using a regexp. What I'm doing is the following:

    @QRegExp regx("<iframe .*src=\"(.*)(\".*></iframe>)"); //<b>[^<]*</b>
    regx.setMinimal(true);
    regx.indexIn(strType);

    qDebug() << strType.mid(regx.pos(1),regx.pos(2)-regx.pos(1));@
    

    My problem is that the regex is too "greedy" even though setMinimal is set to true. The debug output is:

    "http://www.mysite.com/embed/4Ieelwhw" frameborder="0"

    How can I make it stop at the first " and not include the frameborder? Thanks

    -RS



  • I think adding a '?' after the quantifier in your regex (for example .*? instead of .*) disables the greedy mode.


  • I have not tested this!

    You could also try: src="([^"]*)"

    That is, src=" followed by any number of characters that are not a ", up to the closing ".
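For anyone landing here later, the negated character class idea can be sketched roughly as follows. This is only a minimal illustration using std::regex from the C++ standard library rather than QRegExp, so that it compiles without Qt; the helper name extractSrc is made up for this example. The pattern is the one suggested in this post, and with QRegExp the same capture should be available through regx.cap(1) after a successful indexIn().

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the value of the first src="..." attribute
// found in html, or an empty string if there is none.
std::string extractSrc(const std::string& html) {
    // [^"]* matches any run of characters that are not a double quote,
    // so the capture cannot run past the attribute's closing quote,
    // regardless of greedy or minimal matching.
    static const std::regex pattern("src=\"([^\"]*)\"");
    std::smatch match;
    if (std::regex_search(html, match, pattern)) {
        return match[1].str();
    }
    return std::string();
}
```

Because the character class itself excludes the quote, this approach does not depend on setMinimal() or on lazy quantifiers at all.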



  • Unfortunately neither suggestion helped...

    I tried:

    @QRegExp regx("<iframe .*src=\"([^\”]*)(\".*></iframe>)");
    regx.setMinimal(true);
    regx.indexIn(strType);

    qDebug() << strType.mid(regx.pos(1),regx.pos(2)-regx.pos(1));@
    

    But the output is the same:

    "http://www.mysite.com/embed/4Ieelwhw" frameborder="0"

    I also tried:
    @QRegExp regx("<iframe .*src=\"(.*?)(\".*></iframe>)");
    QRegExp regx("<iframe .*src=\"(.*?)(\".*?></iframe>)");@
    But neither matched anything...

    Any further suggestions? Thanks!
    -RS



  • OK, found the problem (when I posted my reply). Apparently there was a copy-paste issue: I had [^\”]* (with a typographic quote) instead of [^\"]*.

    it now works with:

    @QRegExp regx("<iframe .*src=\"([^\"]*)(\".*></iframe>)");@

    Thank you!
    -RS



  • I would rather write (don't know whether it works though):

    QRegExp regx("<iframe .*src=\"(.+?)(\".*></iframe>)");
    

    Note the + instead of your *. + stands for at least one character and maybe more, whilst * stands for zero or more characters.

    [quote author="ThaRez" date="1340956318"]I also tried:
    @QRegExp regx("<iframe .*src=\"(.*?)(\".*></iframe>)");
    QRegExp regx("<iframe .*src=\"(.*?)(\".*?></iframe>)");@
    But neither matched anything...

    Any further suggestions? Thanks!
    -RS

    [/quote]
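To illustrate the greedy versus non-greedy difference discussed in this thread, here is a small sketch. It uses std::regex (ECMAScript grammar) from the C++ standard library, not QRegExp, because std::regex accepts a ? after an individual quantifier for lazy matching. As far as I can tell from the QRegExp documentation, QRegExp does not support non-greedy matching on individual quantifiers and only offers the pattern-wide setMinimal(true), so please treat this as an illustration of the concept and not as a drop-in QRegExp pattern. The function names firstCapture, srcGreedy and srcLazy are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns capture group 1 of the first match of
// pattern in text, or an empty string when nothing matches.
std::string firstCapture(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch match;
    if (std::regex_search(text, match, re)) {
        return match[1].str();
    }
    return std::string();
}

// Greedy capture: (.*) grabs as much as possible and only backs off to the
// LAST quote that still lets the rest of the pattern match.
std::string srcGreedy(const std::string& html) {
    return firstCapture(html, "<iframe .*src=\"(.*)(\".*></iframe>)");
}

// Lazy capture: (.*?) grows one character at a time and stops at the
// FIRST quote that lets the rest of the pattern match.
std::string srcLazy(const std::string& html) {
    return firstCapture(html, "<iframe .*src=\"(.*?)(\".*></iframe>)");
}
```

On the iframe line from the first post, srcGreedy should return the URL together with the trailing `" frameborder="0` text, which is the over-long result reported there, while srcLazy should stop at the URL's closing quote.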

