[SOLVED] How to parse HTML in Qt



  • As the title says, this is for a Qt GUI application.
    Should I use a regex?
    If so, how can I use this regex in Qt?
    @<div\sclass="result-box">(.*)</div></div><div\sclass="ad-page-S1"\sid="DA_03-0-4"></div>@


  • Moderators

    Depending on the use case, regular expressions are sub-optimal for parsing HTML.
    What information exactly do you want? Probably the best approach would be to load the file using QWebPage and read its contents through the QWebElement API.

    You can also read "this":http://qt-project.org/wiki/Handling_HTML.
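
To illustrate why a pattern like the one in the question is fragile, here is a minimal sketch. It uses std::regex only to stay free of Qt dependencies (in Qt 5 the corresponding class would be QRegularExpression); the helper `firstCapture` and the two pattern constants are hypothetical names, and the patterns are shortened versions of the one posted above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first capture group of `pattern`
// in `html`, or an empty string when nothing matches.
std::string firstCapture(const std::string& html, const std::string& pattern) {
    const std::regex re(pattern);  // ECMAScript grammar by default
    std::smatch match;
    if (std::regex_search(html, match, re) && match.size() > 1) {
        return match[1].str();
    }
    return std::string();
}

// Greedy: (.*) runs on to the LAST </div> it can reach.
const std::string kGreedy = R"re(<div\sclass="result-box">(.*)</div>)re";
// Lazy: (.*?) stops at the FIRST </div>, which cuts nested divs short.
const std::string kLazy = R"re(<div\sclass="result-box">(.*?)</div>)re";
```

With a sibling div after the result box, the greedy pattern captures too much; with a div nested inside the result box, the lazy pattern captures too little. Neither variant tracks nesting, which is the reason a DOM-style API is usually the safer choice.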



  • I have added webkitwidgets to the .pro file and put
    #include <QWebElement>
    #include <QWebFrame>
    in mainwindow.cpp.
    If I try the following:
    @
    QWebFrame *frame;
    frame->setHtml("<html></html>");
    QWebElement parse = frame->documentElement();
    @

    it crashes: The program has unexpectedly finished.


  • Moderators

    And what happens when you use QWebPage instead of QWebFrame (as I said)?



  • Same thing.
    [quote author="raven-worx" date="1372856223"]And what happens when you use QWebPage instead of QWebFrame (as I said)?[/quote]

    @#include <QWebPage>
    QString source(reply->readAll());
    QWebPage *page;
    page->mainFrame()->setHtml(source);@

    The program has unexpectedly finished.


  • Moderators

    -Can you run your application in your IDE in debug mode and post the stack trace of the crash?-

    Never mind... please do this:
    @
    QWebPage *page = new QWebPage;
    page->mainFrame()->setHtml(source);
    @
    or this
    @
    QWebPage page;
    page.mainFrame()->setHtml(source);
    @
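
The reason both variants help is that declaring `QWebPage *page;` only creates a pointer, not a page, so calling through it is undefined behaviour. A Qt-free sketch of the same situation follows; `Page`, `loadOnStack` and `loadOnHeap` are hypothetical stand-ins, not Qt API.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for QWebPage, used only to keep the sketch Qt-free.
struct Page {
    std::string html;
    void setHtml(const std::string& source) { html = source; }
};

// Variant 2 above: the object lives on the stack and is destroyed
// automatically when the function returns.
std::size_t loadOnStack(const std::string& source) {
    Page page;
    page.setHtml(source);
    return page.html.size();
}

// Variant 1 above: the object is allocated with new. A smart pointer
// is used here so that it is also deleted again.
std::size_t loadOnHeap(const std::string& source) {
    std::unique_ptr<Page> page(new Page);
    page->setHtml(source);
    return page->html.size();
}

// What crashed earlier in the thread, for comparison (do not do this):
//   Page *page;             // points nowhere in particular
//   page->setHtml(source);  // undefined behaviour
```

Note that a stack-allocated page only lives until the end of the enclosing scope, so anything obtained from it should be used before then.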



  • The second one works.
    @ QString source(reply->readAll());
    QWebPage page;
    page.mainFrame()->setHtml(source);
    QWebElement parse = page.mainFrame()->documentElement();
    QWebElement result = parse.findFirst("div[class=result-box]");
    ui->plainTextEdit->setPlainText(result.toPlainText());@
    The last line causes a crash:
    !http://s16.postimg.org/6dgypgt9h/Screenshot_from_2013_07_03_16_40_27.png(1)!
    !http://s22.postimg.org/j4qwk68wx/Screenshot_from_2013_07_03_16_41_36.png(2)!


  • Moderators

    When working with QWebElement, it's good practice to check whether the element is null with QWebElement::isNull(). Please check that.
    This happens, for example, when the search was unsuccessful.



  • @QMessageBox::information(this, "T", QString::number(result.isNull()));@
    This crashes too.


  • Moderators

    I just tried your code and it works for me.



  • I tried it with <html></html> and it worked. I think the HTML source I received is what causes the crash?


  • Moderators

    Seems so... maybe this is a bug in the Qt implementation.
    Maybe you can find out what exactly causes the crash? You have the full source from the network reply, so you can save the string and use it to test which part exactly causes the crash.

    Did you also try loading another URL? Does it only occur on a specific web page?
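
Saving the downloaded source so it can be replayed later could look roughly like the sketch below. It uses only the standard library; `saveHtml` and `loadHtml` are hypothetical helper names, and it assumes the QString has already been converted, for example with `source.toStdString()`.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Writes the HTML source to `path`; returns false if the file could
// not be opened or written.
bool saveHtml(const std::string& path, const std::string& html) {
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return false;
    }
    out << html;
    out.flush();
    return out.good();
}

// Reads the whole file back; returns an empty string if it cannot be opened.
std::string loadHtml(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Once the page is on disk, parts of it can be deleted by hand and the file reloaded until the crash disappears, which narrows down the offending markup.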



  • I downloaded the latest release, Qt 5.1.0, and the crash stopped.



  • Thanks for your help.

