Problem obtaining text nodes with QWebElement



  • Hello there,

    I am trying to port a JavaScript library to C++ with QtWebKit. Most of it works, but there is one specific task I cannot accomplish with QtWebKit.

    The task: when I find a <div> tag in the current document tree, I need to wrap each text node directly inside the <div> in <p></p>.

    For example, if I see a piece of HTML like this:

    @<div>hello there! how are you?<a href="xxx">xxx</a> I'm ok. </div>@

    I should output this:

    @<div><p>hello there! how are you?</p><a href="xxx">xxx</a><p> I'm ok.</p> </div>@

    But because QWebElement doesn't treat text nodes as web elements, I cannot see them in the QtWebKit document tree, so I cannot enclose them in "<p></p>".

    I tried a more general XML parser, QDomDocument and QDomNode. It works sometimes, but since this XML parser is stricter than the HTML parser, it fails when the inner HTML of the <div> tag contains <br>, <img>, or anything else it considers a tag mismatch.

    Now I don't know how to do this.

    Thank you.



  • I am experiencing exactly the same problem. I am writing an HTML parser and want to capture text like this:

    <p>seg1<span>seg2</span>seg3</p>

    I can get either "seg2" or "seg1<span>seg2</span>seg3", but what should I do to capture the text between QWebElement siblings, such as "seg1" and "seg3"?
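One possible way to pull out "seg1" and "seg3" is to take the string returned by toInnerXml() and keep only the text that lies outside any child tag. The sketch below uses plain std::string so it runs without Qt; the function name directText is hypothetical, and it assumes balanced markup with no void elements (<br>, <img>), no comments, and no '<' or '>' inside attribute values.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collect the text segments that sit directly inside
// an element, i.e. at nesting depth 0 of its inner HTML.
// Assumes balanced tags; void elements such as <br> are NOT handled.
std::vector<std::string> directText(const std::string& innerHtml) {
    std::vector<std::string> segments;
    std::string current;
    int depth = 0;
    std::size_t i = 0;
    while (i < innerHtml.size()) {
        if (innerHtml[i] == '<') {
            const std::size_t close = innerHtml.find('>', i);
            if (close == std::string::npos) break;  // malformed tag: stop scanning
            const bool isEndTag = i + 1 < innerHtml.size() && innerHtml[i + 1] == '/';
            const bool selfClosing = innerHtml[close - 1] == '/';
            // A tag at depth 0 ends the text segment collected so far.
            if (depth == 0 && !current.empty()) {
                segments.push_back(current);
                current.clear();
            }
            if (isEndTag) {
                if (depth > 0) --depth;
            } else if (!selfClosing) {
                ++depth;
            }
            i = close + 1;
        } else {
            if (depth == 0) current += innerHtml[i];  // keep only top-level text
            ++i;
        }
    }
    if (!current.empty()) segments.push_back(current);
    return segments;
}
```

For the inner HTML "seg1<span>seg2</span>seg3" this returns the two segments "seg1" and "seg3". With a QWebElement, the input would be something like the element's toInnerXml() converted via toStdString().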



  • @billconan
    My solution is to edit the HTML as a QString and then hand the result back to the element for parsing:
    @#include <QString>
    #include <QWebElement>
    #include <QWebFrame>
    #include <QWebPage>

    QString pTag(const QString &html)
    {
        QWebPage page;
        page.mainFrame()->setHtml(html);
        // The first child of <body> is taken to be the <div> to process.
        QWebElement div = page.mainFrame()->documentElement().findFirst("body").firstChild();
        QString innerHtml = div.toInnerXml();
        // Double every bracket first, then turn the doubled brackets into
        // a closing </p> before each tag and an opening <p> after it.
        innerHtml.replace(QString("<"), QString("<<"));
        innerHtml.replace(QString(">"), QString(">>"));
        innerHtml.replace(QString("<<"), QString("</p><"));
        innerHtml.replace(QString(">>"), QString("><p>"));
        innerHtml = "<p>" + innerHtml + "</p>";
        // Setting the inner XML makes WebKit parse the edited markup again.
        div.setInnerXml(innerHtml);
        return div.toOuterXml();
    }@



  • @billconan
    Another solution would be to use CSS pseudo-elements: http://www.w3.org/TR/CSS2/selector.html#pseudo-element-selectors



  • @Pawelitel Why not replace "<" with "</p><" directly?

    Anyway, thanks a lot for your hints! I wish Nokia would extend the current API in the near future.



  • I see why now. What a clever trick! Thanks again! >_<
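To illustrate why the brackets are doubled first, here is a minimal sketch that compares the direct replacement with the doubled-bracket one. It uses std::string instead of QString so it runs without Qt; the helper names replaceAll, wrapDirect, and wrapDoubled are hypothetical and not part of any Qt API.

```cpp
#include <cstddef>
#include <string>

// Replace every non-overlapping occurrence of `from` in `s` with `to`,
// scanning left to right and skipping over freshly inserted text.
std::string replaceAll(std::string s, const std::string& from, const std::string& to) {
    for (std::size_t pos = 0; (pos = s.find(from, pos)) != std::string::npos; pos += to.size()) {
        s.replace(pos, from.size(), to);
    }
    return s;
}

// Direct approach: the second replacement also hits the '>' of the
// "</p>" tags that the first replacement just inserted.
std::string wrapDirect(std::string html) {
    html = replaceAll(html, "<", "</p><");
    html = replaceAll(html, ">", "><p>");
    return "<p>" + html + "</p>";
}

// Doubled-bracket approach: only the original brackets are doubled, so the
// later replacements leave the inserted <p> and </p> tags alone.
std::string wrapDoubled(std::string html) {
    html = replaceAll(html, "<", "<<");
    html = replaceAll(html, ">", ">>");
    html = replaceAll(html, "<<", "</p><");
    html = replaceAll(html, ">>", "><p>");
    return "<p>" + html + "</p>";
}
```

For the input "seg1<span>seg2</span>seg3", wrapDirect yields "<p>seg1</p><p><span><p>seg2</p><p></span><p>seg3</p>", with stray <p> tags after every inserted </p>, while wrapDoubled yields "<p>seg1</p><span><p>seg2</p></span><p>seg3</p>". Note that the doubled version still wraps the text inside child tags as well.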



  • Hi, you can read the href attribute of a QWebElement with:
    element.attribute("href")
    which returns, for example, "http://www.domain.com".

