Parsing page

    ThaRez wrote:

    Hello
    I would like to load a website and parse it like an XML document, creating a tree structure. The page can be required to meet a standard such as XHTML 1.0 Strict. I've tried the following:

    @ui->webView->load(QUrl("http://validator.w3.org/"));
    QDomDocument xmlDoc;
    if (!xmlDoc.setContent(ui->webView->page()->currentFrame()->toHtml()))
    {
        qDebug("ERROR");
    }
    else
    {
        qDebug() << "OK";
    }@

    Although http://validator.w3.org/ is valid XHTML, its content still won't parse as XML. Are there any good alternatives?
    Thanks
    Richard

      KA51O wrote:

      "QWebElement":https://qt-project.org/doc/qt-4.8/qwebelement.html will be your friend.
      To get the 'root' use @webView->page()->mainFrame()->documentElement();@
      Then you can walk through it using methods like QWebElement::firstChild(), QWebElement::lastChild(), QWebElement::nextSibling(), ...

      I used something like the following to find the element that has focus. You can adapt it for your use case.
      @
      // Depth-first search for the focused element below a_element.
      // For example, hand over the root element to search the whole page.
      // Returns a null QWebElement if nothing below a_element has focus.
      QWebElement WebViewDerivedClass::findElementWithFocus(const QWebElement& a_element)
      {
          for (QWebElement child = a_element.firstChild(); !child.isNull(); child = child.nextSibling())
          {
              if (child.hasFocus())
              {
                  return child;
              }
              // Recurse into the subtree; a null result means no match there.
              const QWebElement focused = findElementWithFocus(child);
              if (!focused.isNull())
              {
                  return focused;
              }
          }
          return QWebElement();
      }
      @
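
The first-child / next-sibling walk is not specific to QtWebKit, and the same pattern can be used to list a whole tree rather than search it. Below is a minimal sketch that compiles without Qt. `Node` and `collectTags` are hypothetical names made up for this illustration; `Node` only stands in for QWebElement and carries just the two links the walk needs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for QWebElement: only the two links the walk needs.
struct Node {
    std::string tag;
    const Node* firstChild;   // nullptr when the node has no children
    const Node* nextSibling;  // nullptr when the node is the last child
};

// Appends the tag of every descendant of `parent` to `out`,
// depth-first and in document order. `parent` itself is not added.
void collectTags(const Node& parent, std::vector<std::string>& out)
{
    for (const Node* child = parent.firstChild; child != nullptr; child = child->nextSibling) {
        out.push_back(child->tag);
        collectTags(*child, out);  // descend before moving to the next sibling
    }
}
```

With QWebElement the loop has the same shape, using firstChild(), nextSibling() and isNull() in place of the raw pointers.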

        ThaRez wrote:

        Great. Still, I'm wondering why WebKit modifies the source. When I read the source from a file and print it to the debug output, it looks OK. But as soon as I set it on the web view and read it back with toHtml(), every "/>" has become ">", so those elements are no longer closed. Can I prevent this? It's a problem because, for example, meta tags are normally self-closed. Thanks
        Richard

          KA51O wrote:

          Yeah, I noticed that too. I don't know how to stop QtWebKit from doing so. You could try using QDomDocument again to parse your HTML.
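
A likely explanation, offered as an assumption rather than something confirmed in this thread: toHtml() writes out the parsed DOM using HTML serialization rules, where void elements such as meta carry no trailing slash, so the result may no longer be well-formed XML even when the original file was. The sketch below shows why an XML parser would then reject it. `isTagBalanced` is a hypothetical stdlib-only helper invented for this illustration; it is not a Qt API and not a real XML parser.

```cpp
#include <cassert>
#include <stack>
#include <string>

// Hypothetical helper: returns true when every start tag in `markup` is
// closed, either by a matching end tag or by a trailing "/>".
// It ignores comments, CDATA, doctype declarations and '>' characters inside
// attribute values, so it only illustrates the XML rule.
bool isTagBalanced(const std::string& markup)
{
    std::stack<std::string> open;
    std::string::size_type pos = 0;
    while ((pos = markup.find('<', pos)) != std::string::npos) {
        const std::string::size_type end = markup.find('>', pos);
        if (end == std::string::npos)
            return false;                         // unterminated tag
        const std::string tag = markup.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty())
            return false;
        if (tag[0] == '/') {                      // end tag: must match the innermost open tag
            if (open.empty() || open.top() != tag.substr(1))
                return false;
            open.pop();
        } else if (tag[tag.size() - 1] != '/') {  // start tag that is not self-closed
            open.push(tag.substr(0, tag.find_first_of(" \t\r\n")));
        }
    }
    return open.empty();
}
```

A self-closed `<meta name="a"/>` inside head passes this check, while the same tag written as `<meta name="a">` leaves "meta" open when `</head>` arrives, which is the kind of mismatch that makes QDomDocument::setContent() fail.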
