Qt Forum

    Unsolved I need an example of HTML parsing

    General and Desktop
    • txtsd
      txtsd last edited by txtsd

      I'm trying to parse HTML in my program and I'm using libxml since Qt5 does not have any HTML parsers.

      I need a simple example of how to fetch an element from an HTML document.

      So something like:

      QNetworkRequest request(QUrl("https://exam.ple/"));
      QNetworkReply *reply = qnam.get(request);
      ...
      QByteArray replyContent = reply->readAll();
      // htmlReadMemory() needs the buffer length, which QByteArray::size() provides
      htmlDocPtr doc = htmlReadMemory(replyContent.constData(), static_cast<int>(replyContent.size()),
                                      "https://exam.ple/", "UTF-8",
                                      HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR);
      

      And then I need to know how to actually select an element using a CSS selector such as body > form > input[class="someclass"]

      • SGaist
        SGaist Lifetime Qt Champion last edited by

        Hi,

        @txtsd said in I need an example of HTML parsing:

        I'm trying to parse HTML in my program and I'm using libxml since Qt5 does not have any HTML parsers

        HTML parsers are usually called web browsers.

        Qt currently has XML parsing support.

        You might want to look at the Qt WebEngine module for web-related tasks like the one you describe.

        Interested in AI ? www.idiap.ch
        Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

        • txtsd
          txtsd last edited by txtsd

          QtWebEngine is not what I need. That is basically a web browser, not a parser with parsing tools.
          I use BeautifulSoup to parse HTML in Python, and it uses lxml as a backend, which is a set of Python bindings for libxml2.
          I'm looking for a similarly easy way to parse HTML in C++/Qt.
          I just need to know how to use a CSS selector to fetch an element with libxml. There aren't many examples on the Internet.

          Qt XML is also not what I need since HTML is not XML.

          • JonB
            JonB @txtsd last edited by JonB

            @txtsd said in I need an example of HTML parsing:

            I'm using libxml since Qt5 does not have any HTML parsers

            You'll be lucky if that works on HTML! It shouldn't, unless your HTML is XHTML, which it's unlikely to be.

            EDIT: Having said that, despite its name, from reading around I see that libxml2 is regarded as the "standard" HTML parser for C++, so I guess that is what you should use.

            • txtsd
              txtsd @JonB last edited by

              @JonB It does! libxml has an HTML parser (its HTMLparser module), which is what BeautifulSoup uses in Python.

              • kkoehne
                kkoehne Moderators last edited by

                @txtsd said in I need an example of HTML parsing:

                I just need to know how to use a CSS selector to fetch an element using libxml.

                This isn't really a Qt question, is it? I guess that might be best asked in a forum for libxml :)

                Director R&D, The Qt Company
