I need an example of HTML parsing

6 Posts 4 Posters 1.3k Views
    txtsd
    wrote on last edited by txtsd
    #1

    I'm trying to parse HTML in my program and I'm using libxml since Qt5 does not have any HTML parsers.

    I need a simple example of how to fetch an element from an HTML document.

    So something like:

    QNetworkRequest request(QUrl("https://exam.ple/"));
    reply = qnam.get(request);
    ...
    QByteArray replyContent = reply->readAll();
    const char* data = replyContent.data();
    // htmlReadMemory() takes the buffer size in bytes; use the size of the reply buffer.
    htmlDocPtr doc = htmlReadMemory(data, replyContent.size(), "https://exam.ple/", "UTF-8", HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR);
    

    And then I need to know how to actually request an element with a CSS selector like body > form > input[class="someclass"]

      SGaist
      Lifetime Qt Champion
      wrote on last edited by
      #2

      Hi,

      @txtsd said in I need an example of HTML parsing:

      I'm trying to parse HTML in my program and I'm using libxml since Qt5 does not have any HTML parsers

      HTML parsers are usually called web browsers.

      Qt currently has XML parsing support.

      You might want to check the Qt WebEngine module for the kind of web-related work you want to do.


        txtsd
        wrote on last edited by txtsd
        #3

        QtWebEngine is not what I need. That is basically a web browser, not a parser with parsing tools.
        I use BeautifulSoup to parse HTML in Python, and it uses lxml as a backend, which is a Python binding for libxml.
        I'm looking to similarly and easily parse HTML in C++/Qt.
        I just need to know how to use a CSS selector to fetch an element using libxml. There aren't many examples on the Internet.

        Qt XML is also not what I need since HTML is not XML.
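        Editorial sketch: as far as I know, libxml2 does not ship a CSS selector engine; its query language is XPath, and the Python side (lxml with the cssselect package) gets CSS selectors by translating them to XPath first. The same idea can be sketched in plain C++ for a very small selector subset. The function name `cssToXPath` and the supported subset (tag names, `*`, `[attr]`, `[attr="value"]`, the `>` and whitespace combinators) are assumptions made for illustration, not part of libxml2 or Qt.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Translate a small CSS selector subset into an XPath 1.0 expression.
// Hypothetical helper: supports tag names, '*', [attr], [attr="value"],
// the child combinator '>' and the descendant combinator (whitespace).
// Anything else (.class, #id, :pseudo, spaces inside brackets) is rejected.
std::string cssToXPath(const std::string& css)
{
    std::string xpath;
    std::size_t i = 0;
    const std::size_t n = css.size();
    bool child = false; // true when '>' precedes the next step

    auto skipSpaces = [&] {
        while (i < n && std::isspace(static_cast<unsigned char>(css[i]))) ++i;
    };
    auto readName = [&] {
        const std::size_t start = i;
        while (i < n && (std::isalnum(static_cast<unsigned char>(css[i]))
                         || css[i] == '-' || css[i] == '_')) ++i;
        return css.substr(start, i - start);
    };

    skipSpaces();
    while (i < n) {
        const std::size_t stepStart = i;

        // '/' selects children, '//' selects descendants (also used for the first step).
        xpath += child ? "/" : "//";
        child = false;

        // Element name; a step without a name matches any element.
        std::string tag;
        if (css[i] == '*') { tag = "*"; ++i; }
        else tag = readName();
        xpath += tag.empty() ? "*" : tag;

        // Zero or more attribute filters.
        while (i < n && css[i] == '[') {
            ++i;
            const std::string attr = readName();
            if (attr.empty()) throw std::invalid_argument("expected attribute name");
            if (i < n && css[i] == '=') {
                ++i;
                std::string value;
                if (i < n && (css[i] == '"' || css[i] == '\'')) {
                    const char quote = css[i++];
                    const std::size_t end = css.find(quote, i);
                    if (end == std::string::npos) throw std::invalid_argument("unterminated string");
                    value = css.substr(i, end - i);
                    i = end + 1;
                } else {
                    value = readName();
                }
                // XPath 1.0 string literals cannot escape quotes, so refuse them.
                if (value.find('\'') != std::string::npos)
                    throw std::invalid_argument("single quote in attribute value");
                xpath += "[@" + attr + "='" + value + "']";
            } else {
                xpath += "[@" + attr + "]";
            }
            if (i >= n || css[i] != ']') throw std::invalid_argument("expected ']'");
            ++i;
        }
        if (i == stepStart) throw std::invalid_argument("unsupported selector syntax");

        // Combinator before the next step, if any.
        const std::size_t before = i;
        skipSpaces();
        if (i < n && css[i] == '>') {
            child = true;
            ++i;
            skipSpaces();
            if (i >= n) throw std::invalid_argument("dangling '>'");
        } else if (i < n && i == before) {
            throw std::invalid_argument("unsupported selector syntax");
        }
    }
    return xpath;
}
```

        With this sketch, the selector from the first post, body > form > input[class="someclass"], becomes //body/form/input[@class='someclass'], a string that can then be handed to libxml2's xmlXPathEvalExpression().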

          JonB
          wrote on last edited by JonB
          #4

          @txtsd said in I need an example of HTML parsing:

          I'm using libxml since Qt5 does not have any HTML parsers

          You'll be lucky if that works on HTML! It shouldn't, unless your HTML is XHTML, which it's unlikely to be.

          EDIT: Having said that, and despite its name, reading around I see that libxml2 is regarded as the "standard" HTML parser for C++, so I guess that is what you should use.

            txtsd
            wrote on last edited by
            #5

            @JonB It does! libxml has an HTMLParser, which is what BeautifulSoup uses in Python (through lxml).

              kkoehne
              Moderators
              wrote on last edited by
              #6

              @txtsd said in I need an example of HTML parsing:

              I just need to know how to use a CSS selector to fetch an element using libxml.

              This isn't really a Qt question, is it? I guess that might be best asked in a forum for libxml :)

              Director R&D, The Qt Company
