I need an example of HTML parsing



  • I'm trying to parse HTML in my program and I'm using libxml since Qt5 does not have any HTML parsers.

    I need a simple example of how to fetch an element from an HTML document.

    So something like:

    QNetworkRequest request(QUrl("https://exam.ple/"));
    reply = qnam.get(request);
    ...
    QByteArray replyContent = reply->readAll();
    const char* data = replyContent.data();
    htmlDocPtr doc = htmlReadMemory(data, replyContent.size(), "https://exam.ple/", "UTF-8", HTML_PARSE_NOBLANKS | HTML_PARSE_NOERROR);
    

    Then I need to know how to actually look up an element with a CSS selector such as body > form > input[class="someclass"].


  • Lifetime Qt Champion

    Hi,

    @txtsd said in I need an example of HTML parsing:

    I'm trying to parse HTML in my program and I'm using libxml since Qt5 does not have any HTML parsers

    HTML parsers are usually called web browsers.

    Qt currently has XML parsing support.

    You might want to check the web engine module for web-related tasks like the one you describe.



    QtWebEngine is not what I need. That is basically a web browser, not a parser with parsing tools.
    I use BeautifulSoup to parse HTML in Python, and it uses lxml as a backend, which is the Python binding for libxml.
    I'm looking for a similarly easy way to parse HTML in C++/Qt.
    I just need to know how to use a CSS selector to fetch an element using libxml. There aren't many examples on the Internet.

    Qt XML is also not what I need since HTML is not XML.



  • @txtsd said in I need an example of HTML parsing:

    I'm using libxml since Qt5 does not have any HTML parsers

    You'll be lucky if that works on HTML! It shouldn't, unless your HTML is XHTML, which it's unlikely to be.

    EDIT: Having said that, and despite its name, from reading around I see that libxml2 is regarded as the "standard" HTML parser for C++, so I guess that is what you should use.



  • @JonB It does! libxml has an HTMLParser! Which is what BeautifulSoup uses in python.


  • Moderators

    @txtsd said in I need an example of HTML parsing:

    I just need to know how to use a CSS selector to fetch an element using libxml.

    This isn't really a Qt question, is it? I guess that might be best asked in a forum for libxml :)
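
    On the CSS selector part: libxml2 evaluates XPath, not CSS, so a selector has to be turned into an XPath expression first (lxml's cssselect module works this way as well). A rough sketch of such a translation for a very small subset, assuming only tag names, the ">" and space combinators, and [attr="value"] tests are needed. The function name cssToXPath is hypothetical, and anything outside that subset is rejected rather than guessed at.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: translate a tiny CSS subset into XPath 1.0.
// Supported: tag names, ">" (child) and space (descendant) combinators,
// and exact-match attribute tests such as [class="someclass"].
std::string cssToXPath(const std::string& css)
{
    auto isNameChar = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '-' || c == '_';
    };
    auto fail = [&css]() -> void {
        throw std::invalid_argument("unsupported selector: " + css);
    };

    std::string xpath;
    std::size_t i = 0;
    const std::size_t n = css.size();
    while (true) {
        // Read the combinator (if any) in front of the next step.
        bool child = false;
        while (i < n && (css[i] == ' ' || css[i] == '>')) {
            child = child || css[i] == '>';
            ++i;
        }
        if (i == n)
            break;
        // The first step matches anywhere in the document.
        xpath += (child && !xpath.empty()) ? "/" : "//";

        // Read the tag name.
        const std::size_t start = i;
        while (i < n && isNameChar(css[i]))
            ++i;
        if (i == start)
            fail();
        xpath += css.substr(start, i - start);

        // Read any [attr="value"] tests attached to this tag.
        while (i < n && css[i] == '[') {
            const std::size_t eq = css.find('=', i);
            const std::size_t close = css.find(']', i);
            if (eq == std::string::npos || close == std::string::npos ||
                eq > close || close - eq < 3 || eq == i + 1)
                fail();
            const char quote = css[eq + 1];
            if ((quote != '"' && quote != '\'') || css[close - 1] != quote)
                fail();
            const std::string name = css.substr(i + 1, eq - i - 1);
            const std::string value = css.substr(eq + 2, close - eq - 3);
            for (char c : name)
                if (!isNameChar(c))
                    fail();
            // Keep XPath quoting trivial: refuse values containing a single quote.
            if (value.find('\'') != std::string::npos)
                fail();
            xpath += "[@" + name + "='" + value + "']";
            i = close + 1;
        }

        // Classes (.foo), ids (#bar), pseudo-classes etc. are out of scope.
        if (i < n && css[i] != ' ' && css[i] != '>')
            fail();
    }
    return xpath;
}
```

    For the selector in the question, cssToXPath("body > form > input[class=\"someclass\"]") yields //body/form/input[@class='someclass'], which can then be handed to libxml2's xmlXPathEvalExpression together with a context created by xmlXPathNewContext.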