How to get html source code from a web page?

    MathSquare
    wrote on last edited by
    #1

    How do I get the HTML source code of a web page? I want to fetch the HTML source of a page and then filter it for text. How would I do that?

      MuldeR
      wrote on last edited by
      #2

      Web pages are always delivered as source code, so you just need to download the HTML file.

      You can use QNetworkAccessManager for this, I think. Parsing the HTML to extract the text will be the harder part. I don't know how useful QWebView would be here, but you may be able to construct a fairly simple regular expression with QRegExp to pull the text block out of the HTML.

      For example, this would get everything enclosed by <body> ... </body> tags:
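      @#include <QRegExp>
      #include <QString>

      // Returns everything between <body ...> and </body>,
      // or an empty QString if the tags are not found.
      QString getText(const QString &theHtmlCode)
      {
          // [^>]* tolerates attributes on the opening tag.
          QRegExp filter("<body[^>]*>(.+)</body>", Qt::CaseInsensitive);
          filter.setMinimal(true); // stop at the first </body>
          if (filter.indexIn(theHtmlCode) != -1)
          {
              return filter.cap(1);
          }
          return QString();
      }@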
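
The function above returns the body markup, tags included. As a rough sketch of the remaining "filter it for text" step, the following uses the standard library's std::regex rather than QRegExp, so it compiles without Qt. The helper names extractBody and stripTags are hypothetical, chosen for illustration only. Regular expressions are a crude way to handle HTML: this ignores scripts, comments, and character entities, and is only meant for small, simple pages.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the markup between <body ...> and </body>,
// or an empty string if no body element is found.
std::string extractBody(const std::string &html)
{
    // [\s\S] matches any character including newlines; *? makes the match non-greedy.
    static const std::regex bodyRe(R"(<body[^>]*>([\s\S]*?)</body>)",
                                   std::regex::icase);
    std::smatch match;
    if (std::regex_search(html, match, bodyRe)) {
        return match[1].str();
    }
    return std::string();
}

// Hypothetical helper: replaces anything that looks like a tag with a space,
// collapses whitespace runs, and trims both ends.
std::string stripTags(const std::string &html)
{
    static const std::regex tagRe(R"(<[^>]*>)");
    static const std::regex spaceRe(R"(\s+)");

    std::string text = std::regex_replace(html, tagRe, " ");
    text = std::regex_replace(text, spaceRe, " ");

    const std::string::size_type first = text.find_first_not_of(' ');
    if (first == std::string::npos) {
        return std::string(); // nothing but whitespace was left
    }
    const std::string::size_type last = text.find_last_not_of(' ');
    return text.substr(first, last - first + 1);
}
```

Calling stripTags(extractBody(page)) on a downloaded page would then give a plain-text approximation of its visible content; a real HTML parser is the safer choice for anything beyond that.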

      My OpenSource software at: http://muldersoft.com/

      Qt v4.8.6 MSVC 2013, static/shared: http://goo.gl/BXqhrS

      Go visit the coop: http://youtu.be/Jay...

        MathSquare
        wrote on last edited by
        #3

        I don't understand what "theHtmlCode" is. Where do I put the URL, and how?

          KA51O
          wrote on last edited by
          #4

          You can use either "QWebView":http://qt-project.org/doc/qt-4.8/qwebview.html or "QNetworkAccessManager":http://qt-project.org/doc/qt-4.8/qnetworkaccessmanager.html, as MuldeR already suggested. Have you looked at the documentation for these classes? It already contains examples of how to do what you are asking (about three lines of code is all it takes). All you need to do now is read the docs and decide which one you want to use.
