Web crawler

  • beowulf wrote:

    How do I create a web crawler?

    Example:

    The application searches on Google, takes all the listed sites, visits them one by one, and extracts something from each.

    -- 0x00

    • beemaster wrote:

      You might want to start with "QNetworkAccessManager":http://qt-project.org/doc/qt-4.8/QNetworkAccessManager.html
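      @
      #include <QNetworkAccessManager>
      #include <QNetworkReply>
      QNetworkAccessManager manager;
      QString searchWord = "Hello";
      // Use /search?...: anything after '#' is a URL fragment and is never sent to the server.
      QString request = "http://www.google.com.ua/search?hl=en&output=search&q=" + searchWord;
      // get() is asynchronous; the reply emits finished() once the response has arrived.
      QNetworkReply *reply = manager.get(QNetworkRequest(QUrl(request)));
      @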
      After this you have to parse the result. That's the most difficult part.
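
To give a rough idea of what parsing the result can involve, here is a minimal sketch that pulls link targets out of downloaded HTML using only the C++ standard library. The function name `extractLinks` is made up for illustration. It assumes the links are present in the static markup as quoted `href` attributes on `<a>` tags; a regular expression is not a real HTML parser, so it will miss unquoted attributes and anything produced by JavaScript.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the value of every quoted href attribute
// found on an <a> tag. Naive by design; it is not a real HTML parser.
std::vector<std::string> extractLinks(const std::string &html)
{
    // <a, a word boundary (so <abbr> is skipped), any attributes up to
    // href, then the quoted value captured in group 1. Case-insensitive.
    static const std::regex anchor(
        R"re(<a\b[^>]*?\bhref\s*=\s*["']([^"']*)["'])re",
        std::regex::icase);

    std::vector<std::string> links;
    for (std::sregex_iterator it(html.begin(), html.end(), anchor), end;
         it != end; ++it) {
        links.push_back((*it)[1].str());
    }
    return links;
}
```

Called on `<a href="/b">B</a>`, this returns a vector holding the single string `/b`. Relative links like that one would still have to be resolved against the page URL before they can be fetched.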

      • codenode wrote:

        I did some research on this in January.

        First, QNetworkAccessManager on its own is not a solution, it seems, even though it is a good HTTP source. You still have to put the received content into a browser-like environment. Parsing HTML is also not trivial: there is a tagsoup implementation that would do, but some links are generated through JavaScript, so you really need to load the page in something browser-like, i.e. QtWebKit.

        QtWebKit offers a lot of good stuff you can use to crawl; for example, it can extract all <a> tags (i.e. links).
        The problem is that QtWebKit is not thread-safe, so to speed up crawling you would have to run multiple processes that share the work.
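
One possible way to share the work between processes is to split the list of URLs into one batch per worker up front. The sketch below does a plain round-robin split in standard C++. The names `partitionUrls` and `workerCount` are assumptions for illustration, and how each batch reaches its process (for example as command-line arguments to a child started with QProcess) is left open.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits urls into workerCount batches, round-robin,
// so each worker process gets roughly the same number of pages to load.
std::vector<std::vector<std::string>> partitionUrls(
    const std::vector<std::string> &urls, std::size_t workerCount)
{
    std::vector<std::vector<std::string>> batches(workerCount);
    if (workerCount == 0)
        return batches; // no workers, so nothing can be assigned

    for (std::size_t i = 0; i < urls.size(); ++i)
        batches[i % workerCount].push_back(urls[i]);
    return batches;
}
```

Round-robin keeps the batches balanced by count only. A real crawler might prefer to group URLs by host so that one process handles all requests to a given site.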
