Multithreaded Webcrawler

  • arvind2111 wrote (#1):

    Hi there,

    I'm trying to build a multithreaded webcrawler which downloads a page's HTML and all resources it references. Given that QtWebPage isn't threadsafe, I'm wondering what would be the best way of accomplishing this?

    Things I've tried:

    • Having each thread start its own QApplication, but that gives me a "There can only exist one QCoreApplication instance" error.
    • Creating the QWebPage in the main (GUI) thread and moving it to a delegate thread, but that gives me a "QObject used from outside its own thread" error.

    Any pointers/direction would be greatly appreciated!

    -Arvind

    • giesbert wrote (#2):

      Hi,

      Q(Core)Application is a process singleton. It may only exist once in a process, and it should be instantiated inside main().

      Why didn't you create the QWebPage inside the run method of the thread?
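The general pattern behind that suggestion is thread confinement: the object is constructed inside the thread function, so it is created and used by the same thread. Below is a minimal standard-library sketch of that idea. `PageLoader` and `loadAll` are hypothetical names invented for illustration; `PageLoader` is not a Qt class, and the sketch says nothing about whether QWebPage itself tolerates being created outside the GUI thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for an object that must only be used by the thread
// that created it. This is NOT a Qt class.
class PageLoader {
public:
    PageLoader() : owner_(std::this_thread::get_id()) {}

    // Pretends to load a page; checks that the caller is the owning thread.
    std::string load(const std::string& url) {
        assert(std::this_thread::get_id() == owner_);
        return "<html>" + url + "</html>";
    }

private:
    std::thread::id owner_;
};

// Loads each URL on its own thread. Every PageLoader is constructed inside
// the thread function, so it never crosses a thread boundary.
std::vector<std::string> loadAll(const std::vector<std::string>& urls) {
    std::vector<std::string> results(urls.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < urls.size(); ++i) {
        workers.emplace_back([&urls, &results, i] {
            PageLoader loader;                  // created on this worker thread
            results[i] = loader.load(urls[i]);  // each thread writes only its own slot
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```

Calling `loadAll({"a", "b"})` returns one result per URL, in input order. One thread per URL keeps the sketch short; a real crawler would more likely use a fixed number of workers.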

      Nokia Certified Qt Specialist.
      Programming Is Like Sex: One mistake and you have to support it for the rest of your life. (Michael Sinz)

      • arvind2111 wrote (#3):

        Thanks for the reply! That was my initial approach, which worked fine when I restricted the app to a single child thread; once I lifted that restriction, the app would segfault. Googling around led me to these forum posts, which suggest that QtWebKit is not thread-safe and can only be instantiated in the main/GUI thread. Is this not right?

        http://developer.qt.nokia.com/forums/viewthread/9035
        http://developer.qt.nokia.com/forums/viewthread/3005

        • goetz wrote (#4):

          Using QtWebKit for a web crawler sounds like overkill for that task. At least for the definition of "web crawler" (cf. wget -r) that I usually have...
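For comparison, a crawler in the `wget -r` sense only needs to fetch a page, pull out its links, and repeat for every link not seen yet. Below is a rough standard-library sketch of that loop. It assumes a hypothetical in-memory "site" (a map from URL to HTML) in place of real network access, and `extractLinks` and `crawl` are illustrative names, not part of any library. The regex only approximates link extraction; real HTML needs a proper parser.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Extracts the targets of href="..." attributes from an HTML string.
std::vector<std::string> extractLinks(const std::string& html) {
    static const std::regex href("href\\s*=\\s*\"([^\"]*)\"", std::regex::icase);
    std::vector<std::string> links;
    for (std::sregex_iterator it(html.begin(), html.end(), href), end; it != end; ++it) {
        links.push_back((*it)[1].str());
    }
    return links;
}

// Breadth-first crawl over a hypothetical in-memory site (URL -> HTML).
// Returns the URLs that were found in the map, in visiting order.
std::vector<std::string> crawl(const std::map<std::string, std::string>& site,
                               const std::string& start) {
    assert(!start.empty() && "start URL must not be empty");
    std::vector<std::string> order;
    std::set<std::string> seen{start};
    std::queue<std::string> pending;
    pending.push(start);
    while (!pending.empty()) {
        std::string url = pending.front();
        pending.pop();
        auto page = site.find(url);
        if (page == site.end()) continue;  // unknown URL: nothing to fetch
        order.push_back(url);
        for (const std::string& link : extractLinks(page->second)) {
            if (seen.insert(link).second) pending.push(link);  // queue each link once
        }
    }
    return order;
}
```

The `seen` set is what keeps the crawl from looping forever when pages link back to each other.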

          http://www.catb.org/~esr/faqs/smart-questions.html

          • KA51O wrote (#5):

            I guess it's more like entering a website address such as www.BigCompany.com, going through that page and all the pages it links to, and, for example, collecting all the e-mail addresses.
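The collection step in that scenario can be sketched as a regex scan over each downloaded page. This is a minimal illustration, assuming the page text is already available as a string; `extractEmails` is a hypothetical helper name, and the pattern is a deliberately loose approximation of an e-mail address, not a full RFC 5322 matcher.

```cpp
#include <regex>
#include <set>
#include <string>

// Collects the distinct e-mail-like strings found in a page's text.
// The pattern is an approximation: local part, '@', then a dotted domain.
std::set<std::string> extractEmails(const std::string& text) {
    static const std::regex email(R"([A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+)");
    std::set<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), email), end; it != end; ++it) {
        found.insert(it->str());  // std::set drops repeated addresses
    }
    return found;
}
```

Running this on every page a crawl visits and merging the resulting sets would give the site-wide list.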
