Multithreaded Webcrawler
-
Hi there,
I'm trying to build a multithreaded webcrawler that downloads a page's HTML and all the resources it references. Given that QWebPage isn't thread-safe, what would be the best way to accomplish this?
Things I've tried:
- Having each thread start its own QApplication, but that gives me a "There can only exist one QCoreApplication instance" error.
- Creating the QWebPage in the main (GUI) thread and moving it to a delegate thread, but that gives me a "QObject used from outside its own thread" error.
Any pointers/direction would be greatly appreciated!
-Arvind
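Both errors point to the same constraint: a QObject such as QWebPage may only be used from the thread it lives in, and there is only one QApplication. One common workaround is to keep every Qt object in the main thread and give worker threads only plain, non-Qt work, such as downloading raw HTML. The Python sketch below shows that division of labour using only the standard library. It is an illustration under stated assumptions, not Qt's API: `fetch_in_workers` and the injected `fetch` callable are hypothetical names, and a plain HTTP download does not run JavaScript the way QWebPage does.

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def fetch_in_workers(urls: Iterable[str],
                     fetch: Callable[[str], str],
                     num_workers: int = 4) -> Dict[str, str]:
    """Download every URL on worker threads; collect results on the calling thread.

    Hypothetical helper: only the calling (main) thread reads the results,
    so any non-thread-safe object (e.g. a QWebPage) could stay on that thread.
    """
    jobs = queue.Queue()     # URLs to download (None = stop signal)
    results = queue.Queue()  # (url, html) pairs produced by the workers

    def worker() -> None:
        while True:
            url = jobs.get()
            if url is None:  # sentinel: no more work for this worker
                break
            try:
                results.put((url, fetch(url)))
            except Exception as exc:  # report failures instead of dying silently
                results.put((url, f"ERROR: {exc}"))

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for t in threads:
        t.start()

    url_list = list(urls)
    for url in url_list:
        jobs.put(url)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker

    pages: Dict[str, str] = {}
    for _ in url_list:  # exactly one result arrives per queued URL
        url, html = results.get()
        pages[url] = html  # main-thread-only processing would happen here
    for t in threads:
        t.join()
    return pages
```

For real downloads, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8", "replace")` (assumed here, not taken from the thread). Injecting it keeps the threading logic testable without a network connection.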
-
Thanks for the reply! That was my initial approach, and it worked fine while I restricted the app to a single child thread, but as soon as I lifted that restriction the app would segfault. Googling around led me to these forum posts, which suggest that QtWebKit is not thread-safe and can only be instantiated in the main/GUI thread. Is that not right?
http://developer.qt.nokia.com/forums/viewthread/9035
http://developer.qt.nokia.com/forums/viewthread/3005
-
I guess it's more like this: enter a website address such as www.BigCompany.com, go through that page and all the pages it links to, and, for example, collect all the e-mail addresses.
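The crawl described above is essentially a breadth-first search over links that stay on the starting host, scanning each page's HTML for addresses. Here is a minimal single-threaded sketch using only Python's standard library. It swaps QWebPage for a plain HTML parser, so JavaScript-generated links and addresses will be missed. `crawl_emails`, `LinkParser`, `EMAIL_RE`, the `fetch` callable and the `max_pages` limit are all hypothetical names, and the regex is only a rough approximation of real e-mail syntax.

```python
import re
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Set
from urllib.parse import urljoin, urlparse

# Rough e-mail pattern (an approximation, not RFC 5322).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_emails(start_url: str, fetch: Callable[[str], str],
                 max_pages: int = 100) -> Set[str]:
    """Breadth-first crawl of start_url's host, returning every address found."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    todo = deque([start_url])
    emails: Set[str] = set()
    pages = 0
    while todo and pages < max_pages:
        url = todo.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        pages += 1
        emails.update(EMAIL_RE.findall(html))  # also catches mailto: targets
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href).split("#", 1)[0]  # absolute URL, no fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)  # same host only; never queue a page twice
                todo.append(link)
    return emails
```

To combine this with the worker pattern, the `fetch` calls could be farmed out to threads while the `seen` set and the queue stay with a single coordinating thread.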