QWebFrame::findAllElements() doesn't work in separate thread



  • Hi. I'm trying to parse a web page to get all the links on it. I use a QThreadPool and my own class implementing QRunnable, which provides all the functionality. Everything works fine if I run such a QRunnable in the main thread, but
    when I put my Runnable into the thread pool, QWebFrame::findAllElements("a") returns an empty set. =(

    This is how I run my Runnable.
    In the same thread, findAllElements() works perfectly:
    @webLoader->run();@

    In the thread pool, findAllElements() returns an empty set =(
    @threadPool->start(webLoader);@

    I've already spent many hours trying to figure out the problem. =(

    Here are my code snippets:

    webloader.h
    @#include <QList>
    #include <QObject>
    #include <QRunnable>
    #include <QString>
    #include <QUrl>

    class QWebPage;

    class WebLoader : public QObject, public QRunnable
    {
        Q_OBJECT
        QUrl url;
        QString textPattern;
        QWebPage* page;
    public:
        explicit WebLoader(const QUrl& url, const QString& textPattern);
        void run();
        virtual ~WebLoader();
    signals:
        void loaded(QList<QUrl> urls, bool error, bool found);
    private slots:
        void loadFinished(bool success);
    };@

    webloader.cpp
    @#include "webloader.h"

    #include <QEventLoop>
    #include <QWebElement>
    #include <QWebFrame>
    #include <QWebPage>

    WebLoader::WebLoader(const QUrl& url, const QString& textPattern)
        : url(url), textPattern(textPattern), page(0) {
        // The caller, not QThreadPool, owns and deletes this object.
        setAutoDelete(false);
    }

    void WebLoader::run() {
        QEventLoop loop;

        page = new QWebPage;

        // Connect first, then start the load.
        connect(page->mainFrame(), SIGNAL(loadFinished(bool)), SLOT(loadFinished(bool)));
        connect(page->mainFrame(), SIGNAL(loadFinished(bool)), &loop, SLOT(quit()));
        page->mainFrame()->load(url);

        // Block this thread until the frame reports that loading has finished.
        loop.exec();
    }

    void WebLoader::loadFinished(bool success) {
        QList<QUrl> urlList;
        bool found = false;

        if (success) {
            QWebElementCollection collection = page->mainFrame()->findAllElements("a");

            foreach (const QWebElement& element, collection) {
                if (element.hasAttribute("href")) {
                    // Resolve relative links against the page URL.
                    urlList.push_back(url.resolved(QUrl(element.attribute("href"))));
                }
            }

            found = page->findText(textPattern);
        }

        emit loaded(urlList, !success, found);
    }@

    P.S. Sorry for my English.



  • That is not possible with the current WebKit 2; alas, WebKit 3 is QML-only and is missing a ton of useful features.

    WebKit 2 creates and uses QWidgets, so all access to Qt WebKit should be done on the main (GUI) thread. This is by design.
    What you want should be possible with WebKit 3, but we are still waiting for a decent C++ API:
    http://qt-project.org/forums/viewthread/26585
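Under that constraint, worker threads can still do thread-safe work, but anything that touches QWebPage has to be handed back to the GUI thread. In Qt that hand-off is normally done with queued signal/slot connections or QMetaObject::invokeMethod() with Qt::QueuedConnection. The sketch below models the same pattern in plain standard C++ so it compiles without Qt; MainThreadQueue and runDemo are hypothetical names for illustration, not Qt APIs, and the "main thread" is simply whichever thread calls runTasks().

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (not a Qt class): worker threads post closures,
// and only the thread that calls runTasks() executes them. In Qt this
// role is played by the GUI thread's event loop.
class MainThreadQueue {
public:
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

    // Executes exactly `count` posted tasks on the calling thread,
    // blocking until each one is available.
    void runTasks(int count) {
        assert(count >= 0);
        for (int i = 0; i < count; ++i) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return !tasks_.empty(); });
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // runs outside the lock, on the calling thread
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
};

// Hypothetical demo: spawns `workerCount` threads; each posts one task back
// to the calling ("main") thread. Returns how many tasks ran on that thread.
int runDemo(int workerCount) {
    const std::thread::id mainId = std::this_thread::get_id();
    MainThreadQueue queue;
    int ranOnMain = 0;  // only modified by the main thread, so no lock needed

    std::vector<std::thread> workers;
    for (int i = 0; i < workerCount; ++i) {
        workers.emplace_back([&queue, &ranOnMain, mainId] {
            // Thread-safe work (e.g. plain string processing) could go here;
            // anything touching QWebPage/QWidget would be posted instead.
            queue.post([&ranOnMain, mainId] {
                if (std::this_thread::get_id() == mainId) {
                    ++ranOnMain;
                }
            });
        });
    }

    queue.runTasks(workerCount);
    for (std::thread &w : workers) {
        w.join();
    }
    return ranOnMain;
}
```

Calling runDemo(4) returns 4, because every posted task executes on the thread that drains the queue, never on the worker that posted it.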



  • Thanks a lot for your answer. That's sad; WebKit could be a great tool for HTML parsing.
    In the end, I used regexps to cut the "<a ></a>" fragments out of the HTML code.
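A regex-based link extractor along those lines might look like the sketch below. It is not the poster's code: it uses std::regex instead of Qt's QRegExp/QRegularExpression so it compiles without Qt, and the function name extractHrefs and the pattern are assumptions. A regex is not an HTML parser, so this only handles quoted href attributes on <a> tags and ignores comments, scripts, entities and unquoted values. The results are still raw (possibly relative) hrefs that would need resolving against the page URL, e.g. with QUrl::resolved() as in the original loadFinished().

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the href values of <a ...> tags, in order.
// Only single- or double-quoted href attributes are recognised.
std::vector<std::string> extractHrefs(const std::string &html) {
    // "<a", whitespace, optionally other attributes ending in whitespace,
    // then href = "..." (group 1) or href = '...' (group 2). Case-insensitive.
    static const std::regex anchorHref(
        R"re(<a\s(?:[^>]*?\s)?href\s*=\s*(?:"([^"]*)"|'([^']*)'))re",
        std::regex::icase);

    std::vector<std::string> hrefs;
    for (std::sregex_iterator it(html.begin(), html.end(), anchorHref), end;
         it != end; ++it) {
        const std::smatch &m = *it;
        // Group 1 holds a double-quoted value, group 2 a single-quoted one.
        hrefs.push_back(m[1].matched ? m[1].str() : m[2].str());
    }
    return hrefs;
}
```

For example, on `<a href="/a">A</a> <A class='x' HREF='b.html'>B</A> <abbr href="no">x</abbr>` it returns "/a" and "b.html"; the <abbr> tag is skipped because "<a" must be followed by whitespace.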

