
QNetworkAccessManager: download page resources



  • Hi, Qt Project.

    I have a task: download a whole web page and store it locally. That means downloading the page itself and all of its resources (CSS, images, JS), e.g.:

    @
    image.png
    style.css
    common.js
    @

    Questions:

    How can I download a whole web page with all of its resources?

    What is the best way to implement a kind of cache manager (that is my actual task) that saves the page and its resources so they can be loaded again later?

    Thanks!



  • Check the similar "thread with ideas on how to download a whole web site":http://qt-project.org/forums/viewthread/20957 that uses QNetworkAccessManager with the help of QUrl.



  • You have to find the pieces of HTML that start with href=" or src='. The value can be quoted with either " or ' after the =, so I split the code on href= and src= and removed the first character (the quote). Then I cut off everything after the next ' or " (e.g. in http://example.com/style.css">some other code... everything from the closing quote on) and used a regexp to keep only the addresses with the extensions I actually need, such as .css, .png, etc.
    Some links look like "/images/blahblah.png", so you need to pick those out (easy: url.toString().startsWith("/")) and prepend the URL you downloaded the page from (for example "http://example.com" + "/images/blahblah.png").
    And don't forget to create the folder you save each file to; for blahblah.png that is /images inside the directory you are saving everything to.
    Hope this helps :)
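A minimal standard C++17 sketch of the steps in the post above, for illustration only. The function names are hypothetical, and a single regex stands in for the manual split on href=/src=. Like the original approach, it will miss resources that JavaScript loads or that CSS pulls in via url(...).

```cpp
#include <filesystem>
#include <regex>
#include <string>
#include <vector>

// Collect the values of href="..." and src='...' attributes from raw HTML.
// Heuristic only: URLs built by JavaScript or CSS url(...) are not found.
std::vector<std::string> extractLinks(const std::string &html)
{
    // Group 1 is the opening quote, group 2 the value up to the matching quote.
    static const std::regex attr(R"((?:href|src)\s*=\s*(["'])(.*?)\1)",
                                 std::regex::icase);
    std::vector<std::string> links;
    for (auto it = std::sregex_iterator(html.begin(), html.end(), attr);
         it != std::sregex_iterator(); ++it)
        links.push_back((*it)[2].str());
    return links;
}

// True if the link's path (query and fragment ignored) ends with a wanted extension.
bool hasWantedExtension(const std::string &link, const std::vector<std::string> &exts)
{
    const std::string path = link.substr(0, link.find_first_of("?#"));
    for (const auto &ext : exts)
        if (path.size() >= ext.size() &&
            path.compare(path.size() - ext.size(), ext.size(), ext) == 0)
            return true;
    return false;
}

// Prefix root-relative links ("/images/a.png") with the site origin.
// Other relative forms ("a.png", "../a.png", "//cdn.example") are returned
// unchanged; they need full RFC 3986 resolution (e.g. QUrl::resolved()).
std::string resolveRootRelative(const std::string &origin, const std::string &link)
{
    if (!link.empty() && link[0] == '/' && (link.size() == 1 || link[1] != '/'))
        return origin + link;
    return link;
}

// Map an absolute URL to <root>/<url path> and create the parent folders.
std::filesystem::path prepareLocalPath(const std::filesystem::path &root,
                                       const std::string &absoluteUrl)
{
    const auto scheme = absoluteUrl.find("://");
    const auto hostStart = (scheme == std::string::npos) ? 0 : scheme + 3;
    const auto pathStart = absoluteUrl.find('/', hostStart);
    std::string urlPath = (pathStart == std::string::npos) ? "/" : absoluteUrl.substr(pathStart);
    urlPath = urlPath.substr(0, urlPath.find_first_of("?#"));
    const std::filesystem::path target = root / urlPath.substr(1);
    std::filesystem::create_directories(target.parent_path());
    return target;
}
```

For example, `extractLinks` applied to `<img src='/images/blahblah.png'>` yields `/images/blahblah.png`, which `resolveRootRelative("http://example.com", ...)` turns into `http://example.com/images/blahblah.png`.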



  • I wouldn't try to parse the HTML myself. What if some JavaScript is used to load additional content?

    Instead, I'd just use QWebPage with a custom QNetworkAccessManager that saves every resource it downloads.


  • Moderators

    To download a website with all its resources, use QNetworkAccessManager::setCache() to install a QNetworkDiskCache whose cache directory is where you prefer to store the data.

    After the page has been downloaded once, set this attribute on the QNetworkRequest to make an "offline" request that is served only from the cache:
    @
    // Load only from the cache; the reply fails if the URL was never stored
    QNetworkRequest rq(QUrl("http://whatever.url"));
    rq.setAttribute(QNetworkRequest::CacheLoadControlAttribute, QNetworkRequest::AlwaysCache);
    @



  • I have the same question. I want to download all the resources of a web page (CSS, JS, images) by loading the page in QWebPage.

    The problem is that QNetworkReply is a sequential device: once QWebPage has read() the data for its own rendering, there is nothing left for my program to read (and then save to a file).

    I have seen a few posts suggesting a custom QNetworkAccessManager and a custom QNetworkReply, but I'm new to Qt and don't know exactly how to do this. I would appreciate a little more information about it; sample code would be great too.

