QNetworkReply Just download the head of the Url file



  • Hello,
    Below is the part of my code that I would like to improve:

    void Form::testNum (QString num)
    {
        qDebug() << num;
    
        manager = new QNetworkAccessManager(this);
    
        QNetworkRequest request;
        request.setUrl(QUrl("http://www.xxxxxx.com/"+num));
    
        response = manager->get(request);
    
        connect(response, &QNetworkReply::finished,this,&Form::getResult );
    }
    
    void Form::getResult() {
        qDebug() << "Finished";
        qDebug() << response->readAll();
        response->deleteLater();
        response = Q_NULLPTR;
    }
    
    

    I know I can store the reply in a QByteArray and analyse it afterwards.
    But do you think it is possible to get only part of the HTML code, such as the <head>, without downloading the whole file?

    Thanks for reading.
    Bye.



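  • void Form::getResult() {
        qDebug() << "Finished";
        // Note: readAll() still receives the complete body; only the extraction is local.
        const QByteArray output = response->readAll();
        const QString tmp = QString::fromUtf8(output); // same conversion as the implicit one
        // indexOf() avoids the out-of-range access of list[1] when <head> is missing.
        const int start = tmp.indexOf("<head>");
        const int end = tmp.indexOf("</head>", start);
        if (start >= 0 && end > start)
            qDebug() << "Head is : " + tmp.mid(start + 6, end - start - 6);
    
        response->deleteLater();
        response = Q_NULLPTR;
    }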
    

    This is easy to write, but it is not what I am looking for.


  • Moderators

    @An-other-french said in QNetworkReply Just download the head of the Url file:

    But do you think it is possible to get only part of the HTML code, such as the <head>, without downloading the whole file?

    Your example still downloads the whole file.

    Selectively downloading only a part of the HTML content isn't possible.
    Instead of connecting to the QNetworkReply::finished signal, you can connect to the readyRead signal and accumulate the response data in a buffer until you find </head>.

    void Form::getResult() {
        if (!response)
            return; // reply was already released by an earlier call

        m_Buffer += response->readAll();

        const int found = m_Buffer.indexOf("</head>");
        if (found >= 0)
        {
            response->deleteLater(); // implicitly aborts the request if not yet finished
            response = Q_NULLPTR;
        }
    }
    

    Also note that not every HTML page is required to have a <head> tag.
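
    To make the buffering idea a bit more concrete, here is a minimal, Qt-free sketch of the accumulation logic. HeadCollector, feed() and the 64 KiB cap are hypothetical names and values chosen for illustration, not part of Qt. The sketch assumes the tag may arrive in any letter case and may be split across two chunks, and it gives up after the cap for pages that never send </head>.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt): accumulates body chunks and reports
// when the download can stop.
class HeadCollector {
public:
    // maxBytes is an assumed safety cap for pages that never send </head>.
    explicit HeadCollector(std::size_t maxBytes = 64 * 1024)
        : maxBytes_(maxBytes), found_(false), done_(false) {}

    // Feed one chunk. Returns true once the caller can stop downloading:
    // either </head> was seen or the cap was reached.
    bool feed(const std::string& chunk) {
        if (done_)
            return true;

        static const std::string tag = "</head>";
        // Start a few bytes before the new data so that a tag split across
        // two chunks ("</he" + "ad>") is still detected.
        const std::size_t overlap = tag.size() - 1;
        const std::size_t from =
            buffer_.size() > overlap ? buffer_.size() - overlap : 0;
        buffer_ += chunk;

        // Case-insensitive search: the buffer is lower-cased on the fly.
        const std::string::iterator it = std::search(
            buffer_.begin() + static_cast<std::ptrdiff_t>(from), buffer_.end(),
            tag.begin(), tag.end(),
            [](char a, char b) {
                return std::tolower(static_cast<unsigned char>(a)) == b;
            });

        if (it != buffer_.end()) {
            found_ = true;
            done_ = true;
            // Drop whatever followed the closing tag in this chunk.
            buffer_.erase(it + static_cast<std::ptrdiff_t>(tag.size()),
                          buffer_.end());
        } else if (buffer_.size() >= maxBytes_) {
            done_ = true;  // give up: no </head> within the cap
        }
        return done_;
    }

    bool found() const { return found_; }
    const std::string& data() const { return buffer_; }

private:
    std::size_t maxBytes_;
    bool found_;
    bool done_;
    std::string buffer_;
};
```

    In a readyRead slot you would pass each readAll() result to feed() (for example via QByteArray::toStdString()) and release the reply as soon as it returns true. The bytes received before that point have still been transferred, so this only limits the download; it does not select a part of the page on the server side.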



  • Thanks.
    Changing the signal is smart.

    On a related note, I am looking for a protocol, something like a "robot", that fetches only a summary of a web page.
    The summary would be built from the metadata in the <head> of the page (like get_meta_tags in PHP).

    Perhaps this.


  • Moderators

    @An-other-french said in QNetworkReply Just download the head of the Url file:

    Perhaps this.

    But those are HTTP headers, and they have nothing to do with HTML headers.
    If you are only interested in the HTTP headers, use QNetworkAccessManager::head() (instead of get()) and the content body is not transferred.

    HTML headers are contained in the data body and thus need to be requested via get() and read with the approach I've posted.
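
    To illustrate the difference, here is a small sketch that splits a raw HTTP/1.x response at the blank line separating the HTTP header block from the body. HttpParts and splitHttpResponse are hypothetical names used only for this illustration; Qt does this parsing internally, so you would not write it yourself when using QNetworkReply.

```cpp
#include <cstddef>
#include <string>

// Hypothetical type for illustration only.
struct HttpParts {
    std::string headers;  // status line plus HTTP header fields
    std::string body;     // message body; for an HTML page, <head> lives here
};

// Splits a raw HTTP/1.x response at the first blank line (CRLF CRLF).
// If there is no blank line, everything is treated as headers.
inline HttpParts splitHttpResponse(const std::string& raw) {
    HttpParts parts;
    const std::size_t sep = raw.find("\r\n\r\n");
    if (sep == std::string::npos) {
        parts.headers = raw;
        return parts;
    }
    parts.headers = raw.substr(0, sep);
    parts.body = raw.substr(sep + 4);  // skip the 4 separator bytes
    return parts;
}
```

    A HEAD request returns only what ends up in headers here, which is why the HTML <head> element can never be obtained that way.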

