How to scroll QWebEnginePage efficiently?

  • tham wrote (#1):

    I am trying to scroll the Bing image search results to the end, but I have not found a working solution yet.

    bing_image_search::bing_image_search(QObject *parent) : //initialize....
    {
        auto *web_page = &get_web_page();
        connect(web_page, &QWebEnginePage::scrollPositionChanged,
                this, &bing_image_search::web_page_scroll_position_changed);
    }
      
    void bing_image_search::parse_page_link(QPointF const &point)
    {
        if(state_ != state::parse_page_link){
            return;
        }
    
        get_web_page().toHtml([this, point](QString const &contents)
        {
            qDebug()<<"get image link contents";
            QRegularExpression reg("(search\\?view=detailV2[^\"]*)");
            auto iter = reg.globalMatch(contents);
            QStringList links;
            while(iter.hasNext()){
                QRegularExpressionMatch match = iter.next();
                if(match.captured(1).right(20) != "ipm=vs#enterinsights"){
                    QString url = QUrl("https://www.bing.com/images/" + match.captured(1)).toString();
                    url.replace("&amp;", "&");
                    links.push_back(url);
                }
            }
            links.removeDuplicates();
            qDebug()<<"total match link:"<<links.size();
            if(links.size() > img_page_links_.size()){
                links.swap(img_page_links_);
            }
            if((size_t)img_page_links_.size() >= max_search_size_){
                state_ = state::parse_img_link;
            }else{
                get_web_page().findText("See more images", QWebEnginePage::FindFlag(), [this](bool found)
                {
                    if(found){
                        qDebug()<<"found See more images";
                        get_web_page().runJavaScript("document.getElementsByClassName(\"btn_seemore\")[0].click();"
                                                     "window.scrollTo(0, document.body.scrollHeight);");
                    }else{
                        qDebug()<<"cannot find See more images";
                        get_web_page().runJavaScript(js_scroll_to_window_height(1000), [this](QVariant const &result)
                        {
                            qDebug()<<"scroll page result:"<<result;
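                            //an empty list means the script failed, treat it as "cannot scroll further"
                            auto const values = result.toList();
                            if(values.isEmpty() || !values[0].toBool()){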
                                state_ = state::parse_img_link;
                                parse_imgs_link();
                            }
                        });
                    }
                });
            }
        });
    }
    
    void bing_image_search::scroll_web_page(QPointF const &point)
    {
        //We need a timer when the web view is shown on the screen: the view may not
        //have updated yet, so the page may not have enough room to scroll down and
        //the signal scrollPositionChanged would never be emitted.
        //TODO : fix this poor solution
        //Pass "this" as the context object so the timer does not fire after this object is destroyed.
        QTimer::singleShot(1000, this, [this, point]()
        {
            if(state_ == state::parse_page_link){
                parse_page_link(point);
            }
        });
    }
      
    void bing_image_search::web_page_scroll_position_changed(const QPointF &point)
    {
        static size_t index = 0;
        qDebug()<<index++<<":"<<point.y();
        scroll_web_page(point);
    }
    

    The JavaScript generated by "js_scroll_to_window_height":

    namespace{
    
    QString doc_height()
    {
        return QString(
                    "function doc_height(){"
                    "  return Math.max("
                    "    document.body.scrollHeight, document.documentElement.scrollHeight,"
                    "    document.body.offsetHeight, document.documentElement.offsetHeight,"
                    "    document.body.clientHeight, document.documentElement.clientHeight);"
                    "}"
                    );
    }
    
    }
    
    QString js_scroll_to_window_height(qreal offset)
    {
        return doc_height() + QString("\n"
                    "var dheight = doc_height();"
                    "function scrollPage(){"
                    "  var cur_height = window.innerHeight + window.pageYOffset;"
                    "  if(Math.abs(window.pageYOffset - document.body.scrollHeight) < %1){"
                    "    return [false, cur_height, dheight];"
                    "  }else{"
                    "    window.scrollTo(0, window.pageYOffset + %1);"
                    "    return [true, cur_height, dheight];"
                    "  }"
                    "}"
                    "scrollPage()").arg(offset);
    }
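
The script above already returns `cur_height` (`window.innerHeight + window.pageYOffset`) and `dheight` (the result of `doc_height()`), but its own end-of-page test compares `window.pageYOffset` alone against `document.body.scrollHeight`, so the viewport height is left out. One possible way to make the check less dependent on luck is to compare the bottom edge of the viewport with the document height on the C++ side. This is only a minimal sketch: `reached_bottom` and its `tolerance` parameter are hypothetical names, not part of the original code or of Qt.

```cpp
// Hypothetical helper (assumption, not in the original post).
// viewport_bottom: window.innerHeight + window.pageYOffset ("cur_height" in the script)
// doc_height:      the value returned by doc_height() ("dheight" in the script)
// tolerance:       slack in pixels, to absorb fractional scroll positions
bool reached_bottom(double viewport_bottom, double doc_height, double tolerance = 2.0)
{
    // The page is at the end when the bottom edge of the viewport
    // touches (or nearly touches) the bottom of the document.
    return viewport_bottom >= doc_height - tolerance;
}
```

In the `runJavaScript` callback, the two arguments could presumably be taken from the second and third entries of the returned list, for example `values[1].toDouble()` and `values[2].toDouble()`.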
    

    I have two problems:

    1 : I cannot find a way to scroll down the web page without the help of a timer (function "scroll_web_page"). Is there a better way to scroll the page?
    2 : I tried the Stack Overflow solutions for detecting whether the browser window has scrolled to the bottom, but none of them work as expected. I made some changes to them, but the result depends a lot on luck: sometimes it detects the bottom, sometimes it does not.

    ps : scrolling the page does not emit the loadFinished signal

    • tham wrote (#2):

      One of the problems is that after I scroll the page, the height of the scroll bar may change, and the web view needs time to reflect the change. If I scroll the page too fast, the scrolling may end too early. No matter what I try, I have to rely on a timer and tune some parameters for each search engine (Google, Bing, Flickr). Is this normal for web scraping, or am I doing something wrong? (I hope I am wrong, because I do not like changing parameters here and there.) Thanks.
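
One way to avoid ending the scroll too early is to stop only after the page has stayed at the bottom, with an unchanged document height, for several consecutive polls. The sketch below is an assumption about how that could look, not a confirmed fix: `scroll_end_detector`, `update`, and both constructor parameters are hypothetical names. It does not remove the need to wait, because lazily loaded content arrives asynchronously, but it replaces a per-site delay with a count of stable samples.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper (assumption, not in the original post): reports that
// scrolling is finished only after the page has been at the bottom with the
// same document height for a number of consecutive polls.
class scroll_end_detector
{
public:
    explicit scroll_end_detector(std::size_t required_stable_polls = 3, double tolerance = 2.0)
        : required_(required_stable_polls), tolerance_(tolerance) {}

    // Feed one sample per poll:
    //   viewport_bottom = window.innerHeight + window.pageYOffset
    //   doc_height      = current document height
    // Returns true once the end of the page looks stable.
    bool update(double viewport_bottom, double doc_height)
    {
        bool const at_bottom = viewport_bottom >= doc_height - tolerance_;
        bool const height_changed =
            has_last_ && std::abs(doc_height - last_height_) > 0.5;

        if(!at_bottom){
            stable_ = 0; // still room to scroll, start over
        }else if(height_changed){
            stable_ = 1; // at the bottom of a page that just grew, restart the count
        }else{
            ++stable_;   // at the bottom and nothing changed since the last poll
        }

        last_height_ = doc_height;
        has_last_ = true;
        return stable_ >= required_;
    }

private:
    std::size_t required_;
    double tolerance_;
    std::size_t stable_ = 0;
    double last_height_ = 0.0;
    bool has_last_ = false;
};
```

Each poll (for example each `runJavaScript` callback) would call `update` with the values reported by the page, and scrolling would continue until it returns true. The number of required stable polls is still a parameter, but it describes page behaviour rather than a delay tuned to one search engine.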
