How to scroll QWebEnginePage efficiently?
-
I am trying to scroll the search results of Bing image search to the end, but I have not found a working solution yet.
    bing_image_search::bing_image_search(QObject *parent) :
        //initialize....
    {
        auto *web_page = &get_web_page();
        connect(web_page, &QWebEnginePage::scrollPositionChanged,
                this, &bing_image_search::web_page_scroll_position_changed);
    }

    void bing_image_search::parse_page_link(QPointF const &point)
    {
        if(state_ != state::parse_page_link){
            return;
        }

        get_web_page().toHtml([this, point](QString const &contents)
        {
            qDebug()<<"get image link contents";
            QRegularExpression reg("(search\\?view=detailV2[^\"]*)");
            auto iter = reg.globalMatch(contents);
            QStringList links;
            while(iter.hasNext()){
                QRegularExpressionMatch match = iter.next();
                if(match.captured(1).right(20) != "ipm=vs#enterinsights"){
                    QString url = QUrl("https://www.bing.com/images/" + match.captured(1)).toString();
                    url.replace("&amp;", "&");
                    links.push_back(url);
                }
            }
            links.removeDuplicates();
            qDebug()<<"total match link:"<<links.size();
            if(links.size() > img_page_links_.size()){
                links.swap(img_page_links_);
            }

            if((size_t)img_page_links_.size() >= max_search_size_){
                state_ = state::parse_img_link;
            }else{
                get_web_page().findText("See more images", QWebEnginePage::FindFlag(), [this](bool found)
                {
                    if(found){
                        qDebug()<<"found See more images";
                        get_web_page().runJavaScript("document.getElementsByClassName(\"btn_seemore\")[0].click();"
                                                     "window.scrollTo(0, document.body.scrollHeight);");
                    }else{
                        qDebug()<<"cannot found See more images";
                        get_web_page().runJavaScript(js_scroll_to_window_height(1000), [this](QVariant const &result)
                        {
                            qDebug()<<"scroll page result:"<<result;
                            if(!result.toList()[0].toBool()){
                                state_ = state::parse_img_link;
                                parse_imgs_link();
                            }
                        });
                    }
                });
            }
        });
    }

    void bing_image_search::scroll_web_page(QPointF const &point)
    {
        //We need to set up a timer if the web view is shown on the screen.
        //The web view may not be able to update in time, so the signal
        //scrollPositionChanged may never be emitted because the web page
        //does not have enough space to scroll down.
        //TODO : fix this poor solution
        QTimer::singleShot(1000, [=]()
        {
            if(state_ == state::parse_page_link){
                parse_page_link(point);
            }
        });
    }

    void bing_image_search::web_page_scroll_position_changed(const QPointF &point)
    {
        static size_t index = 0;
        qDebug()<<index++<<":"<<point.y();
        scroll_web_page(point);
    }
The JavaScript generated by "js_scroll_to_window_height":
    namespace{

    QString doc_height()
    {
        return QString(
            "function doc_height(){"
            "  return Math.max("
            "    document.body.scrollHeight, document.documentElement.scrollHeight,"
            "    document.body.offsetHeight, document.documentElement.offsetHeight,"
            "    document.body.clientHeight, document.documentElement.clientHeight);"
            "}"
        );
    }

    }

    QString js_scroll_to_window_height(qreal offset)
    {
        return doc_height() + QString("\n"
            "var dheight = doc_height();"
            "function scrollPage(){"
            "  var cur_height = window.innerHeight + window.pageYOffset;"
            "  if(Math.abs(window.pageYOffset - document.body.scrollHeight) < %1){"
            "    return [false, cur_height, dheight];"
            "  }else{"
            "    window.scrollTo(0, window.pageYOffset + %1);"
            "    return [true, cur_height, dheight];"
            "  }"
            "}"
            "scrollPage()").arg(offset);
    }
I have two problems:
1 : I cannot find a way to scroll down the web page without the help of a timer (see the function "scroll_web_page"). Is there a better way to scroll the page?
2 : I tried the Stack Overflow solutions for detecting whether the browser window has scrolled to the bottom, but none of them works as expected. I made some changes to them, but the result depends heavily on luck: sometimes the bottom is detected, sometimes it is not.
PS : scrolling the page does not emit the loadFinished signal.
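The bottom-of-page check from problem 2 can be stated as a plain predicate. The following is only a minimal sketch in plain C++, assuming the three values have already been read from the page; `is_at_bottom` and its parameter names are hypothetical and not part of the code above. It adds the viewport height to the scroll offset before comparing, whereas the script above compares `window.pageYOffset` alone against `document.body.scrollHeight`, which may be one reason the detection looks unreliable.

```cpp
#include <cassert>

// Hypothetical helper, not part of the code in the question.
// page_y_offset ~ window.pageYOffset
// inner_height  ~ window.innerHeight
// doc_height    ~ the value returned by doc_height() in the script above
// tolerance     ~ how many pixels from the end still count as "bottom"
bool is_at_bottom(double page_y_offset, double inner_height,
                  double doc_height, double tolerance)
{
    assert(tolerance >= 0.0);
    // Bottom edge of the viewport, measured from the top of the document.
    double const viewport_bottom = page_y_offset + inner_height;
    return doc_height - viewport_bottom <= tolerance;
}
```

With a 5000 px document and an 800 px viewport, an offset of 4200 counts as the bottom, while an offset of 0 does not.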
-
One of the problems is that after I scroll the page, the height of the scroll bar may change, and the web view needs time to reflect the change. If I scroll the page too fast, the scrolling may end too early. No matter what I try, I have to rely on a timer and tune some parameters for each specific search engine (Google, Bing, Flickr). Is this normal for web scraping, or am I doing something wrong? (I hope I am wrong, because I do not like changing parameters here and there.) Thanks.
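The situation described above, where the page height keeps changing after each scroll, amounts to deciding when the height has settled. Below is a minimal sketch of that decision in plain C++, independent of Qt. `height_settle_detector` is a hypothetical name, and the idea of sampling the height after every scroll attempt is an assumption for illustration, not something the code above does. It does not remove the need for a delay between samples; it only shows how "stop when the height no longer grows" could be expressed with a single count instead of a per-site scroll distance.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: reports when the document height has stopped changing.
class height_settle_detector
{
public:
    explicit height_settle_detector(std::size_t required_stable_samples)
        : required_(required_stable_samples)
    {
        assert(required_stable_samples > 0);
    }

    // Feed the document height measured after each scroll attempt.
    // Returns true once the height has repeated the previous value
    // `required_stable_samples` times in a row.
    bool add_sample(double height)
    {
        if(has_last_ && height == last_){
            ++stable_count_;
        }else{
            stable_count_ = 0; // height changed, start counting again
        }
        last_ = height;
        has_last_ = true;
        return stable_count_ >= required_;
    }

private:
    std::size_t required_;
    std::size_t stable_count_ = 0;
    double last_ = 0.0;
    bool has_last_ = false;
};
```

In use, each scroll result would be passed to `add_sample`, and scrolling would stop only when it returns true.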