Wrap QApplication::exec() into a function or class
I want the following code to capture the request URL that starts with http://down.51en.com:88 while the web page is loading, and then do further processing with the response to that URL.
In my program, once targetUrl is assigned a value, I want the function targetUrlGetter(url) to return it to the caller. The problem is that QApplication::exec() enters the main event loop, so the code after the exec() call at the end of targetUrlGetter() is never executed and the function cannot return. I have tried calling qApp.quit() in interceptRequest(self, info) to tell the application to exit so that targetUrlGetter(url) can return, but the function still does not return and the program even crashes on exit. How can I return targetUrl to the caller?
BTW, I am going to use this code on the (Django) web server side.

    import sys
    from PyQt5.QtWidgets import *
    from PyQt5.QtWebEngineWidgets import *
    from PyQt5.QtWebEngineCore import *
    from PyQt5.QtCore import *


    class WebEngineUrlRequestInterceptor(QWebEngineUrlRequestInterceptor):
        def __init__(self, parent=None):
            super().__init__(parent)
            self.page = parent

        def interceptRequest(self, info):
            if info.requestUrl().toString().startswith('http://down.51en.com:88'):
                self.targetUrl = info.requestUrl().toString()
                print('----------------------------------------------', self.targetUrl)
                qApp.quit()
                # self.page.load(QUrl(''))


    def targetUrlGetter(url=None):
        app = QApplication(sys.argv)
        page = QWebEnginePage()
        globalSettings = page.settings().globalSettings()
        globalSettings.setAttribute(
            QWebEngineSettings.PluginsEnabled, True)
        globalSettings.setAttribute(
            QWebEngineSettings.AutoLoadImages, False)
        profile = page.profile()
        webEngineUrlRequestInterceptor = WebEngineUrlRequestInterceptor(page)
        profile.setRequestInterceptor(webEngineUrlRequestInterceptor)
        page.load(QUrl(url))
        # view = QWebEngineView()
        # view.setPage(page)
        # view.show()
        app.exec_()
        return webEngineUrlRequestInterceptor.targetUrl


    url = "http://www.51en.com/news/sci/everything-there-is-20160513.html"
    # url = "http://www.51en.com/news/sci/obese-dad-s-sperm-may-influence-offsprin.html"
    # url = "http://www.51en.com/news/sci/mars-surface-glass-could-hold-ancient-fo.html"
    targetUrl = targetUrlGetter(url)
    print(targetUrl)

@redstoneleo Why not connect the loadFinished signal (http://doc.qt.io/qt-5/qwebenginepage.html#loadFinished) to the quit() slot?
once targetUrl is assigned a value
I cannot see what difference doing that would make.
@redstoneleo You should not put app and app.exec() into a function. And instead of using the return value of a function you should use signals/slots to get the value (you can emit a signal in interceptRequest()).
@jsulm Please see the updated post again! Thanks!
The difficulty here is how to exit the Qt event loop without a crash and return the request URL to the caller. I cannot see how signals/slots can help with that.
You have a number of issues, running exec_ from a function actually isn't one of them (in this case!). I have a number of questions/points:
- Why are you doing this with Qt? If you are going to be running this on a server (with Django?) then it will probably be headless and a Q(Gui)Application won't run there. You should probably look at a pure python solution using urllib / HTTPRedirectHandler.
- Your program will only hit qApp.quit() IF it is redirected to your specified URL; what if it doesn't?
- You should definitely be doing this using the signal/slot mechanism! You are currently trying to work against Qt's asynchronicity.
The following is a working version of what you're trying to do (Python 3.5.1, PyQt5.7):

    from sys import argv

    from PyQt5.QtCore import pyqtSignal, pyqtSlot, pyqtProperty, QObject, QUrl
    from PyQt5.QtWidgets import QApplication, qApp
    from PyQt5.QtWebEngineCore import QWebEngineUrlRequestInterceptor
    from PyQt5.QtWebEngineWidgets import QWebEnginePage, QWebEngineView, QWebEngineSettings


    class Interceptor(QWebEngineUrlRequestInterceptor):
        urlIntercepted=pyqtSignal(str)

        def __init__(self, url, parent=None, **kwargs):
            super().__init__(parent, **kwargs)
            self._url=url
            self._page=parent

        def interceptRequest(self, info):
            if info.requestUrl().toString().startswith(self._url):
                self.urlIntercepted.emit(info.requestUrl().toString())


    class TargetUrlGetter(QObject):
        def __init__(self, url, interceptUrl, show=False, parent=None, **kwargs):
            super().__init__(parent, **kwargs)
            self._targetUrl=None
            self._url=url
            self._interceptUrl=interceptUrl

            self._page=QWebEnginePage(loadFinished=qApp.quit)
            settings=self._page.settings().globalSettings()
            settings.setAttribute(QWebEngineSettings.PluginsEnabled, True)
            settings.setAttribute(QWebEngineSettings.AutoLoadImages, False)

            self._interceptor=Interceptor(
                self._interceptUrl,
                self._page,
                urlIntercepted=self.urlIntercepted
            )
            profile=self._page.profile()
            profile.setRequestInterceptor(self._interceptor)
            self._page.load(QUrl(url))

            if not show: return
            self._view=QWebEngineView()
            self._view.setPage(self._page)
            self._view.show()

        def getTargetUrl(self): return self._targetUrl

        @pyqtSlot(str)
        def setTargetUrl(self, url):
            if url==self._targetUrl: return
            self._targetUrl=url

        targetUrl=pyqtProperty(str, getTargetUrl, setTargetUrl)

        @pyqtSlot(str)
        def urlIntercepted(self, url):
            self.targetUrl=url
            qApp.quit()


    def getTargetUrl(url, interceptUrl):
        app=QApplication(argv)
        t=TargetUrlGetter(url, interceptUrl)
        exitValue=app.exec_()
        return t.targetUrl


    if __name__=="__main__":
        url="http://www.51en.com/news/sci/everything-there-is-20160513.html"
        interceptUrl='http://down.51en.com:88'
        print(getTargetUrl(url, interceptUrl))

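For the simpler case raised in the first point above (urllib / HTTPRedirectHandler), a pure-Python version might look roughly like the sketch below. It only helps if the target URL is reached through an ordinary HTTP redirect, which is an assumption and may well not hold for this site; it will not see requests made by JavaScript or plugins. The names RedirectRecorder and find_redirect_target are made up for illustration.

```python
import urllib.request


class RedirectRecorder(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, but remember the first one matching a prefix."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.target_url = None

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # newurl is the absolute URL the server is redirecting to.
        if self.target_url is None and newurl.startswith(self.prefix):
            self.target_url = newurl
        # Let the standard handler decide whether and how to follow it.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def find_redirect_target(url, prefix, timeout=10):
    """Return the first redirect URL starting with prefix, or None if there is none."""
    recorder = RedirectRecorder(prefix)
    opener = urllib.request.build_opener(recorder)
    # Opening the URL makes urllib walk the redirect chain; the body is not needed.
    with opener.open(url, timeout=timeout):
        pass
    return recorder.target_url
```

Calling find_redirect_target(url, 'http://down.51en.com:88') would then return the matching URL, or None if no redirect in the chain starts with that prefix. No event loop is involved, so the function returns normally.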
@jazzycamel It is perfectly fine to use Qt for non GUI applications. See QtCore, QtNetwork and some other modules.
@jsulm I agree, but as I said this needs a QApplication because it's using QWebEnginePage/View... that will crash if you try and start it in a headless (i.e. typical web server) environment. Also, doing this with Qt rather than a python builtin module really is using a sledgehammer to crack a nut.
1. Once I changed

    url = "http://www.le.com/ptv/vplay/1417484.html"
    interceptUrl = 'http://api.le.com/mms/out/video/playJson?'

in your code, the program crashed while running (tested on Win7 32-bit). What I am sure of is that this time QtWebEngine used the Flash player, while with your original URLs it did not.
While loading this web page, the browser makes many requests. I need one particular request URL issued during loading (e.g. one starting with 'http://api.le.com/mms/out/video/playJson?'). The data needed to reconstruct that request URL is encrypted, so rather than working out the encryption algorithm (which is often much harder or nearly impossible), I rely on a browser-like tool to obtain the URL directly. QtWebEngine provides the necessary tools, which is why I chose Qt for this project. Once I have the URL this way, I plan to return it to the caller, which then fetches the response data for that URL.
The main reason I want to put this code on the server side is that I found packaging a PyQt5 program that uses QtWebEngine into an executable to be error-prone, so I am thinking of moving the QtWebEngine code to the server side to avoid the packaging problem.
If my program has not found the target URL by the time QWebEnginePage emits loadFinished, I think it is fine to just return None to the caller.
As for the program design, I want the objects created on the first run to be reusable as much as possible, to cut down the creation time. I think the signal/slot mechanism could help achieve this.
I'm using a Macbook Pro (OSX 10.11.4, El Capitan). It doesn't crash on mine with your new URLs, but it doesn't find the intercept URL either... I'm not sure how/if QtWebEngine handles Flash or if the implementation/plugin used is platform specific, sorry.
The complexity of your use case was not evident from the example you gave, so maybe a simple python builtin solution won't be enough.
a) My point was not that returning None if your URL isn't found was a problem (that's what my example does after all), it was that in your original example the program would never have returned as the only place the event loop was stopped (qApp.quit()) was in the conditional block handling the intercept URL.
b) The problem of keeping the objects alive is that once QApplication.exec_() is called, it blocks, and the only way to release it is to quit it. You need to call exec_() for the event loop to run. The obvious answer would normally be to run this in a thread, but you can't create GUI/Widgets objects outside of the main thread...
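One possible way around the main-thread restriction, which nobody in this thread has tested and which is offered only as a sketch, is to run the Qt code in a separate process: the child gets its own main thread and event loop, and a crash or hang there does not take the caller down with it. The example below assumes a hypothetical helper script that takes the page URL and the intercept prefix as command-line arguments and prints the intercepted URL (or nothing) to stdout. The name run_url_getter and that contract are assumptions, and this does nothing about the headless-server problem.

```python
import subprocess
import sys


def run_url_getter(script, url, intercept_url, timeout=30):
    """Run a helper script in a child process and return the URL it prints, or None."""
    try:
        result = subprocess.run(
            [sys.executable, script, url, intercept_url],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child when the timeout expires.
        return None
    if result.returncode != 0:
        # The helper crashed or reported a failure.
        return None
    # Assumed contract: the helper prints only the URL, or nothing at all.
    return result.stdout.strip() or None
```

The caller (a Django view, say) would then see an ordinary blocking function that returns a string or None. The cost is starting a new process, and a new QApplication, on every call, so it does not help with reusing objects between requests.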
As a general point, you might do better looking at some of the web scraping/crawling libraries out there (Scrapy for example).
Thanks for the reply!
1. Sorry, I forgot to tell you that you need to have Flash Player installed on OS X, as described in the doc, before you test the program with my new URLs.
2. If the URL I want isn't found, then I agree with your solution of quitting the event loop so the function can return.
3. I am also considering using a Chrome extension to solve the problem.