Wrap QApplication::exec() in a function or class



  • I want the following code to capture the request URL starting with http://down.51en.com:88 while the web page loads, and then do further processing with the response for that URL.
    In my program, once targetUrl is assigned a value, I want the function targetUrlGetter(url) to return it to the caller. The problem is that QApplication::exec() enters the main event loop, so the code after the exec() call at the end of targetUrlGetter() is never executed and the function cannot return. I have tried calling qApp.quit() in interceptRequest(self, info) to tell the application to exit so that targetUrlGetter(url) can return, but the function still does not return and the program even crashes on exit. How can I return targetUrl to the caller?

    BTW, I am going to use this code on the (Django) web server side.

    import sys
    
    from PyQt5.QtWidgets import *
    from PyQt5.QtWebEngineWidgets import *
    from PyQt5.QtWebEngineCore import *
    from PyQt5.QtCore import *
    
    
    class WebEngineUrlRequestInterceptor(QWebEngineUrlRequestInterceptor):
        def __init__(self, parent=None):
            super().__init__(parent)
            self.page = parent
    
        def interceptRequest(self, info):
            if info.requestUrl().toString().startswith('http://down.51en.com:88'):
                self.targetUrl = info.requestUrl().toString()
                print('----------------------------------------------', self.targetUrl)
                qApp.quit()
    
                # self.page.load(QUrl(''))
    
    
    def targetUrlGetter(url=None):
        app = QApplication(sys.argv)
        page = QWebEnginePage()
        globalSettings = page.settings().globalSettings()
        globalSettings.setAttribute(
            QWebEngineSettings.PluginsEnabled, True)
        globalSettings.setAttribute(
            QWebEngineSettings.AutoLoadImages, False)
        profile = page.profile()
        webEngineUrlRequestInterceptor = WebEngineUrlRequestInterceptor(page)
        profile.setRequestInterceptor(webEngineUrlRequestInterceptor)
        page.load(QUrl(url))
        # view = QWebEngineView()
        # view.setPage(page)
        # view.show()
        app.exec_()
        return webEngineUrlRequestInterceptor.targetUrl
    
    
    url = "http://www.51en.com/news/sci/everything-there-is-20160513.html"
    # url = "http://www.51en.com/news/sci/obese-dad-s-sperm-may-influence-offsprin.html"
    # url = "http://www.51en.com/news/sci/mars-surface-glass-could-hold-ancient-fo.html"
    targetUrl = targetUrlGetter(url)
    print(targetUrl)

  • Moderators



  • @redstoneleo said:

    once targetUrl is assigned a value

    I cannot see the difference in doing that.


  • Moderators

    @redstoneleo You should not put app and app.exec() into a function. And instead of using the return value of a function you should use signals/slots to get the value (you can emit a signal in interceptRequest()).



  • @jsulm Please see the updated post again. Thanks!

    The difficulty here is how to exit the Qt event loop without crashing and return the request URL to the caller. I cannot see how signals/slots can help with that.



  • You have a number of issues, but running exec_ from a function actually isn't one of them (in this case!). I have a number of questions/points:

    • Why are you doing this with Qt? If you are going to be running this on a server (with Django?) then it will probably be headless, and a Q(Gui)Application won't run there. You should probably look at a pure-Python solution using urllib / HTTPRedirectHandler.
    • Your program will only hit qApp.quit() IF it is redirected to your specified URL; what if it doesn't?
    • You should definitely be doing this using the signal/slot mechanism! You are currently trying to work against Qt's asynchronicity.
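
    As a rough illustration of the urllib / HTTPRedirectHandler idea from the first point, here is a minimal sketch. The names RecordingRedirectHandler and find_redirect_target are made up for this example. It assumes the target URL is reached through an ordinary HTTP redirect; it will not see requests that a page issues from JavaScript or a plugin.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical helper: remembers the first redirect target matching a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.target_url = None

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Record the redirect target, then let urllib follow it as usual.
        if self.target_url is None and newurl.startswith(self.prefix):
            self.target_url = newurl
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def find_redirect_target(url, prefix, timeout=10):
    """Open url and return the first redirect target starting with prefix, or None.

    Network errors (urllib.error.URLError and friends) are left to the caller.
    """
    handler = RecordingRedirectHandler(prefix)
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout):
        pass  # Only the redirect chain matters here, not the response body.
    return handler.target_url
```

    A call would look like find_redirect_target("http://www.51en.com/news/sci/everything-there-is-20160513.html", "http://down.51en.com:88"). Whether that site actually uses plain redirects is not something this sketch can confirm.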

    The following is a working version of what you're trying to do (Python 3.5.1, PyQt 5.7):

    from sys import argv
    
    from PyQt5.QtCore import pyqtSignal, pyqtSlot, pyqtProperty, QObject, QUrl
    from PyQt5.QtWidgets import QApplication, qApp
    from PyQt5.QtWebEngineCore import QWebEngineUrlRequestInterceptor
    from PyQt5.QtWebEngineWidgets import QWebEnginePage, QWebEngineView, QWebEngineSettings
    
    class Interceptor(QWebEngineUrlRequestInterceptor):
        urlIntercepted=pyqtSignal(str)
    
        def __init__(self, url, parent=None, **kwargs):
            super().__init__(parent, **kwargs)
    
            self._url=url
            self._page=parent
    
        def interceptRequest(self, info):
            if info.requestUrl().toString().startswith(self._url):
                self.urlIntercepted.emit(info.requestUrl().toString())
    
    class TargetUrlGetter(QObject):
        def __init__(self, url, interceptUrl, show=False, parent=None, **kwargs):
            super().__init__(parent, **kwargs)
    
            self._targetUrl=None
            self._url=url
            self._interceptUrl=interceptUrl
            self._page=QWebEnginePage(loadFinished=qApp.quit)
    
            settings=self._page.settings().globalSettings()
            settings.setAttribute(QWebEngineSettings.PluginsEnabled, True)
            settings.setAttribute(QWebEngineSettings.AutoLoadImages, False)
    
            self._interceptor=Interceptor(
                self._interceptUrl,
                self._page,
                urlIntercepted=self.urlIntercepted
            )
            profile=self._page.profile()
            profile.setRequestInterceptor(self._interceptor)
    
            self._page.load(QUrl(url))
    
            if not show: return
    
            self._view=QWebEngineView()
            self._view.setPage(self._page)
            self._view.show()
    
        def getTargetUrl(self): return self._targetUrl
    
        @pyqtSlot(str)
        def setTargetUrl(self, url):
            if url==self._targetUrl: return
            self._targetUrl=url
    
        targetUrl=pyqtProperty(str, getTargetUrl, setTargetUrl)
    
        @pyqtSlot(str)
        def urlIntercepted(self, url):
            self.targetUrl=url
            qApp.quit()
    
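    def getTargetUrl(url, interceptUrl):
        app=QApplication(argv)
        # Keep a reference so the getter stays alive while the event loop runs
        t=TargetUrlGetter(url, interceptUrl)
        # Blocks until qApp.quit() is called (URL intercepted or page load finished)
        app.exec_()
        # Still None if the intercept URL was never requested
        return t.targetUrl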
    
    if __name__=="__main__":
        url="http://www.51en.com/news/sci/everything-there-is-20160513.html"
        interceptUrl='http://down.51en.com:88'    
        print(getTargetUrl(url, interceptUrl))
    

  • Moderators

    @jazzycamel It is perfectly fine to use Qt for non GUI applications. See QtCore, QtNetwork and some other modules.



  • @jsulm I agree, but as I said, this needs a QApplication because it's using QWebEnginePage/View, and that will crash if you try to start it in a headless (i.e. typical web server) environment. Also, doing this with Qt rather than a Python builtin module really is using a sledgehammer to crack a nut.



  • @jazzycamel

    1. Once I changed the URLs to

    url = "http://www.le.com/ptv/vplay/1417484.html"
    interceptUrl = 'http://api.le.com/mms/out/video/playJson?'
    

    in your code, the program crashes while running (tested on Win7 32-bit). What I am sure of is that this time Flash Player was used by QtWebEngine, whereas with your original URLs it was not.

    2. While loading this web page, the browser makes many requests like this:
      http://i.stack.imgur.com/shSOu.png
      I need a certain request URL (e.g. one starting with 'http://api.le.com/mms/out/video/playJson?') that is issued during loading. The data needed to reconstruct that URL is encrypted, so rather than figuring out the encryption algorithm (which is often more difficult or nearly impossible), I resort to a browser-like tool to capture the URL directly. QtWebEngine provides the necessary tools, which is why I chose Qt for this project. Once I have the URL, I plan to return it to the caller, which then fetches the response data for it.
      The main reason I want to put this code on the server side is that packaging a PyQt5 program that uses QtWebEngine into an executable is error-prone, so running the QtWebEngine code on the server avoids the packaging problem.

    3. If my program hasn't found the target URL by the time QWebEnginePage emits loadFinished, I think it is fine to just return None to the caller.
      As for the program design, I want the objects created on the first run to be reusable as much as possible, to save creation time. I think the signal/slot mechanism could help achieve this reuse.



  • @redstoneleo

    1. I'm using a MacBook Pro (OS X 10.11.4, El Capitan). It doesn't crash on mine with your new URLs, but it doesn't find the intercept URL either. I'm not sure how/if QtWebEngine handles Flash or whether the implementation/plugin used is platform-specific, sorry.

    2. The complexity of your use case was not evident from the example you gave, so maybe a simple Python builtin solution won't be enough.

    3. a) My point was not that returning None if your URL isn't found was a problem (that's what my example does after all), it was that in your original example the program would never have returned as the only place the event loop was stopped (qApp.quit()) was in the conditional block handling the intercept URL.
      b) The problem of keeping the objects alive is that once QApplication.exec_() is called, it blocks, and the only way to release it is to quit it. You need to call exec_() for the event loop to run. The obvious answer would normally be to run this in a thread, but you can't create GUI/Widgets objects outside of the main thread...
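
    One common way around the blocking exec_() and main-thread restriction described in 3b is to run the Qt code in a separate process per lookup, so each lookup gets its own QApplication and main thread. This gives up object reuse between calls. The sketch below is only an outline of that idea: run_url_worker is a made-up helper, and it assumes a worker script (called target_url_worker.py here, also hypothetical) that prints the intercepted URL on stdout and nothing else.

```python
import subprocess
import sys


def run_url_worker(cmd, timeout=30.0):
    """Run cmd in a child process; return its first non-empty stdout line, or None.

    None is returned on timeout, on a non-zero exit code (e.g. a crash in the
    child), or when the child printed nothing.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left behind.
        return None
    if result.returncode != 0:
        return None
    for line in result.stdout.splitlines():
        line = line.strip()
        if line:
            return line
    return None


if __name__ == "__main__":
    # Stand-in child command; a real caller would pass something like
    # [sys.executable, "target_url_worker.py", url, interceptUrl] (hypothetical script).
    print(run_url_worker([sys.executable, "-c", "print('http://down.51en.com:88/demo')"]))
```

    The timeout also covers the case where the intercept URL is never requested and the page never finishes loading, and a crash in the child no longer takes the calling (e.g. Django) process down with it.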

    As a general point, you might do better looking at some of the web scraping/crawling libraries out there (Scrapy for example).



  • @jazzycamel

    Thanks for reply!

    1. Sorry, I forgot to tell you that you should have Flash Player installed in

    /Library/Internet Plug-Ins/PepperFlashPlayer/PepperFlashPlayer.plugin
    

    on OS X, according to the docs, before you test the program with my new URLs.

    2. If the URL I want isn't found, then I agree with your solution

    QWebEnginePage(loadFinished=qApp.quit)
    

    to quit the event loop so the function can return.

    3. I am also considering using a Chrome extension to solve the problem.
