Pyside2 toHtml()?



  • def _loadFinished(self):
        self.page().toHtml(self.callable)

    def callable(self, data):
        self.html = data
        self.app.quit()
    

    I can find this function in PyQt5, but it does not exist in PySide2!

    How to get the source code of the loaded web page?

    I tried the JS code "var innerText = document.getElementsByTagName('html')[0].innerHTML", but how can I get innerText into Python?



  • @v-n-lee
    As far as the Python side goes, let's start with:

    • QWebEnginePage::loadFinished(bool ok), https://doc.qt.io/qt-5/qwebenginepage.html#loadFinished, takes an argument stating whether the load was successful. Your slot should accept & check that argument.

    • Is your _loadFinished(self) indeed attached as a slot to the signal? Have you checked it is actually being called?

    • It probably won't make any difference, but perhaps you should decorate your slot with @Slot?

    EDIT: Hmm, I see what you mean now. https://doc.qt.io/qtforpython/PySide2/QtWebEngineWidgets/QWebEnginePage.html#qwebenginepage has setHtml() and mentions toHtml(), but unlike the C++ docs it does not show that the latter is provided at all. Did you try to see if it works, even though it is not documented? Similarly, the docs don't show the overloads of QWebEnginePage.runJavaScript() which can get at JS results. Everything which takes a QWebEngineCallback seems not to be implemented.

    This is sad news for me, as I am moving from PyQt5 to PySide2, and I hoped PySide2 would by now have all the methods. You could raise this issue at the Qt bug board for PySide2 and see what the PySide2 folks have to say. They may be very helpful; it looks like there must be a reason these callbacks have not been implemented.

    Bad news I'm afraid :( I found https://bugreports.qt.io/browse/PYSIDE-474?focusedCommentId=365367&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-365367, from July 2017, stating:

    I think we should just blacklist or skip the test.

    QWebEngineCallback is a poor man's std::function, and thus not supported by PySide yet.

    You should perhaps lobby for an update on this. I have just posted there at https://bugreports.qt.io/browse/PYSIDE-474?focusedCommentId=494236&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-494236, though I don't know whether devs will look at new comments on old issues.



  • def show_web_code(self):
        t = '''var test=document.getElementsByTagName('html')[0].innerHTML;alert(test);'''
        self._tab_widget.currentWidget().page().runJavaScript(t)
    

    The result:
    [image]
    But I don't know how to transmit the code to python



  • @v-n-lee said in Pyside2 toHtml()?:

    But I don't know how to transmit the code to python

    That is because you cannot do so, until QWebEngineCallback is dealt with. Did you understand what I wrote and the Qt bug posts it references?



  • All right



  • @v-n-lee
    It looks like the current, open but unresolved, issue for this is https://bugreports.qt.io/browse/PYSIDE-946. I see no current workaround, though I have asked there if there is supposed to be one.



  • @v-n-lee and @JonB A possible workaround is to use QWebChannel:

    import sys
    
    from PySide2 import QtCore, QtWidgets, QtWebEngineWidgets, QtWebChannel
    
    
    class Backend(QtCore.QObject):
        @QtCore.Slot(str)
        def toHtml(self, html):
            self._html = html
            QtCore.QCoreApplication.quit()
    
        @property
        def html(self):
            return self._html
    
    
    class WebEnginePage(QtWebEngineWidgets.QWebEnginePage):
        def __init__(self, url):
            app = QtWidgets.QApplication([])
            super(WebEnginePage, self).__init__()
            self.load(url)
            self.loadFinished.connect(self.onLoadFinished)
            self._backend = Backend()
            app.exec_()
    
        @property
        def backend(self):
            return self._backend
    
        @QtCore.Slot(bool)
        def onLoadFinished(self, ok):
            if ok:
                self.load_qwebchannel()
                self.load_object()
    
        def load_qwebchannel(self):
            file = QtCore.QFile(":/qtwebchannel/qwebchannel.js")
            if file.open(QtCore.QIODevice.ReadOnly):
                content = file.readAll()
                file.close()
                self.runJavaScript(content.data().decode())
            if self.webChannel() is None:
                channel = QtWebChannel.QWebChannel(self)
                self.setWebChannel(channel)
    
        def load_object(self):
            if self.webChannel() is not None:
                self.webChannel().registerObject("backend", self.backend)
                script = r"""
                new QWebChannel(qt.webChannelTransport, function (channel) {
                    var backend = channel.objects.backend;
                    var html = document.getElementsByTagName('html')[0].innerHTML;
                    backend.toHtml(html);
                });"""
                self.runJavaScript(script)
    
    
    if __name__ == "__main__":
        url = QtCore.QUrl("https://forum.qt.io/topic/110775/pyside2-tohtml")
        page = WebEnginePage(url)
        print(page.backend.html)
    


  • @eyllanesc
    Thank you for this. I assume it works! So QWebChannel is a component loadable from JS which can be used to communicate back to the Qt app host?



  • UPDATE: I see https://bugreports.qt.io/browse/PYSIDE-946 has just moved to In Progress, which is good news.



  • Do not try to save the page as text or download it using JS; this is a lost cause (for security reasons).
    One workaround: put the content of the page within the GET parameters of a form:

    document.getElementById('name_of_my_element').innerHTML = '<form action="#" method="get"><input type="hidden" name="param1" value="'+  encodeURIComponent(getResultsHTML()) +'">' +'<input type="submit" value="Submit"></form>';
    

    I wrote a JS function getResultsHTML() that computes the results, but it can be any text, even the document.getElementById('name_of_my_element').innerHTML of your whole page.

    In my Python code it's simple:

    import os

    from PySide2 import QtCore
    from PySide2.QtWebEngineWidgets import QWebEngineView

    # Both functions below are methods of the class that owns the view.
    def run(self):
        # ...
        self.myview = QWebEngineView()
        path_html_file = os.path.abspath(path_html_file)
        url_tlx = QtCore.QUrl.fromLocalFile(path_html_file)
        self.myview.load(url_tlx)
        self.myview.show()
        self.myview.urlChanged.connect(self.callback_url_changed)  # this will trigger when url is changed

    def callback_url_changed(self, url):
        # urlChanged passes the new QUrl to the slot
        print(url.toString())
    # > file:///index.html?param1=text%20forwarded%20in%20get#
    # continue with re and urllib.parse to extract and decode the parameter from the string
    

    until a solution comes out...
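
    The final "extract and decode" step mentioned in the comment above could look roughly like the sketch below. It uses only the standard library; the helper name extract_param is made up for illustration, and the sample URL is the one from the comment. If the value ends up percent-encoded twice (once by encodeURIComponent and once more by the form submission), one extra urllib.parse.unquote call on the result would probably be needed.

```python
from urllib.parse import urlsplit, parse_qs


def extract_param(url_string, name="param1"):
    """Return the decoded value of query parameter `name`, or None if absent.

    `extract_param` is a hypothetical helper, not part of PySide2.
    """
    # Isolate the part between '?' and '#'.
    query = urlsplit(url_string).query
    # parse_qs percent-decodes the values and maps each name to a list of values.
    values = parse_qs(query).get(name)
    return values[0] if values else None


if __name__ == "__main__":
    sample = "file:///index.html?param1=text%20forwarded%20in%20get#"
    print(extract_param(sample))
```

    Inside callback_url_changed, the string to pass in would be the one returned by the QUrl's toString().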

