
[SOLVED] String Encoding problem between Python, XHTML, Javascript

    cefn
    wrote on 24 Dec 2014, 18:36
    #1

    I'm putting together a minimal WYSIWYG editor using QWebView, which simply uses contentEditable support and a new XMLSerializer() to output the DOM as serialized XML. To save, the serialized string is passed back from JavaScript to Python.

    I suspect that the translation between javascript strings and QString by the pyqtSlot(str) decorator is introducing an encoding problem and I don't know how to be stricter with encoding.

    Overall the strategy fundamentally works, but there must be some kind of encoding problem between the setContent() call that passes in the original HTML and the serialized string I get back from the JavaScript 'save' callback.

    This has become obvious through, for example, the bad handling of some smart-quotes, which get turned into some kind of junk as if the unicode hasn't been processed properly (while the rest of the serialised XML string is fine).

    The steps are as follows...

    1. The HTML is loaded from a file. At this point the smart quotes are properly encoded (the file displays correctly in Chrome), and an example line looks like the following (imagine the smart quotes)...
      @‘step into the user’s shoes’ and ‘walk the user’s walk’
      @

    2. the HTML is loaded into a QWebView using [python] ...
      @self.view.setContent(buf.buffer(), "application/xhtml+xml", base_url)
      @
      ...and the example line still renders in the view apparently with smart quotes...
      @‘step into the user’s shoes’ and ‘walk the user’s walk’
      @

    3. the HTML DOM is serialised using [javascript]
      @s = (new XMLSerializer()).serializeToString(document.documentElement)
      @
      ...and the example line (checked by debugging and reading the string) still shows smart quotes in the JavaScript console...
      @‘step into the user’s shoes’ and ‘walk the user’s walk’
      @

    4. The serialized string is passed back as an argument to the save(...) method, a pyqtSlot-decorated method of an object that was previously exposed to the QWebView's page().mainFrame() using addToJavaScriptWindowObject(...)
      @editor.save(str)
      @

    5. The QString arriving at the Editor object's save method is then saved to a file, where the previously mentioned extract now looks like...
      @?step into the user?s shoes? and ?walk the user?s walk?
      @

    What should I be doing to correctly handle passing in the string from the javascript side, or receiving it on the python side, to avoid this encoding issue?
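The '?' characters in step 5 are the usual signature of text being converted to an encoding that cannot represent the character, with a replacement character substituted. The following is a PyQt-free sketch in plain Python 3 (the code in this thread is Python 2 with PyQt4) that reproduces the same symptom. Whether this is exactly what happens inside the pyqtSlot(str) conversion is an assumption, and `lossy_save` and `utf8_save` are hypothetical names used only for illustration.

```python
# A shortened form of the example line, with the smart quotes written as escapes.
SAMPLE = u"\u2018step into the user\u2019s shoes\u2019"


def lossy_save(text):
    """Encode to ASCII, replacing every unrepresentable character with '?'."""
    return text.encode("ascii", "replace")


def utf8_save(text):
    """Encode to UTF-8, which can represent the smart quotes."""
    return text.encode("utf-8")


print(lossy_save(SAMPLE))                           # smart quotes become '?'
print(utf8_save(SAMPLE).decode("utf-8") == SAMPLE)  # UTF-8 round-trips
```

If the saved file shows '?' where the quotes were, the information was already lost before the write, so the fix has to happen at the point where the string is converted to bytes.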

    The full code I'm using is available at https://github.com/cefn/firmware-codesign-readinglog/tree/master/ui

      cefn
      wrote on 26 Dec 2014, 11:02
      #2

      I've just created a much simpler test case which recreates the problem. https://github.com/cefn/firmware-codesign-readinglog/tree/master/ui/test

      Perhaps someone with knowledge of throwing around string encodings in Qt can have a look at the Editor#save() method and figure out what needs to be done so that saved_test.html and test.html are identical after running test.py .

      Currently the loaded file and saved file are identical except the Smart Quotes which are badly encoded for some unknown reason. Promising, but the encoding problem is a show-stopper :(

      If these two files can be made identical, then it should be possible to throw together a WYSIWYG editor in QWebView in just a few lines of Javascript, using HTML's contentEditable support.
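To check whether test.html and saved_test.html really are identical, and where they first diverge, a byte-level comparison is more reliable than eyeballing the files. This is a minimal sketch in plain Python 3; `first_difference` and `files_differ_at` are hypothetical helpers, not part of the linked test case.

```python
def first_difference(a, b):
    """Return the index of the first differing byte, or None if a == b."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One input is a prefix of the other: they differ where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def files_differ_at(path_a, path_b):
    """Compare two files as raw bytes, so no decoding can mask a difference."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return first_difference(fa.read(), fb.read())
```

Calling `files_differ_at("test.html", "saved_test.html")` after running test.py would return None once the encoding problem is fixed, or the byte offset of the first mangled character otherwise.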

        cefn
        wrote on 26 Dec 2014, 20:55
        #3

        OK, so I found a hack which does the job. It involves iterating over every character in the JavaScript string and storing its character code (as returned by charCodeAt()) in an array as a JavaScript number.

        This array is passed to Python as a QVariantList, which appears in Python as a list of floats. These can then be wrangled through int(), unichr(), join() and encode() into a UTF-8 encoded byte string suitable for writing to a file. Nasty as hell, but it works.

        JAVASCRIPT SIDE

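        @ function getChars(s) {
            // Collect the character code of every character in the string
            var chars = [];
            for (var i = 0; i < s.length; i++) {
                chars.push(s.charCodeAt(i));
            }
            return chars;
        }

        // Serialize the whole document and hand the codes to the Python side
        var ser = new XMLSerializer();
        var mystr = ser.serializeToString(document.documentElement);
        editor.save(getChars(mystr));
        @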

        PYTHON SIDE

        @ @pyqtSlot("QVariantList")
        def save(self, serialized):
            # The values come in as floats from JavaScript
            domchars = [unichr(int(entry)) for entry in serialized]
            domunicode = u''.join(domchars)
            # Encode as UTF-8 bytes for writing to disk
            domutf8 = domunicode.encode("UTF-8")

            # Binary mode, so the encoded bytes are written unchanged
            with open("saved_" + filepath, 'wb') as f:
                f.write(domutf8)
        @

        The original problematic version, which demonstrates the problem with getting Unicode strings out of QWebView, can be seen at https://github.com/cefn/firmware-codesign-readinglog/blob/7c25475ba27f565403b64aafc364012437d85a1e/ui/test/test.py

        ...and the fixed up version which loads and saves UTF-8 XHTML without change is at...
        https://github.com/cefn/firmware-codesign-readinglog/blob/4b70f47db95bd2dabf33dae1ec747eaf0664b28d/ui/test/test.py
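One caveat with the character-code approach: charCodeAt() returns UTF-16 code units, so a character outside the Basic Multilingual Plane arrives as two surrogate values rather than one code point. The following PyQt-free sketch, written for Python 3 rather than the Python 2 used above, shows one way to turn such a list of floats into UTF-8 bytes while recombining surrogate pairs. `code_units_to_utf8` is a hypothetical name, and the input format (a list of floats) is taken from the description of the QVariantList above.

```python
import struct


def code_units_to_utf8(code_units):
    """Turn UTF-16 code units, as charCodeAt() yields them, into UTF-8 bytes."""
    # The numbers arrive as floats, so coerce each one to int first.
    units = [int(u) for u in code_units]
    # Pack as little-endian 16-bit values, then decode as UTF-16 so that
    # surrogate pairs are recombined into single characters.
    raw = struct.pack("<%dH" % len(units), *units)
    return raw.decode("utf-16-le").encode("utf-8")


# Smart quotes around an 's', given as floats the way the slot receives them.
print(code_units_to_utf8([8216.0, 115.0, 8217.0]).decode("utf-8"))
```

For text that stays within the Basic Multilingual Plane, such as the smart quotes in the test file, this gives the same bytes as converting each value individually.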

          cefn
          wrote on 26 Dec 2014, 21:38
          #4

          Now even better. I've found that calling toUtf8() on the QString passed in by the original @pyqtSlot decorator yields UTF-8 encoded bytes which can be written to file without messing about, and which preserve special characters. I've no idea why no combination of Python str(), bytearray(), encode() and decode() operations seemed to achieve this, but it's done now...
          https://github.com/cefn/firmware-codesign-readinglog/blob/8c315b85c14f83539313bace54453565ba8aa9f6/ui/test/test.py
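The principle behind the toUtf8() fix is to encode explicitly to UTF-8 and write the resulting bytes in binary mode, so that no implicit conversion gets a chance to run. Below is a PyQt-free sketch of that principle in plain Python 3; the helper names are hypothetical. In the PyQt4 slot itself the bytes would come from the QString's toUtf8() call instead of `text.encode("utf-8")`, and the exact code used is in the linked test.py, not reproduced here.

```python
import os
import tempfile


def save_utf8(text, path):
    """Write Unicode text to path as UTF-8 bytes, with no implicit encoding."""
    data = text.encode("utf-8")  # stands in for QString.toUtf8() in the slot
    with open(path, "wb") as f:
        f.write(data)


def load_utf8(path):
    """Read the file back as raw bytes and decode them as UTF-8."""
    with open(path, "rb") as f:
        return f.read().decode("utf-8")


def round_trip(text):
    """Save text to a temporary file and return what is read back."""
    fd, path = tempfile.mkstemp(suffix=".html")
    os.close(fd)
    try:
        save_utf8(text, path)
        return load_utf8(path)
    finally:
        os.remove(path)
```

A string containing smart quotes passed through `round_trip` should come back unchanged, which is the same property the thread is after for test.html and saved_test.html.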
