
QWebKit access to the final encoding guessed

  • I'm trying to get the encoding of a page loaded with a QWebPage component, but it's hard to get the same result that a WebKit-based browser (Safari or Chrome) reports. Basically, I would like to get the encoding string displayed under "View -> Text Encoding" in Safari or "View -> Encoding" in Chrome.

    I have tested several methods:

    • this->mainFrame()->evaluateJavaScript("window.document.characterSet"); // but I don't always get the same result as a standalone browser
    • reading the meta tags directly with metaData() // but this is not always the real encoding
    • reading QNetworkRequest::ContentTypeHeader from my own NetworkAccessManager class // but this is only the encoding from the server's response, not the final one guessed by the browser (and it isn't always present)
    • using QTextCodec to analyze the string encoding; again, this is not the final encoding guessed by QWebPage
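
    To illustrate the third method, here is a minimal sketch of pulling the charset parameter out of a Content-Type header value. The helper name charsetFromContentType is hypothetical (it is not a Qt or WebKit API), and it uses plain standard C++ rather than Qt types. It only shows what the server declared, not what WebKit finally decided.

    ```cpp
    #include <algorithm>
    #include <cctype>
    #include <string>

    // Hypothetical helper, not part of Qt or WebKit.
    // Extracts the charset parameter from a Content-Type header value such as
    // "text/html; charset=ISO-8859-1". Returns an empty string when the header
    // carries no charset parameter.
    std::string charsetFromContentType(const std::string& header)
    {
        // Work on a lower-cased copy so the parameter name matches
        // case-insensitively ("Charset=", "CHARSET=", ...).
        std::string lower(header);
        std::transform(lower.begin(), lower.end(), lower.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

        const std::string key = "charset=";
        const std::string::size_type pos = lower.find(key);
        if (pos == std::string::npos)
            return std::string();

        // The value runs up to the next ';' or the end of the header.
        const std::string::size_type start = pos + key.size();
        std::string::size_type end = header.find(';', start);
        if (end == std::string::npos)
            end = header.size();
        const std::string value = header.substr(start, end - start);

        // Trim surrounding whitespace and optional quotes.
        const char* strip = " \t\"'";
        const std::string::size_type first = value.find_first_not_of(strip);
        if (first == std::string::npos)
            return std::string();
        const std::string::size_type last = value.find_last_not_of(strip);
        return value.substr(first, last - first + 1);
    }
    ```

    An empty result here is exactly the "isn't always present" case, which is why this value alone cannot stand in for the browser's final choice.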

    Digging into the WebKit source, I found QtSources/4.7.4/src/3rdparty/webkit/WebCore/loader/TextResourceDecoder.cpp. Is it possible to use it in my own Qt application, and if so, how? Or am I missing some API for getting the real QWebPage encoding?

  • You can retrieve the character encoding from the HTML meta tags of each page:

    • HTML 5
      @<meta charset="UTF-8">@
    • HTML 4
      @<meta http-equiv="content-type" content="text/html; charset=UTF-8">@

  • Did you read my post, Leon? (See the second bullet, by the way.) You can't trust the encoding from the meta tags alone. They are certainly one factor WebKit uses when guessing the encoding, but not the only one. I'm asking for the final encoding chosen by WebKit.
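
    As an example of a signal other than the meta tags, a byte order mark at the start of the raw response can identify the encoding regardless of what the markup says. The sketch below is only an illustration under that assumption: encodingFromBom is a hypothetical helper in plain standard C++, not WebKit's actual decoder logic and not a Qt API.

    ```cpp
    #include <cstddef>
    #include <string>

    // Hypothetical helper, not part of Qt or WebKit.
    // Returns the encoding implied by a leading byte order mark, or an empty
    // string when the data does not start with one of the marks checked here.
    // Note: a UTF-32LE mark (FF FE 00 00) would be reported as UTF-16LE by
    // this simplified check.
    std::string encodingFromBom(const std::string& bytes)
    {
        const unsigned char* p =
            reinterpret_cast<const unsigned char*>(bytes.data());
        const std::size_t n = bytes.size();

        if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
            return "UTF-8";
        if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
            return "UTF-16BE";
        if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
            return "UTF-16LE";
        return std::string();
    }
    ```

    A page whose bytes start with such a mark but whose meta tag names a different charset is one case where reading the meta tags alone gives the wrong answer.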

  • Considering how hard it is to guess encodings correctly, I would be surprised if WebKit really chose to ignore the HTML meta tags that provide that information. But I am no expert there and cannot offer more than a guess. Sorry.
