QtWebKit: access to the final guessed encoding
-
I'm trying to get the encoding of a page loaded with a QWebPage component, but it's hard to get the same result that a WebKit-based browser (Safari or Chrome) shows. Basically, I would like to get the encoding string displayed under "View -> Text Encoding" in Safari or "View -> Encoding" in Chrome.
I have tested several methods:
- this->mainFrame()->evaluateJavaScript("window.document.characterSet"); // but I don't always get the same result as a standalone browser
- reading the meta tags directly with metaData() // but this is not always the real encoding
- reading QNetworkRequest::ContentTypeHeader from my own NetworkAccessManager class // but this is just the encoding from the server's response, not the final one guessed by the browser (and it isn't always present)
- using QTextCodec and analyzing the string encoding // but again, this is not the final encoding guessed by QWebPage
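To make the third bullet concrete, here is a minimal sketch in plain standard C++ of pulling the charset parameter out of a Content-Type header value. The function name charsetFromContentType is hypothetical (it is not part of Qt or WebKit), and the input is assumed to be the header string, for example the result of reply->header(QNetworkRequest::ContentTypeHeader).toString(). It also shows why this approach falls short: when the server sends no charset, there is nothing to read.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not a Qt or WebKit API): returns the value of the
// charset parameter in a Content-Type header, or an empty string if the
// header carries no charset at all.
std::string charsetFromContentType(const std::string& header)
{
    // Lower-case copy so that "Charset=" and "CHARSET=" are found as well.
    std::string lower(header);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos)
        return std::string();  // server did not declare an encoding

    pos += key.size();
    // Skip an optional opening quote around the value.
    if (pos < header.size() && (header[pos] == '"' || header[pos] == '\''))
        ++pos;

    // The value ends at a closing quote, a ';', whitespace, or end of string.
    const std::string::size_type end = header.find_first_of("\"'; \t", pos);
    return header.substr(pos, end == std::string::npos ? std::string::npos : end - pos);
}
```

Even when this returns a value, it is only what the server declared, not the encoding the engine finally settles on.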
Digging into WebKit's source, I found QtSources/4.7.4/src/3rdparty/webkit/WebCore/loader/TextResourceDecoder.cpp. Is it possible to use it in my own Qt application (and how)? Or am I missing some API for getting the real QWebPage encoding?
-
You can retrieve the character encoding from the HTML meta tags of each page:
- HTML 5
@<meta charset="UTF-8">@
- HTML 4
@<meta http-equiv="content-type" content="text/html; charset=UTF-8">@
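As a rough illustration of reading those two meta forms, the following is a minimal sketch in standard C++ using std::regex. The function name charsetFromMeta is hypothetical, and the pattern is a simplification that assumes well-formed markup; it is not how WebKit itself parses the document, and it only yields the declared encoding.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not a Qt or WebKit API): returns the charset declared
// in the first <meta> tag that mentions one, or an empty string if none does.
// It covers both <meta charset="..."> and the http-equiv/content form.
std::string charsetFromMeta(const std::string& html)
{
    // Case-insensitive: match "<meta", anything up to "charset=", an optional
    // quote, then capture the encoding label itself.
    static const std::regex pattern(
        R"(<meta[^>]*charset\s*=\s*["']?([A-Za-z0-9_.:-]+))",
        std::regex::icase);

    std::smatch match;
    if (std::regex_search(html, match, pattern))
        return match[1].str();
    return std::string();  // no charset declared in a meta tag
}
```

The html argument would be the page source as a string; a page without any such tag simply gives an empty result.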
-
Did you read my post, Leon? (Bullet 2, by the way.) You can't trust the encoding by just reading the meta tags. They are certainly one factor WebKit uses when guessing the encoding, but not the only one. I'm asking for the final encoding chosen by WebKit.
-
Considering how hard it is to guess encodings correctly, I would be surprised if WebKit really chose to ignore the HTML meta tags providing that information... but I am no expert there and cannot provide more than a guess. Sorry.