Strings and encodings in Qt
-
From your favourite author, a brand new -- work in progress -- wiki article:
http://developer.qt.nokia.com/wiki/Strings_and_encodings_in_Qt
Any feedback is welcome.
-
First of all, thanks for putting the effort into writing this wiki page.
What I miss in the story is the alternative approach to avoid non-ascii characters in the source text completely. Yes, I know there are differences of opinion if that is the best way to do it or not, but it is a valid approach. I just use ascii in my sources, and use Qt Linguist to support any unicode I like. It avoids creating a dependency in the code (the ::fromUtf8 call) to the encoding of of the code (the fact that I must make sure that I, and everybody else in the company) always edit the source file using an editor that uses the UTF8 encoding.
-
Andre is right. That approach could get a more elaborate paragraph in the beginnen.
I'm a strong advocate for using UTF-8 in your source code. And while this still holds true, I recently made up my mind and would go with the plain text + linguist solution in the future (or UTF-8 + linguist, it depends). That's not caused by UTF-8 hassles, but by the pure fact that the tr() methods just provide more functionality than bridging the encoding gap. Most notably is the support for singular/plural forms that you get for free (without having to hardcode this in your source).
One caveat using the tr() singular/plural magic: You must create a translation for the language used in the sources too!
-
[quote author="Andre" date="1323244054"]
What I miss in the story is the alternative approach to avoid non-ascii characters in the source text completely. Yes, I know there are differences of opinion if that is the best way to do it or not, but it is a valid approach. I just use ascii in my sources, and use Qt Linguist to support any unicode I like. [/quote]Sounds good. Could you elaborate a bit about your approach? Do you have some (more or less) formal guidelines you follow, that I can add to the wiki page? For instance, what do you put in your code exactly if you have to write "Saint Honoré", both in the cases of a user-visible string or not?
-
That would depend on the language you use as your base, I guess. I think that the approach would only work if the language you want to use for your strings doesn't have or need too many non-ascii characters to produce something readable. I know for instance German has rules about replacing "Güt" with "Guet" and the Ringel-S (sorry, no idea how to type that one on my keyboard here) with a double-s. How that would work on "Saint Honoré", I don't know, I don't often run into strings like that in my applications. I guess I would just leave off the accent from the é. I know I (have to) do that for my name all the time anyway...
What I do, is that I keep all strings in my sources in English. English doesn't have non-ascii characters, so normally you would not run into these. Then, I make translation files
- to English (yes, translate English to English to get plurals working)
- other required language(s) if needed.
When I need non-ascii characters in my source, for instance when I want to use symbols from the unicode table, I usually use the unicode code directly. That is, instead of writing QChar('∞'), I would use QChar(0x221E); //infinity symbol. That happens only sporadically though, little enough to make that workable.