Strings and encodings in Qt

dangelog

From your favourite author, a brand new -- work in progress -- wiki article:

http://developer.qt.nokia.com/wiki/Strings_and_encodings_in_Qt

Any feedback is welcome.

qxoz

Very nice!
I don't see anything there about setting the encoding for translations, e.g.
@QTextCodec::setCodecForTr(QTextCodec::codecForName("Windows-1251"));@
Or is that another topic?

dangelog

I18n is actually a broader topic -- I briefly mentioned QTextCodec::codecForTr(), CODECFORTR, etc.; do you think it's worth elaborating?

qxoz

Yeah, you're right.
Maybe it's not worth it.

andre

First of all, thanks for putting the effort into writing this wiki page.

What I miss in the story is the alternative approach to avoid non-ascii characters in the source text completely. Yes, I know there are differences of opinion if that is the best way to do it or not, but it is a valid approach. I just use ascii in my sources, and use Qt Linguist to support any unicode I like. It avoids making the code (the ::fromUtf8 call) depend on the encoding of the source file, that is, having to make sure that I, and everybody else in the company, always edit the source files with an editor set to UTF-8.

goetz

Andre is right. That approach could get a more elaborate paragraph at the beginning.

I'm a strong advocate for using UTF-8 in your source code. And while this still holds true, I recently decided that I would go with the plain text + Linguist solution in the future (or UTF-8 + Linguist, it depends). That's not caused by UTF-8 hassles, but by the simple fact that the tr() methods provide more functionality than just bridging the encoding gap. Most notable is the support for singular/plural forms that you get for free (without having to hardcode this in your source).

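One caveat when using the tr() singular/plural magic: you must create a translation for the language used in the sources too! Without one, tr() falls back to the source text and only substitutes the number, so the user sees something like "1 file(s) copied".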
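To make that caveat concrete, here is a rough sketch in plain standard C++ (not Qt) that models a plural-aware lookup. The names Catalog, substituteCount and translatePlural are made up for illustration and are not part of any Qt API; the real tr() and Qt Linguist handle many more plural rules than the two English forms assumed here.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a loaded translation file:
// source text -> plural forms (index 0 = singular, index 1 = plural).
using Catalog = std::map<std::string, std::vector<std::string>>;

// Replace every "%n" in text with the decimal value of n.
std::string substituteCount(std::string text, int n) {
    const std::string marker = "%n";
    const std::string number = std::to_string(n);
    std::string::size_type pos = 0;
    while ((pos = text.find(marker, pos)) != std::string::npos) {
        text.replace(pos, marker.size(), number);
        pos += number.size();
    }
    return text;
}

// Simplified model of a plural-aware lookup, assuming English-style rules.
// With a catalog entry, the form is chosen by n; without one, the source
// text is used as-is and only the number is filled in.
std::string translatePlural(const Catalog& catalog,
                            const std::string& source, int n) {
    const auto it = catalog.find(source);
    if (it == catalog.end() || it->second.size() < 2)
        return substituteCount(source, n);
    const std::string& form = (n == 1) ? it->second[0] : it->second[1];
    return substituteCount(form, n);
}
```

With an empty catalog the source string "%n file(s) copied" comes back with only the number replaced; with an English entry holding "%n file copied" and "%n files copied", the right form is picked. That is why the English-to-English translation file is needed.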

dangelog

[quote author="Andre" date="1323244054"]
What I miss in the story is the alternative approach to avoid non-ascii characters in the source text completely. Yes, I know there are differences of opinion if that is the best way to do it or not, but it is a valid approach. I just use ascii in my sources, and use Qt Linguist to support any unicode I like. [/quote]

Sounds good. Could you elaborate a bit on your approach? Do you have some (more or less) formal guidelines you follow that I can add to the wiki page? For instance, what exactly do you put in your code if you have to write "Saint Honoré", both when it is a user-visible string and when it is not?

andre

That would depend on the language you use as your base, I guess. I think that the approach would only work if the language you want to use for your strings doesn't have or need too many non-ascii characters to produce something readable. I know for instance German has rules about replacing "Güt" with "Guet" and the Ringel-S (sorry, no idea how to type that one on my keyboard here) with a double-s. How that would work on "Saint Honoré", I don't know, I don't often run into strings like that in my applications. I guess I would just leave off the accent from the é. I know I (have to) do that for my name all the time anyway...

What I do is keep all strings in my sources in English. English rarely needs non-ascii characters, so normally you would not run into these. Then, I make translation files for:

* English (yes, translate English to English to get plurals working)
* other required language(s), if needed.

When I need non-ascii characters in my source, for instance when I want to use symbols from the Unicode table, I usually use the Unicode code point directly. That is, instead of writing QChar('∞'), I would use QChar(0x221E); //infinity symbol. That happens only sporadically though, rarely enough to make that workable.
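As a small illustration of why the numeric form does not depend on the editor's encoding, here is a sketch in plain standard C++ (not Qt) that turns a code point into its UTF-8 bytes. encodeUtf8 is a hypothetical helper written for this post; it does not validate its input (surrogates and values above U+10FFFF are not rejected).

```cpp
#include <string>

// Encode a single Unicode code point as a UTF-8 byte sequence.
// Illustrative only: no validation of the input value.
std::string encodeUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx (plain ASCII)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

The source file contains only ascii (the literal 0x221E), yet encodeUtf8(0x221E) yields the three bytes E2 88 9E of the infinity symbol, and encodeUtf8(0xE9) yields C3 A9 for é, whatever encoding the file itself was saved in.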