How can I create and open a UTF-8 encoded text file on Mac OS X?

tony

Hi,

The problem is not related to QTextStream, but to how you pass the string to it in your source code. So, if you want to be sure that everything is fine, you should save your source code in UTF-8 too. Otherwise, you can "pre-encode" your text in UTF-8 and type it in your source code as a byte sequence, i.e.

"\023\045\042..."

That way you can save your source code in any encoding you like, because it will contain only ASCII characters.
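To make the "pre-encoding" step concrete, here is a minimal sketch in plain C++ (no Qt needed). The helper name `escapeNonAscii` is made up for this post. It assumes the input already holds UTF-8 bytes, and it only escapes bytes above 0x7F, so quotes and backslashes in the input would still need manual escaping. It emits octal escapes because an octal escape stops after three digits, while a `\x` escape would swallow any hex digit that follows it.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Qt): turns UTF-8 bytes into text that can
// be pasted between the quotes of a C++ string literal.
std::string escapeNonAscii(const std::string &utf8)
{
    std::string out;
    for (unsigned char byte : utf8) {
        if (byte < 0x80) {
            // 7-bit ASCII is copied unchanged.
            out += static_cast<char>(byte);
        } else {
            // Every other byte becomes a backslash plus three octal digits.
            char buf[5];
            std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(byte));
            out += buf;
        }
    }
    return out;
}
```

For example, the two UTF-8 bytes of 'é' (0xC3 0xA9) come out as `\303\251`, which can then be typed as `QString::fromUtf8("\303\251")` in a source file saved in any ASCII-compatible encoding.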

Tony.

Polto

Thanks, but I have now changed it in Creator's preferences and nothing changed.
It is still the same thing.

goetz

In your code you pass a const char* to the text stream.

bq. From "QTextStream::operator<<()":http://doc.trolltech.com/4.7/qtextstream.html#operator-lt-lt-16 doc:
Writes the constant string pointed to by string to the stream. string is assumed to be in ISO-8859-1 encoding. This operator is convenient when working with constant string data

You have to construct a QString before you pass it to the text stream. This snippet works for me:

@
QTextStream out(&File);
out.setCodec("UTF-8");
QString x = QString::fromUtf8("مرحبا \n");
out << x;
@

Frank

[quote author="Volker" date="1288952811"]
@
QTextStream out(&File);
out.setCodec("UTF-8");
QString x = QString::fromUtf8("مرحبا \n");
out << x;
@
[/quote]

That requires the source file to be UTF-8, though, which causes issues with MSVC, iirc. I find non-ASCII source files a PITA when targeting multiple platforms.

goetz

We changed all of our C++ sources to UTF-8 a while ago and have had no problems on Mac or with MS Visual Studio. Of course, we use "true" UTF-8 strings only in character constants (const char*).

We use

@
CODECFORTR = UTF-8
CODECFORSRC = UTF-8
@

in our .pro files.

tony

I agree with Frank.

This is the main reason why I prefer to "pre-encode" UTF-8 constant strings and type the escaped sequence of bytes into the code. Mostly because it's rare for me to have such strings (if you write everything in English and translate later with Qt Linguist, you don't even need to care about that).

T.

goetz

If you mostly have to write 7-bit ASCII strings, then that is perfectly OK.

But I strictly refuse to code my German umlauts (äöü) or French accented characters (é) in an ancient, stone-age style. Come on, it's the year 2010, we've landed on Mars, and we still should not enter more than 7-bit characters in modern code? We can use our time better than looking up code points in tables :-) Not to mention the readability of the source code!

tony

You're right, Volker.

That's why I wrote a small Qt app with two QTextEdit widgets that encodes strings for me. For a single string it's just a copy-and-paste effort, which even in 2010 may be worth it :) :) .

If I need to write software in a language other than English, I prefer the Qt Linguist approach. That way I get at least two languages for free :)

T.

goetz

And I just hit the 'ä' key on my keyboard.

I really have had no problems with UTF-8 sources on either Windows or Mac. Maybe we are lucky, because we only use Qt (QString) for string handling :-)

We are working on a big commercial project. Coding in English first and translating afterwards is not an option, as our primary target is German customers.

Polto

I think it is not about the data which we write to the file.
The Arabic characters are stored correctly, but only if I first create the file with a normal text editor and save it as UTF-8.
When I then open that file from Qt and write the characters to it, the result is OK.

goetz

The data in your version is a const char * string. Its encoding depends on the settings of your editor. According to the docs, the const char * is written out assuming the ISO-8859-1 charset. If your source file is in UTF-8 (or any other encoding), each of the 2, 3 or 4 bytes that make up one single character in your string is treated as a separate Latin-1 character and written out as the corresponding UTF-8 character.

If you have any other encoding than ISO-8859-1 in your const char * string constants, you must convert them to a QString with the appropriate conversion routines and/or text codecs.

Your

@
out << "مرحبا \n";
@

is internally converted into a

@
d->putString(QLatin1String(string));
@

The output definitely cannot be what you expect...
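To see why, here is a rough model of that Latin-1 assumption in plain C++, without Qt. It is only a sketch of the effect, not Qt's actual code path, and the function name `latin1ToUtf8` is made up for this post. Every input byte is taken as one ISO-8859-1 character and re-encoded as UTF-8, which is what a UTF-8 codec does with text that was first read as Latin-1.

```cpp
#include <string>

// Hypothetical model of the mix-up: interpret each byte as one ISO-8859-1
// character (code points 0x00-0xFF) and encode that character as UTF-8.
std::string latin1ToUtf8(const std::string &latin1)
{
    std::string out;
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            // ASCII maps to a single identical UTF-8 byte.
            out += static_cast<char>(byte);
        } else {
            // Code points 0x80-0xFF need two UTF-8 bytes.
            out += static_cast<char>(0xC0 | (byte >> 6));
            out += static_cast<char>(0x80 | (byte & 0x3F));
        }
    }
    return out;
}
```

The Arabic letter 'م' is the two bytes 0xD9 0x85 in a UTF-8 source file. Fed through this function they become the four bytes 0xC3 0x99 0xC2 0x85, i.e. two unrelated characters instead of one Arabic letter.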

windwaker

Hello,

I did some playing around with it and managed to find the missing link:
It seems that the file is not initialized properly. When you use @out.setCodec("UTF-8");@ it only sets the codec for the text (which means that the data written to the file is encoded correctly), but the file itself is still not initialized properly.
So I have added the BOM manually. Check out the code below:
@
#define UTF8BOM "\xEF\xBB\xBF"
#define UTF8 "UTF-8"
#define UTF16LEBOM "\xFF\xFE"
#define UTF16LE "UTF-16LE"
#define UTF16BEBOM "\xFE\xFF"
#define UTF16BE "UTF-16BE"

QFile f( "test.txt" );
// Remove the file if it exists
if( f.exists() ) f.remove();
// Open the file in binary mode or exit if it fails
if( !f.open(QIODevice::ReadWrite) ) return -2;
f.write(UTF8BOM); // Write UTF-8 BOM to the start of the file
f.setTextModeEnabled(true); // Enable text mode
QTextStream out (&f);
out.setCodec(UTF8);
QString x = QString::fromUtf8("مرحبا \n");
out << x;
f.close();
return 0;
@

However, your source files still need to be encoded in UTF-8. If you want to save your files in a different encoding (e.g. UTF-16), you can use the other defined pairs. I have tested them and they also work.

Only posted this in case anyone else is searching for a quick solution to this problem.

Cheers!

goetz

Byte order marks are not required for valid UTF-8 files, and their use is actually often discouraged. It's mainly a Windows "interpretation" of UTF-8; most unixoid OSes do not use them either. See Wikipedia's entry on "Byte Order Marks":http://en.wikipedia.org/wiki/Byte_Order_Mark for a quick overview.
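Since some files will carry the mark and others will not, a reader that tolerates both is a reasonable middle ground. Here is a minimal sketch in plain C++, assuming the file contents have already been read into a `std::string`. The helper names `hasUtf8Bom` and `stripUtf8Bom` are made up for this post.

```cpp
#include <string>

// Hypothetical helper: true if the data starts with the UTF-8 byte order
// mark (the three bytes 0xEF 0xBB 0xBF).
bool hasUtf8Bom(const std::string &data)
{
    return data.compare(0, 3, "\xEF\xBB\xBF") == 0;
}

// Hypothetical helper: returns the data without a leading UTF-8 BOM and
// leaves BOM-less data untouched.
std::string stripUtf8Bom(const std::string &data)
{
    return hasUtf8Bom(data) ? data.substr(3) : data;
}
```

With this, a file written with the BOM and one written without it decode to the same text.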

windwaker

We should remember that Windows is not a normal OS, so I guess this is a workaround for Windows then...

Any other piece of code that actually works would be highly appreciated :)