How can I create and open a UTF-8 encoded text file on Mac OS X?

    Polto
    wrote on 5 Nov 2010, 09:34
    #1

    I tried to write an Arabic word to a text file:
    @
    #include <QtCore>

    int main()
    {
        QFile File("test.txt");
        File.open(QIODevice::ReadWrite | QIODevice::Text);

        QTextStream out(&File);
        out.setCodec("UTF-8");
        out << "مرحبا \n";

        File.close();
    }
    @
    But the output is strange characters.
    This is the content of test.txt:
    مرحبا

    But when I save the file from a text editor with UTF-8 encoding,
    test.txt contains:
    مرحبا
    So how can I tell QFile to encode the file as UTF-8?
    I saw on another forum that Q3TextStream can do it.

    Polto

      tony
      wrote on 5 Nov 2010, 09:37
      #2

      Hi,

      The problem is not related to QTextStream, but to "how" you pass the string to it in your source code. If you want to be sure that everything is fine, you should save your source code in UTF-8 too. Otherwise, you can "pre-encode" your text in UTF-8 and type it into your source code as a byte sequence, e.g.

      "\023\045\042..."

      That way you can save your source code in any encoding you like, because it will contain only ASCII characters.
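To see where such a pre-encoded byte sequence comes from, here is a minimal stdlib-only C++ sketch (no Qt needed) that encodes Unicode code points as UTF-8. The helper name `encodeUtf8` is hypothetical, and it assumes valid input (it does not reject surrogates or values above U+10FFFF). Applied to مرحبا (U+0645 U+0631 U+062D U+0628 U+0627) it yields two bytes per letter, which can then be typed into an ASCII-only source file as hex escapes.

```cpp
#include <string>

// Minimal UTF-8 encoder for one code point (hypothetical helper).
// Assumes cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                      // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                              // 4 bytes: 11110xxx + 3 continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// The result for "مرحبا", written as hex escapes so this line is pure ASCII.
// In Qt it would then be turned into text with QString::fromUtf8(kMarhabaUtf8).
const char kMarhabaUtf8[] = "\xD9\x85\xD8\xB1\xD8\xAD\xD8\xA8\xD8\xA7";
```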

      Tony.

        Polto
        wrote on 5 Nov 2010, 09:42
        #3

        Thanks, but I have now changed it in Qt Creator's preferences and nothing changed:
        I still get the same result.

        Polto

          goetz
          wrote on 5 Nov 2010, 10:26
          #4

          In your code you pass a const char* to the text stream.

          bq. From "QTextStream::operator<<()":http://doc.trolltech.com/4.7/qtextstream.html#operator-lt-lt-16 doc:
          Writes the constant string pointed to by string to the stream. string is assumed to be in ISO-8859-1 encoding. This operator is convenient when working with constant string data

          You have to construct a QString before you pass it to the text stream. This snippet works for me:

          @
          QTextStream out(&File);
          out.setCodec("UTF-8");
          QString x = QString::fromUtf8("مرحبا \n");
          out << x;
          @
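The difference between the two code paths can be shown without Qt. The sketch below (stdlib-only C++, hypothetical function names) decodes the same bytes once the way the quoted doc says the `const char*` overload treats them (Latin-1, one character per byte) and once as UTF-8, which is conceptually what `QString::fromUtf8()` does. The UTF-8 decoder is deliberately minimal and does not validate its input.

```cpp
#include <cstddef>
#include <string>

// Latin-1 (ISO-8859-1): every byte is one character whose code point
// equals the byte value.
std::u32string decodeLatin1(const std::string& bytes)
{
    std::u32string out;
    for (unsigned char b : bytes)
        out += static_cast<char32_t>(b);
    return out;
}

// Minimal UTF-8 decoder (no checks for overlong forms, surrogates or stray
// continuation bytes): a lead byte says how many continuation bytes follow.
std::u32string decodeUtf8(const std::string& bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = bytes[i];
        int extra = b >= 0xF0 ? 3 : b >= 0xE0 ? 2 : b >= 0xC0 ? 1 : 0;
        // Keep only the payload bits of the lead byte.
        char32_t cp = extra == 0 ? b : b & (0x3F >> extra);
        // Append 6 payload bits from each continuation byte.
        for (int k = 1; k <= extra && i + k < bytes.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out += cp;
        i += 1 + extra;
    }
    return out;
}
```

With the 10 UTF-8 bytes of مرحبا, the Latin-1 path produces 10 characters while the UTF-8 path produces the 5 intended Arabic letters.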

          http://www.catb.org/~esr/faqs/smart-questions.html

            Frank
            wrote on 5 Nov 2010, 10:54
            #5

            [quote author="Volker" date="1288952811"]
            @
            QTextStream out(&File);
            out.setCodec("UTF-8");
            QString x = QString::fromUtf8("مرحبا \n");
            out << x;
            @
            [/quote]

            That requires the source file to be UTF-8, though, which causes issues with MSVC, IIRC. I find non-ASCII source files a PITA when targeting multiple platforms.
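If you do keep non-ASCII literals in UTF-8 sources, one way to make that assumption explicit is a compile-time guard, sketched below assuming the file is saved as UTF-8. It relies only on `sizeof` of string literals: if the compiler converts narrow literals into a non-UTF-8 execution character set, the byte counts change and the build stops instead of producing garbled text at run time. It catches the common failure, not every possible misconfiguration.

```cpp
#include <cstddef>

// 'ä' (U+00E4) is 2 bytes in UTF-8, so the literal is 3 bytes with the NUL.
static_assert(sizeof("ä") == 3, "narrow string literals are not UTF-8");

// "مرحبا" is 5 letters of 2 UTF-8 bytes each.
constexpr std::size_t kMarhabaBytes = sizeof("مرحبا") - 1;
static_assert(kMarhabaBytes == 10, "narrow string literals are not UTF-8");
```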

              goetz
              wrote on 5 Nov 2010, 10:59
              #6

              We switched all of our C++ sources to UTF-8 a while ago, with no problems on Mac or with MS Visual Studio. Of course, we use "true" UTF-8 strings only in character constants (const char*).

              We use

              @
              CODECFORTR = UTF-8
              CODECFORSRC = UTF-8
              @

              in our .pro files.

                tony
                wrote on 5 Nov 2010, 11:02
                #7

                I agree with Frank.

                This is the main reason why I prefer to "pre-encode" UTF-8 constant strings and type the escaped byte sequence into the code, mostly because such strings are rare for me (if you write everything in English and translate later with Qt Linguist, you don't even need to care about that).

                T.

                  goetz
                  wrote on 5 Nov 2010, 11:32
                  #8

                  If you mostly have to write 7-bit ASCII strings, then that is perfectly OK.

                  But I strictly refuse to code my German umlauts (äöü) or French accented characters (é) in an ancient, stone-age style. Come on, it's 2010, we've landed on Mars, and we still shouldn't use more than 7-bit characters in modern code? We can use our time better than looking up code points in tables :-) Not to mention the readability of the source code!

                    tony
                    wrote on 5 Nov 2010, 11:38
                    #9

                    You're right, Volker.

                    That's why I wrote a small Qt app with two QTextEdits that encodes strings for me. For a single string it's just a copy-and-paste effort, which even in 2010 may be worth trying :) :)
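Tony's converter is a Qt GUI; its core step can be sketched in stdlib-only C++ (the function name `toAsciiLiteral` is hypothetical). It takes UTF-8 bytes and produces the body of a string literal that contains only printable ASCII. One subtlety worth handling: a `\x` escape consumes every following hex digit, so a literal like "\xA7b" would not end the escape after A7. The sketch closes and reopens the literal (`""`) in that case, relying on adjacent string literals being concatenated.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Turn arbitrary bytes (e.g. UTF-8 text) into the body of a C/C++ string
// literal that contains only printable ASCII characters.
std::string toAsciiLiteral(const std::string& bytes)
{
    std::string out;
    bool lastWasHexEscape = false;
    for (unsigned char c : bytes) {
        // A \x escape would swallow a following hex digit, so split the
        // literal: "\xA4""b" is read as "\xA4" "b".
        if (lastWasHexEscape && std::isxdigit(c))
            out += "\"\"";
        lastWasHexEscape = false;
        if (c == '\\' || c == '"') {
            out += '\\';
            out += static_cast<char>(c);
        } else if (c == '\n') {
            out += "\\n";
        } else if (c >= 0x20 && c < 0x7F) {
            out += static_cast<char>(c);          // printable ASCII as-is
        } else {
            char buf[5];                          // "\xNN" plus NUL
            std::snprintf(buf, sizeof buf, "\\x%02X", c);
            out += buf;
            lastWasHexEscape = true;
        }
    }
    return out;
}
```

Pasting the returned text between double quotes gives a literal that can be passed to `QString::fromUtf8()` from a pure-ASCII source file.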

                    If I need to write software in a language other than English, I prefer the Qt Linguist approach. That way I get at least two languages for free :)

                    T.

                      goetz
                      wrote on 5 Nov 2010, 11:43
                      #10

                      And I just hit the 'ä' key on my keyboard.

                      I really had no problems with UTF-8 sources on either Windows or Mac. Maybe we are lucky, because we only use Qt (QString) for string handling :-)

                      We are working on a big commercial project. Coding in English first and translating afterwards is not an option, as our primary target is German customers.

                        Polto
                        wrote on 5 Nov 2010, 15:17
                        #11

                        I don't think the problem is the data we write to the file.
                        The Arabic characters are stored correctly if I first create the file in a normal text editor and save it with UTF-8 encoding;
                        when I then open that file from Qt and write the characters to it, the result is OK.

                        Polto

                          goetz
                          wrote on 5 Nov 2010, 15:35
                          #12

                          The data in your version is a const char * string. Its encoding depends on the settings of your editor. According to the docs, the const char * is written out assuming the ISO-8859-1 charset. If your source file is in UTF-8 (or any other multi-byte encoding), each of the 2, 3 or 4 bytes that make up one single character in your string is treated as a separate Latin-1 character and written out as the corresponding UTF-8 character.

                          If you have any other encoding than ISO-8859-1 in your const char * string constants, you must convert them to a QString with the appropriate conversion routines and/or text codecs.

                          Your

                          @
                          out << "مرحبا \n";
                          @

                          is internally converted into a

                          @
                          d->putString(QLatin1String(string));
                          @

                          The output definitely cannot be what you expect...
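A stdlib-only C++ sketch of the effect described above (the function name is hypothetical, and this is not Qt's code): each byte of the UTF-8 source literal is taken as one Latin-1 character, whose code point equals the byte value, and the stream's UTF-8 codec then encodes that character again. Every non-ASCII byte therefore becomes two bytes, so the 12-byte literal "مرحبا \n" ends up as 22 bytes of mojibake in the file, starting with "Ù" (0xD9 read as Latin-1).

```cpp
#include <string>

// Simulates `out << "..."` on a UTF-8 encoded source literal: bytes are
// read as Latin-1 characters (code point == byte value) and each
// character is then encoded as UTF-8 again.
std::string latin1BytesToUtf8(const std::string& bytes)
{
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out += static_cast<char>(b);                 // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));   // 0xC2 or 0xC3
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```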

                            windwaker
                            wrote on 22 May 2011, 13:16
                            #13

                            Hello,

                            I played around with it a bit and found the missing link:
                            It seems that the file is initialized improperly. When you use @out.setCodec("UTF-8");@ it only sets the codec for the text (so the data written to the file is encoded correctly), but the file itself is still not initialized properly.
                            So I added the BOM information manually. Check out the code below:
                            @
                            #include <QtCore>

                            // Byte order marks and the matching codec names
                            #define UTF8BOM "\xEF\xBB\xBF"
                            #define UTF8 "UTF-8"
                            #define UTF16LEBOM "\xFF\xFE"
                            #define UTF16LE "UTF-16LE"
                            #define UTF16BEBOM "\xFE\xFF"
                            #define UTF16BE "UTF-16BE"

                            int main()
                            {
                                QFile f("test.txt");
                                // Remove the file if it exists
                                if (f.exists())
                                    f.remove();
                                // Open the file in binary mode (no QIODevice::Text) or exit if it fails
                                if (!f.open(QIODevice::ReadWrite))
                                    return -2;
                                f.write(UTF8BOM);           // Write the UTF-8 BOM to the start of the file
                                f.setTextModeEnabled(true); // Enable text mode for the data that follows
                                QTextStream out(&f);
                                out.setCodec(UTF8);
                                QString x = QString::fromUtf8("مرحبا \n");
                                out << x;
                                out.flush();                // Push the stream's buffer to the file
                                f.close();
                                return 0;
                            }
                            @
                            However, your source files still need to be encoded in UTF-8. If you want to save your files in a different encoding (e.g. UTF-16), you can use the other defined pairs. I have tested them and they also work.
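For comparison, a stdlib-only C++ sketch of the same idea (hypothetical helper names): a UTF-8 BOM is just the three bytes EF BB BF written before the UTF-8 data, here with `std::ofstream` in binary mode so nothing is translated. In Qt itself, `QTextStream::setGenerateByteOrderMark(true)` asks the stream to emit the BOM when a Unicode codec is used, which may be simpler than writing it by hand; check the documentation for your Qt version.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Write UTF-8 bytes to `path`, optionally prefixed by the UTF-8 BOM.
bool writeUtf8File(const std::string& path, const std::string& utf8, bool withBom)
{
    std::ofstream f(path, std::ios::binary | std::ios::trunc);
    if (!f)
        return false;
    if (withBom)
        f << "\xEF\xBB\xBF";   // EF BB BF = U+FEFF encoded as UTF-8
    f << utf8;
    f.close();                  // close() sets failbit if flushing fails
    return !f.fail();
}

// Read a whole file back as raw bytes (used to inspect the result).
std::string readFileBytes(const std::string& path)
{
    std::ifstream f(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(f),
                       std::istreambuf_iterator<char>());
}
```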

                            I'm only posting this in case anyone else is searching for a quick solution to this problem.

                            Cheers!

                              goetz
                              wrote on 22 May 2011, 15:51
                              #14

                              Byte order marks are not required for valid UTF-8 files; in fact, their use is often discouraged. They are mainly a Windows "interpretation" of UTF-8, and most Unix-like OSes do not use them. See Wikipedia's entry on "Byte Order Marks":http://en.wikipedia.org/wiki/Byte_Order_Mark for a quick overview.
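Because the BOM is optional, code that reads UTF-8 files should accept input both with and without it. A minimal stdlib-only sketch (hypothetical name `stripUtf8Bom`) drops the three BOM bytes if they are present. With Qt, `QTextStream` auto-detects a BOM by default (see `setAutoDetectUnicode()`), so this matters mostly when reading raw bytes, e.g. with `QFile::readAll()` or the C++ standard library.

```cpp
#include <string>

// Remove a leading UTF-8 BOM (EF BB BF) in place; returns true if one was found.
bool stripUtf8Bom(std::string& bytes)
{
    static const std::string bom = "\xEF\xBB\xBF";
    // compare() on a shorter string simply reports a mismatch.
    if (bytes.compare(0, bom.size(), bom) == 0) {
        bytes.erase(0, bom.size());
        return true;
    }
    return false;
}
```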

                                windwaker
                                wrote on 22 May 2011, 16:24
                                #15

                                We should remember that Windows is not a normal OS, so I guess this is a workaround for Windows then...

                                Any other piece of code that actually works would be highly appreciated :)
