Convert QString into QByteArray as either UTF-8 or Latin1

9 Posts 4 Posters 68.0k Views
    jsiei97
    wrote on 13 Mar 2011, 11:11 last edited by
    #1

    Hi

    I would like to convert a QString into either a UTF-8 or a Latin1 QByteArray, but at the moment I get everything as UTF-8.

    I am testing this with a character from the upper half of Latin1 (above 0x7f); the German ü is a good example.

    If I do like this:

    @QString name("\u00fc"); // U+00FC = ü
    QByteArray utf8;
    utf8.append(name);
    qDebug() << "utf8" << name << utf8.toHex();

    QByteArray latin1;
    latin1.append(name.toLatin1());
    qDebug() << "Latin1" << name << latin1.toHex();

    QTextCodec *codec = QTextCodec::codecForName("ISO 8859-1");
    QByteArray encodedString = codec->fromUnicode(name);
    qDebug() << "ISO 8859-1" << name << encodedString.toHex();
    @

    I get the following output.

    @utf8 "ü" "c3bc"
    Latin1 "ü" "c3bc"
    ISO 8859-1 "ü" "c3bc" @

    As you can see, I get the UTF-8 bytes 0xc3bc everywhere, whereas I would expect the Latin1 byte 0xfc for steps 2 and 3.

    What is going on here?

    /Thanks

      jsiei97
      wrote on 13 Mar 2011, 13:06 last edited by
      #2

      If I set the locale codec correctly, it works:

      @QTextCodec::setCodecForLocale(QTextCodec::codecForName("UTF-8"));
      QTextCodec::setCodecForCStrings(QTextCodec::codecForName("UTF-8"));@

        goetz
        wrote on 13 Mar 2011, 17:20 last edited by
        #3

        Is anything wrong with QString::toUtf8() (http://doc.qt.nokia.com/4.7/qstring.html#toUtf8) or QString::toLatin1() (http://doc.qt.nokia.com/4.7/qstring.html#toLatin1)? :-)

        http://www.catb.org/~esr/faqs/smart-questions.html

          jsiei97
          wrote on 13 Mar 2011, 19:13 last edited by
          #4

          It was QString::toLatin1 that did not work!

          And the string was already UTF-8, so I could not convert it again using toUtf8.
          QString and QByteArray simply assumed that the data was Latin1 rather than UTF-8.

          So the only remedy I could find was to make sure the system really knew which encoding the data was stored in to begin with; hence the calls to setCodecForLocale and setCodecForCStrings.

            raulgd
            wrote on 13 Mar 2011, 19:40 last edited by
            #5

            Hi, I had this issue before. You have to take the raw char data and pass it, together with its length, to QString::fromUtf8(), so I did this:

            @
            // A UTF-8 encoded QByteArray (aString is assumed to be a std::string holding UTF-8 data)
            QByteArray aByteArray = aString.c_str();

            //From an UTF8 encoded QByteArray to a QString
            QString aQString = QString::fromUtf8(aByteArray.data(), aByteArray.size());
            @

            Raul Guerrero
            http://jimi.mx

              goetz
              wrote on 13 Mar 2011, 21:54 last edited by
              #6

              It works with 0-terminated strings (standard char *) without knowing the length. The data length is computed with qstrlen() then. So, it's perfectly ok to write:

              @
              QString x = QString::fromLatin1("ü");
              @

              @jsiei97: Your issue is not an output problem but an input problem. setCodecForLocale and setCodecForCStrings only affect the construction of QStrings. These settings are useful if you want to use native strings throughout your source code. If you need this only now and then, the static methods QString::fromLatin1() and QString::fromUtf8() could be enough.
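To make the "input problem" concrete, here is a minimal sketch in plain C++ (standard library only, no Qt) of how the same two bytes turn into different characters depending on the decoder. The helpers `decodeLatin1` and `decodeUtf8Basic` are hypothetical names that only mimic what QString::fromLatin1() and QString::fromUtf8() do conceptually; they are not Qt's implementation, and the UTF-8 decoder is deliberately simplified to 1- and 2-byte sequences.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode bytes as Latin-1: every byte maps directly to one code point.
std::vector<std::uint32_t> decodeLatin1(const std::string &bytes)
{
    std::vector<std::uint32_t> out;
    for (unsigned char b : bytes)
        out.push_back(b);
    return out;
}

// Decode bytes as UTF-8, handling only 1- and 2-byte sequences
// (U+0000..U+07FF). Simplified for illustration: anything else,
// including truncated sequences, becomes U+FFFD, and overlong
// encodings are not rejected.
std::vector<std::uint32_t> decodeUtf8Basic(const std::string &bytes)
{
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b < 0x80) {
            out.push_back(b);
        } else if ((b & 0xE0) == 0xC0 && i + 1 < bytes.size()
                   && (static_cast<unsigned char>(bytes[i + 1]) & 0xC0) == 0x80) {
            const unsigned char next = static_cast<unsigned char>(bytes[i + 1]);
            out.push_back(((b & 0x1Fu) << 6) | (next & 0x3Fu));
            ++i; // the continuation byte has been consumed
        } else {
            out.push_back(0xFFFD);
        }
    }
    return out;
}
```

Feeding the bytes c3 bc to `decodeLatin1` yields two code points (U+00C3 and U+00BC), while `decodeUtf8Basic` yields the single code point U+00FC. Which of the two a QString ends up holding is decided when the string is constructed, not when it is converted back to bytes.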

                jsiei97
                wrote on 14 Mar 2011, 06:27 last edited by
                #7

                In this case I must have the data as a QByteArray at the end (for other reasons).

                But since my system is using UTF-8, I can't see any danger in telling it which encoding it is using.

                I know there could be some portability issues if we move this software to another platform.
                Today this code will only run on a specific embedded Linux device, so this solution moves my trouble into the future... (or not)

                However, my current guess is that the correct solution is to find the misconfiguration in the system environment, so that I don't need to hardcode the default locale in the code itself...

                  goetz
                  wrote on 14 Mar 2011, 11:18 last edited by
                  #8

                  This is no problem. You create a QString object, that does all the codec stuff and stores the string as unicode (utf-16 if I remember correctly) internally. After you have kind-of normalized the string this way, you can get a new byte representation in the encoding you want. It's an easy and convenient way to convert from latin1 to utf8, for example.
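As a rough illustration of that "decode, then re-encode" idea, the following plain C++ sketch (standard library only, no Qt) converts Latin-1 bytes to UTF-8 by treating each byte as a code point and writing it back out in UTF-8. The function name `latin1ToUtf8` is hypothetical; with Qt the equivalent would presumably be a QString::fromLatin1() call followed by toUtf8().

```cpp
#include <string>

// Convert Latin-1 bytes to UTF-8.
// Step 1 (decode): each Latin-1 byte is the code point U+0000..U+00FF itself.
// Step 2 (encode): code points below U+0080 are written as one byte,
// the rest as a two-byte UTF-8 sequence 110xxxxx 10xxxxxx.
std::string latin1ToUtf8(const std::string &latin1)
{
    std::string utf8;
    for (unsigned char cp : latin1) {
        if (cp < 0x80) {
            utf8.push_back(static_cast<char>(cp));
        } else {
            utf8.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return utf8;
}
```

For the Latin-1 byte fc (ü) this produces the two bytes c3 bc, while plain ASCII input passes through unchanged.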

                  You are doing nothing different in your code, but only leave the decision which codec to use to the system. With the static methods you can tell Qt directly.

                  Once you have compiled the sources, it is irrelevant which encoding is used by the system you deploy on. The decision has already been made. The encoding handling is also platform independent from the compiler's point of view: it works on any platform. We use UTF-8 encoded sources (including German umlauts) on Windows, Mac and Linux without any problem. But be sure to tell all your editors to use the same encoding (including the default encoding when saving new files) :-)

                    dangelog
                    wrote on 14 Mar 2011, 12:20 last edited by
                    #9

                    [quote author="jsiei97" date="1300014687"]Hi

                    I would like to convert a QString into either a utf8 or a latin1 QByteArray, but today I get everything as utf8.[/quote]

                    There are some problems with your snippet...

                    • QString(const char *) uses whatever codec was set by QTextCodec::setCodecForCStrings(), or if no codec was set, fromLatin1()
                    • A \u escape sequence is not generated in any particular encoding, but it's up to your compiler to set the execution charset (see -fexec-charset on gcc). For instance:
                      @
                      $ LC_ALL=C gcc -x c++ -o - -S - -fexec-charset=latin1 <<< 'const char *foo = "\u00fc";' | grep .string
                      .string "\374"
                      $ LC_ALL=C gcc -x c++ -o - -S - -fexec-charset=utf8 <<< 'const char *foo = "\u00fc";' | grep .string
                      .string "\303\274"
                      $ LC_ALL=C gcc -x c++ -o - -S - -fexec-charset=utf16 <<< 'const char *foo = "\u00fc";' | grep .string
                      .string "\377\376\374"
                      @

                    This means that what ends up in the char array you pass to the QString constructor is pretty much up to your compiler, may change for every translation unit and may be out of your control (load a plugin that changes the codec for C strings => doom).
                    Therefore, stay on the safe side: don't use \u inside strings unless you are 100% sure of the WHOLE toolchain, the locale set by the user, etc.; use ASCII characters only in the source file; use the \x escape sequence instead. In any case, use QString::fromUtf8/Latin1/Utf16 inside your program, and if possible, shut down all unsafe conversions from/to C strings by defining QT_NO_CAST_FROM_ASCII and QT_NO_CAST_TO_ASCII.

                    • QByteArray::append(QString) uses toAscii on the string, which again uses the codec for c strings, otherwise converts to latin1. If you want to convert to utf8, use toUtf8.

                    • Watch out, qDebug() may not be Unicode-safe. Always check with toUtf8().toHex() what's really inside your strings.
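Putting the points above together, here is a plain C++ sketch (standard library only, no Qt) of why all three hex dumps in the first post could read c3bc. It assumes the compiler emitted "\u00fc" as the UTF-8 bytes c3 bc and that no codec for C strings was set, so QString(const char *) decoded them as Latin-1. The helpers `decodeLatin1`, `encodeLatin1` and `toHex` are hypothetical stand-ins for QString::fromLatin1(), QString::toLatin1() and QByteArray::toHex(), not Qt's implementation. The input is written with \x escapes so that its bytes do not depend on the execution charset.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Latin-1 decoding: one byte becomes one code point.
std::vector<std::uint32_t> decodeLatin1(const std::string &bytes)
{
    std::vector<std::uint32_t> out;
    for (unsigned char b : bytes)
        out.push_back(b);
    return out;
}

// Latin-1 encoding: code points up to U+00FF become one byte each;
// anything larger is replaced by '?' (an assumption made for this sketch).
std::string encodeLatin1(const std::vector<std::uint32_t> &codePoints)
{
    std::string out;
    for (std::uint32_t cp : codePoints)
        out.push_back(cp <= 0xFF ? static_cast<char>(cp) : '?');
    return out;
}

// Lowercase hex dump of a byte string.
std::string toHex(const std::string &bytes)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char b : bytes) {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}
```

With these helpers, `toHex(encodeLatin1(decodeLatin1("\xc3\xbc")))` gives "c3bc": decoding and re-encoding with the same single-byte codec is a byte-for-byte round trip, so the UTF-8 bytes of the literal come back untouched. If the string had instead held the single code point U+00FC, as it would after a UTF-8 decode, `toHex(encodeLatin1({0xFC}))` gives the expected "fc".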

                    Software Engineer
                    KDAB (UK) Ltd., a KDAB Group company
