Solved: Reading ASCII/UTF-8 file



  • Hi all,

    I am using Qt to read a file generated by the store method of the Java Properties class. This method generates an ASCII file. However, special characters are written in their UTF-8 equivalent. In the end, the ASCII file looks like this:

    firstname = g\u00C9rard
    lastname = normand

    Now, when I read and display the data, I would like to display the symbols themselves (gérard normand). I tried using the toUtf8() method, but nothing convincing came out. So how do I read this ASCII file while handling the UTF-8 symbols?

    Thanks for your help

    Thibaut



  • The format you describe isn't UTF-8 but a Unicode escape sequence. You'll need to parse those. AFAIK, there is no standard Qt codec that deals with these. You could create your own, though. See the QTextCodec documentation for more information on that.



  • Hi, the \u00C9 is an escape for the Unicode code point U+00C9, the capital letter É, so getting the small letter is going to be troublesome. I think, as Andre mentioned, you need to make your own conversion class to handle this one.



  • Thanks for the reply. The capital letter É is what I want. I looked at http://www.fileformat.info/info/unicode/char/c9/index.htm, which says that \u00C9 is the Java/C++ source code for the capital É character. So I was thinking there might be a way around it.



  • Finally found what I was looking for.
    I need to use the following routine:

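    @// Match a literal backslash, then 'u', then four hex digits.
    // The backslash is escaped once for the regex and once for C++.
    QRegExp rx("(\\\\u[0-9a-fA-F]{4})");
    int pos = 0;
    while ((pos = rx.indexIn(str, pos)) != -1) {
        // Replace the 6-character escape with the character it names.
        str.replace(pos++, 6, QChar(rx.cap(1).right(4).toUShort(0, 16)));
    }@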

    Thanks a lot for your replies
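
For readers without Qt at hand, the same replacement can be sketched in standard C++. This is only an illustration, not the routine from the post above: `decodeUnicodeEscapes`, `parseHex4` and `appendUtf8` are hypothetical names, the output is assumed to be wanted as UTF-8 in a `std::string`, and other escapes of the Java properties format (such as a doubled backslash) are not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
static void appendUtf8(std::string &out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Parse four hex digits starting at s[i]; returns false if any is missing or not hex.
static bool parseHex4(const std::string &s, std::size_t i, std::uint32_t &value) {
    if (i + 4 > s.size()) return false;
    value = 0;
    for (std::size_t k = 0; k < 4; ++k) {
        const char c = s[i + k];
        std::uint32_t digit = 0;
        if (c >= '0' && c <= '9') digit = static_cast<std::uint32_t>(c - '0');
        else if (c >= 'a' && c <= 'f') digit = static_cast<std::uint32_t>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') digit = static_cast<std::uint32_t>(c - 'A' + 10);
        else return false;
        value = (value << 4) | digit;
    }
    return true;
}

// True if s has a well-formed \uXXXX escape at position i; stores its value in `unit`.
static bool escapeAt(const std::string &s, std::size_t i, std::uint32_t &unit) {
    return i + 1 < s.size() && s[i] == '\\' && s[i + 1] == 'u' && parseHex4(s, i + 2, unit);
}

// Hypothetical helper: replace every \uXXXX escape in an ASCII string
// with the UTF-8 bytes of the character it names.
// Malformed escapes are copied through unchanged; unpaired surrogates are not validated.
std::string decodeUnicodeEscapes(const std::string &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::uint32_t cp = 0;
        if (escapeAt(in, i, cp)) {
            i += 6;
            // A high surrogate followed by an escaped low surrogate forms one code point.
            std::uint32_t low = 0;
            if (cp >= 0xD800 && cp <= 0xDBFF && escapeAt(in, i, low)
                && low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                i += 6;
            }
            appendUtf8(out, cp);
        } else {
            out += in[i++];
        }
    }
    return out;
}
```

Calling `decodeUnicodeEscapes` on the line `firstname = g\u00C9rard` from the first post should give the same text with the escape replaced by the two UTF-8 bytes of É.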


  • Moderators

    Did you check out "fromUnicode?":http://qt-project.org/doc/qt-4.8/qtextcodec.html#fromUnicode
    At least from the name and description it seems to be fitting. However, Andre may have more experience with this.



  • [quote author="koahnig" date="1342687524"]Did you check out "fromUnicode?":http://qt-project.org/doc/qt-4.8/qtextcodec.html#fromUnicode
    At least from the name and description it seems to be fitting. However, Andre may have more experience with this. [/quote]

    The problem is that the codec used is not a codec in the normal sense. It is a unicode escape sequence in an otherwise ASCII-encoded file. So to use this method, you first have to actually implement a codec that does that translation back and forth. If you have to work with these files, it is probably a good idea to implement such a codec. Doesn't seem all that hard to me...

    Edit: though, I admit, I did not try doing it myself, so it might be harder than it seems by just looking at the docs...
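
The "forth" half of such a translation, turning text back into the escaped ASCII form, might look roughly like the sketch below. It is plain standard C++, not a QTextCodec subclass. `encodeUnicodeEscapes` and `appendEscape` are hypothetical names, the input is assumed to be valid UTF-8, and the uppercase hex digits simply follow the `\u00C9` example in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Append "\uXXXX" (four uppercase hex digits) for one UTF-16 code unit.
static void appendEscape(std::string &out, std::uint32_t unit) {
    char buf[8];
    std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(unit));
    out += buf;
}

// Hypothetical helper: keep ASCII bytes as they are and write every other
// character of a UTF-8 string as one or two \uXXXX escapes.
// Assumes well-formed UTF-8; malformed input is not diagnosed.
std::string encodeUnicodeEscapes(const std::string &utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {          // plain ASCII, copy through
            out += utf8[i++];
            continue;
        }
        // The lead byte tells how many continuation bytes follow.
        const std::size_t extra = (lead >= 0xF0) ? 3 : (lead >= 0xE0) ? 2 : 1;
        std::uint32_t cp = lead & (0x3Fu >> extra);
        ++i;
        for (std::size_t k = 0; k < extra && i < utf8.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3Fu);
        }
        if (cp >= 0x10000) {
            // Code points beyond the BMP are written as a surrogate pair.
            cp -= 0x10000;
            appendEscape(out, 0xD800 + (cp >> 10));
            appendEscape(out, 0xDC00 + (cp & 0x3FF));
        } else {
            appendEscape(out, cp);
        }
    }
    return out;
}
```

A real codec would also need the decoding direction and proper error handling for malformed input.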


  • Moderators

    [quote author="Andre" date="1342688046"]
    [quote author="koahnig" date="1342687524"]Did you check out "fromUnicode?":http://qt-project.org/doc/qt-4.8/qtextcodec.html#fromUnicode
    At least from the name and description it seems to be fitting. However, Andre may have more experience with this. [/quote]

    The problem is that the codec used is not a codec in the normal sense. It is a unicode escape sequence in an otherwise ASCII-encoded file. So to use this method, you first have to actually implement a codec that does that translation back and forth. If you have to work with these files, it is probably a good idea to implement such a codec. Doesn't seem all that hard to me...
    [/quote]

    Andre, thanks for the clarification.

