
Effective Qt Strings



  • I'm trying to write a simple parser, but QString has me a bit confused.
    In Java, char is Unicode, String is a simple wrapper over char[] and you can iterate with charAt(pos). No fuss.

    QString's at() and [] wrap each character in a QChar object that I don't need, so should I use utf16()?

    @
    const ushort *QString::utf16() const
    {
        if (d->data != d->array) {
            ... realloc() ...
        }
        return d->array;
    }
    @

    It seems to return a pointer to the ushort array QString uses internally, but what would
    cause the realloc() and copying in that if statement?

    Should/could I ditch QString altogether and load the text file as ushort* somehow?
    Thanks.



    If you want to access each individual character, use at() or []. QChar is a wrapper of a unicode char.

    But how do you want to parse? Character by character? By searching? Word by word?

    That affects how you should do it.



    Depends on what you want to do. In general, I would strongly recommend sticking to QString/QChar, since then Qt handles all the Unicode stuff for you and you will not have to deal with endianness and the like.

    If you really need the numeric value (the UTF-16 code unit, which equals the code point for characters in the BMP), "QChar":http://doc.qt.nokia.com/latest/qchar.html has "unicode()":http://doc.qt.nokia.com/latest/qchar.html#unicode methods that return it as a ushort.

    And no, you almost never want to handle the encoding/decoding of text files yourself when your fine library already provides convenient means for it.



    > QChar is a wrapper of a unicode char.

    Yea, that's my issue. Since QChar just holds a ushort, why can't I just get it from QString?
    Ironically, it seems weird to do all that boxing coming from Java.



  • [quote author="reactive" date="1296747071"]> QChar is a wrapper of a unicode char.

    Yea, that's my issue. Since QChar just holds a ushort, why can't I just get it from QString?
    Ironically, it seems weird to do all that boxing coming from Java.[/quote]

    Because Java strings are dumb :-)

    Java's char only holds the character value, much like C/C++'s char, with the one improvement that it can hold Unicode characters rather than just 8 bits as in C/C++.

    QChar, on the other hand, provides much more information about the character (which may not be of interest in a plain GUI-agnostic language like Java).

    As stated in the API docs of QChar, it's lightweight and does not create any overhead, so it does not harm the performance or memory footprint of your application.



    If you need to compare characters during parsing, you could do it this way. I don't think it adds any overhead (only more characters to type :-) ):

    @
    QString myString = ...;
    for (int i = 0; i < myString.length(); ++i)
    {
        QChar c = myString.at(i);
        if (c == QChar('a'))
            ....
    }
    @



    > does not create any overhead, thus does not harm the performance or memory print of your application.

    Java Strings in source are interned. If the above is true, does that mean these two are equivalent:

    @
    if (text[0] == QUOTE)          // compare against a predefined constant QUOTE
    if (text[0] == QChar('"'))     // versus a QChar constructed in the comparison
    @

    or would it create a new QChar every time?

    Thanks for both your help! QChar it is.

    [EDIT: code formatting, please use @-tags, Volker]



    It will create a new object each time you request it.
    But it will be fast,
    and even faster if you use:

    @
    if(text[0] == QChar(L'"'))
    @



  • Haha, weird! Is that a C++ thing or a Qt thing? What does adding the letter "L" do?

    EDIT:
    "A character literal that begins with the letter L, such as L'x', is a wide-character literal. A wide-character literal has type wchar_t."



    That's C++.

    L'x' --> wide-character literal of type wchar_t (its width and encoding are implementation-defined)
    'x' --> plain char literal (usually ASCII)
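A small standard-C++ sketch of the literal kinds involved (no Qt needed). The widths in the comments are typical values, not guarantees; only the two static_asserts state things the language pins down. The u'x' form is a C++11 addition not mentioned in the thread.

```cpp
#include <climits>
#include <cstddef>

// 'x'  : char, CHAR_BIT bits wide (at least 8); its value comes from the
//        compiler's execution character set, usually ASCII-compatible.
// L'x' : wchar_t, implementation-defined width (commonly 16 bits on Windows,
//        32 bits on Linux and macOS).
// u'x' : char16_t (C++11), one UTF-16 code unit, the unit QChar stores.
constexpr std::size_t narrowBits = sizeof('x') * CHAR_BIT;
constexpr std::size_t wideBits = sizeof(L'x') * CHAR_BIT;
constexpr std::size_t utf16Bits = sizeof(u'x') * CHAR_BIT;

// In C++ (unlike C), a character literal has type char, and sizeof(char) is 1.
static_assert(sizeof('x') == 1, "a plain char literal has type char");

// A u'' literal's value is the Unicode code point, independent of platform.
static_assert(u'"' == 0x22, "U+0022 QUOTATION MARK");
```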



  • [quote author="reactive" date="1296747811"]> does not create any overhead, thus does not harm the performance or memory print of your application.

    Java Strings in source are interned. If the above is true does that mean this is equivalent:

    @
    if (text[0] == QUOTE)          // compare against a predefined constant QUOTE
    if (text[0] == QChar('"'))     // versus a QChar constructed in the comparison
    @

    or would it create a new QChar every time?

    Thanks for both your help! QChar it is.[/quote]

    It depends on the compiler, but to be on the safe side, assume a new object is created. It is constructed on the stack and destroyed as soon as it goes out of scope, so there is no memory penalty. If you want to avoid it entirely, allocate static objects for your constants.



    Java's java.lang.String and Qt's QString use very similar internals. Both are Unicode-compliant and encoded in UTF-16, and thus require surrogate pairs to represent code points outside the BMP (Basic Multilingual Plane).

    The small catch is that Java's "char" is defined as a UTF-16 code unit and is therefore 16 bits wide, while in C/C++ a char has an implementation-defined number of bits given by the CHAR_BIT macro (at least 8, and in practice almost always 8).

    That's why in Qt you have QChar, which is nothing more than a UTF-16 code unit.

    More info:
    http://en.wikipedia.org/wiki/UTF-16/UCS-2
    http://doc.qt.nokia.com/latest/qstring.html#details
    http://doc.qt.nokia.com/latest/qchar.html#details
    http://download.oracle.com/javase/6/docs/api/java/lang/Character.html#unicode
    http://download.oracle.com/javase/6/docs/api/java/lang/CharSequence.html
    http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp#core-textrep
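To illustrate the surrogate pairs mentioned above, a standard-C++ sketch (no Qt needed; the helper names are hypothetical) of how a code point outside the BMP becomes two 16-bit code units. A QString holding the same character would likewise report size() == 2, since size() counts QChars, i.e. code units.

```cpp
#include <string>
#include <utility>

// Split a code point in [U+10000, U+10FFFF] into its UTF-16 surrogate pair.
std::pair<char16_t, char16_t> toSurrogatePair(char32_t codePoint)
{
    const char32_t v = codePoint - 0x10000;                            // 20-bit offset
    const char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));   // top 10 bits
    const char16_t low = static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // bottom 10 bits
    return {high, low};
}

// Recombine a surrogate pair into the original code point.
char32_t fromSurrogatePair(char16_t high, char16_t low)
{
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}

// Encode one code point as UTF-16: one unit inside the BMP, two outside it.
std::u16string encodeUtf16(char32_t codePoint)
{
    if (codePoint < 0x10000)
        return std::u16string(1, static_cast<char16_t>(codePoint));
    const auto pair = toSurrogatePair(codePoint);
    return std::u16string{pair.first, pair.second};
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00, so a naive "one char per character" loop in either Java or Qt sees two units for it.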

