QString character access problem
I'd like to access individual characters in a QString. However, for certain characters the QString's size increases by 2 instead of 1. I tried to include one of these characters in my post, but the forum only displays my text up to that character. One example can be found "here.":http://www.fileformat.info/info/unicode/char/2000b/index.htm
(They are rarely used kanji from Chinese/Japanese.) It seems they cannot be represented by a single QChar.
My question is: How do I detect this case and how can I correctly access the character?
Thanks in advance for your help.
Hi, welcome to devnet.
You can access each character via operator[], the at() member function, or the pointer returned by data().
Each of those gives you access to the QChar data type.
What you say sounds strange. Can you elaborate? Do you mean what the size() member function returns, or the C++ sizeof() operator? How do you add characters: operator+, append(), arg(), insert(), or something else?
Hi and welcome to devnet,
How are you currently using QString to access the individual characters?
According to "this (old, I admit) post":http://blog.qt.digia.com/blog/2008/04/28/string-theory/#comment-2064 by Thiago, QChar represents a single UTF-16 code unit, so it's true that some characters require two of them; this can lead to problems and requires special care.
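For example, the character you mentioned:
```cpp
unsigned int foo = 0x2000b;                // U+2000B, outside the Basic Multilingual Plane
QString bar = QString::fromUcs4(&foo, 1);  // Qt 4/5 signature: (const uint *, int)
qDebug() << bar.size();                    // prints 2: one surrogate pair = two QChars :(
qDebug() << bar.toUcs4().size();           // prints 1: one code point
```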
Thanks a lot for your quick replies!
I tried to access the individual characters using QString::at(). Both at(0) and at(1) returned something that was displayed as a square in a QLineEdit. When I pasted these characters into this forum's text editor, a code was shown inside each square. After I removed the space between the two, they fused into the single character described at the link in my first post.
QString::size() returned 2. I ran into this while parsing the kanjidic2.xml file, which contains information on the kanji used in Japanese. The QString causing the problem was initialized using QXmlStreamReader::readElementText(). I assumed the string size would be 1, since the respective XML field contains only a single literal. Your second post helped me a lot, thank you.
I thought a single QChar could represent any Unicode character, but the first line of its documentation already restricts it to 16 bits, i.e. a single UTF-16 code unit.
I think the documentation of QString::size() and QString::at() should point out this issue. Had I assumed my QString's size to always be one (I was extracting single literals, after all) without verifying it, it would have caused an extremely well-hidden bug in my program.