    sdjskrS

    @Chris-Kawa

    A code point is also made up of characters, so "code point character" can reasonably be used to refer to a code point. Humans are not required to use only words found in the dictionary. We are not robots.

    Technically, each code point in UTF-16 is basically a 2-byte (16-bit) unit. A 4-byte code point actually holds a 2-byte lead part and a 2-byte tail part, so the basic unit is still 2 bytes. And the 4-byte form is assigned to rarely used characters, which means we don’t need to care about 4-byte code points in UTF-16.

    So saying that UTF-16 is uniformly 2 bytes does make sense.
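
To make the lead/tail point concrete, here is a minimal sketch in plain standard C++ (no Qt), assuming a C++17 compiler. The helper names `isLeadSurrogate`, `isTailSurrogate`, `combineSurrogates` and `countCodePoints` are made up for illustration and are not Qt API. It only shows that 'a' and 'ㄱ' (U+3131) each fit in one 16-bit unit, while something like U+1F600 needs a lead unit plus a tail unit.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helpers for illustration only (not part of Qt).

// A lead (high) surrogate lies in the range 0xD800..0xDBFF.
constexpr bool isLeadSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// A tail (low) surrogate lies in the range 0xDC00..0xDFFF.
constexpr bool isTailSurrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Rebuild the full value from a lead unit and a tail unit.
constexpr char32_t combineSurrogates(char16_t lead, char16_t tail) {
    return 0x10000 + ((char32_t(lead) - 0xD800) << 10) + (char32_t(tail) - 0xDC00);
}

// Count characters in UTF-16 text, treating a valid lead+tail pair as one.
constexpr std::size_t countCodePoints(std::u16string_view s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (isLeadSurrogate(s[i]) && i + 1 < s.size() && isTailSurrogate(s[i + 1]))
            ++i;  // skip the tail unit of the pair
        ++n;
    }
    return n;
}

// 'a' and the Korean letter U+3131 each occupy a single 16-bit unit.
static_assert(std::u16string_view(u"a").size() == 1, "Latin letter: one unit");
static_assert(std::u16string_view(u"\u3131").size() == 1, "Korean letter: one unit");

// U+1F600 is stored as two 16-bit units (lead 0xD83D, tail 0xDE00).
static_assert(std::u16string_view(u"\U0001F600").size() == 2, "two units");
static_assert(combineSurrogates(0xD83D, 0xDE00) == 0x1F600, "pair decodes back");
```

The checks run at compile time, so the file compiling is the whole demonstration. With Qt, a single QChar would correspond to one of these 16-bit units.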

    @Chris Kawa said:

    “That's basically what I said. A QChar represents every 2 bytes (i.e. code point) of UTF-16. It doesn't mean a QChar represents a character. For some it will, for some it's just a half of a character.”

    If Korean characters were 4-byte code points, that would be reasonable. But every Korean character is a 2-byte code point. Yet QChar treats these 2-byte codes differently: it shows ‘a’ but not ‘ㄱ’.

    It works for Latin letters, but not for Korean letters.

    The funny thing is that QChar itself lacks the ability to convert between encodings, while QString gets the job done internally by using some functions.