Qt Forum


    [SOLVED] Need help with regexp for Kanji

      vsorokin last edited by

      I need to check whether a string contains Kanji characters. Can anybody help me build a regexp for this?

      Thanks.

      --
      Vasiliy

        takumiasaki last edited by

        "Unicode Chapter 12":http://www.unicode.org/versions/Unicode5.0.0/ch12.pdf will help you a lot.

        |CJK Unified Ideographs|4E00–9FFF|Common|
        |CJK Unified Ideographs Extension A|3400–4DBF|Rare|
        |CJK Unified Ideographs Extension B|20000–2A6DF|Rare, historic|
        |CJK Unified Ideographs Extension C|2A700–2B73F|Rare, historic|
        |CJK Unified Ideographs Extension D|2B740–2B81F|Uncommon, some in current use|
        |CJK Compatibility Ideographs|F900–FAFF|Duplicates, unifiable variants, corporate characters|
        |CJK Compatibility Ideographs Supplement|2F800–2FA1F|Unifiable variants|

        So the ranges of Kanji (Han) characters are, very roughly, U+3400-U+9FFF, U+F900-U+FAFF, and U+20000-U+2FFFF.
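The table can also be checked directly, one code point at a time, without a regexp. The following is only a minimal sketch in standard C++ that assumes the text has already been decoded to UTF-32; the names `isHanCodePoint` and `isAllHan` are hypothetical and not part of Qt. It uses the exact block boundaries from the table rather than the rough ranges.

```cpp
#include <string>

// Returns true if cp lies in one of the Han (Kanji) blocks listed in the table.
// Exact block boundaries are used, so U+4DC0-U+4DFF (not a Han block) is excluded.
constexpr bool isHanCodePoint(char32_t cp)
{
    return (cp >= 0x4E00  && cp <= 0x9FFF)    // CJK Unified Ideographs
        || (cp >= 0x3400  && cp <= 0x4DBF)    // Extension A
        || (cp >= 0x20000 && cp <= 0x2A6DF)   // Extension B
        || (cp >= 0x2A700 && cp <= 0x2B73F)   // Extension C
        || (cp >= 0x2B740 && cp <= 0x2B81F)   // Extension D
        || (cp >= 0xF900  && cp <= 0xFAFF)    // CJK Compatibility Ideographs
        || (cp >= 0x2F800 && cp <= 0x2FA1F);  // CJK Compatibility Ideographs Supplement
}

// Returns true if text is non-empty and every code point is a Han character.
// (Hypothetical helper; assumes text holds UTF-32 code points.)
inline bool isAllHan(const std::u32string &text)
{
    if (text.empty())
        return false;
    for (char32_t cp : text) {
        if (!isHanCodePoint(cp))
            return false;
    }
    return true;
}
```

For example, `isHanCodePoint(0x6F22)` (漢) is true, while `isHanCodePoint(0x3042)` (あ, Hiragana) is false.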

        QRegExp:
        @
        // Backslashes are doubled so that QRegExp, not the C++ compiler, sees \xhhhh.
        QRegExp isHan("([\\x3400-\\x9FFF\\xF900-\\xFAFF]|[\\xD840-\\xD87F][\\xDC00-\\xDFFF])+");
        @
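The second alternative in the pattern matches a UTF-16 surrogate pair, because a QString stores 16-bit code units and the `\xhhhh` escape only reaches U+FFFF. As a rough sketch of where the D840-D87F range comes from, the two helpers below (hypothetical names, plain C++ with no Qt dependency) compute the surrogates of a supplementary code point. They assume the input lies in U+10000-U+10FFFF.

```cpp
// Lead (high) surrogate of a supplementary code point.
// Assumes cp is in U+10000..U+10FFFF; no range checking is done.
constexpr char16_t highSurrogate(char32_t cp)
{
    return static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10));
}

// Trail (low) surrogate of a supplementary code point (same assumption).
constexpr char16_t lowSurrogate(char32_t cp)
{
    return static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF));
}
```

U+20000 encodes as D840 DC00 and U+2FFFF as D87F DFFF, so a lead surrogate in D840-D87F followed by any trail surrogate in DC00-DFFF covers exactly U+20000-U+2FFFF.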

        Note: This regexp (isHan) does not match CJK Symbols and Punctuation (U+3000-U+303F), Hiragana (U+3041-U+309F), or Katakana (U+30A0-U+30FF).

        • "CJK Symbols and Punctuation":http://www.unicode.org/charts/PDF/U3000.pdf
        • "Hiragana":http://www.unicode.org/charts/PDF/U3040.pdf
        • "Katakana":http://www.unicode.org/charts/PDF/U30A0.pdf

        If you want to match them as well, add their ranges to the regexp.

          vsorokin last edited by

          Thank you for the fast and helpful answer.

          --
          Vasiliy
