[SOLVED] Need help with regexp for Kanji

vsorokin last edited by

I need to check whether a string contains Kanji characters. Can anybody help me build a regexp for this?

Thanks.
--
Vasiliy
takumiasaki last edited by
"Unicode Chapter 12":http://www.unicode.org/versions/Unicode5.0.0/ch12.pdf will help you a lot.

|CJK Unified Ideographs|4E00–9FFF|Common|
|CJK Unified Ideographs Extension A|3400–4DBF|Rare|
|CJK Unified Ideographs Extension B|20000–2A6DF|Rare, historic|
|CJK Unified Ideographs Extension C|2A700–2B73F|Rare, historic|
|CJK Unified Ideographs Extension D|2B740–2B81F|Uncommon, some in current use|
|CJK Compatibility Ideographs|F900–FAFF|Duplicates, unifiable variants, corporate characters|
|CJK Compatibility Ideographs Supplement|2F800–2FA1F|Unifiable variants|

So the ranges for Kanji (Han) are, very roughly, U+3400-U+9FFF, U+F900-U+FAFF, and U+20000-U+2FFFF.
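To make the rough ranges concrete, here is a minimal sketch in plain C++ that tests code points directly, without any regexp. The function names `isHanCodePoint` and `isAllHan` are made up for illustration, and the text is assumed to be already decoded to UTF-32.

```cpp
#include <string>

// Rough Han (Kanji) test using the three approximate ranges above.
// The last range deliberately covers all of plane 2 (U+20000-U+2FFFF),
// which is broader than the assigned extension blocks.
bool isHanCodePoint(char32_t cp)
{
    return (cp >= 0x3400 && cp <= 0x9FFF)
        || (cp >= 0xF900 && cp <= 0xFAFF)
        || (cp >= 0x20000 && cp <= 0x2FFFF);
}

// True if the string is non-empty and every code point passes the test.
bool isAllHan(const std::u32string &text)
{
    if (text.empty())
        return false;
    for (char32_t cp : text) {
        if (!isHanCodePoint(cp))
            return false;
    }
    return true;
}
```

For example, `isAllHan(U"\u6F22\u5B57")` (the two characters of the word "kanji") is true, while a string containing the Hiragana character U+3042 is rejected.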

QRegExp:
@
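// Backslashes are doubled so the \xhhhh escapes reach QRegExp instead of the C++ compiler.
QRegExp isHan("([\\x3400-\\x9FFF\\xF900-\\xFAFF]|[\\xD840-\\xD87F][\\xDC00-\\xDFFF])+");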
@
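The second alternative in the pattern is there because QString stores text as UTF-16 code units, so a character in U+20000-U+2FFFF appears as a high surrogate followed by a low surrogate. The sketch below, in plain C++ with an illustrative helper name `toSurrogatePair`, shows where the D840-D87F and DC00-DFFF bounds come from. It assumes the input is a supplementary code point (U+10000 or above).

```cpp
#include <cstdint>
#include <utility>

// Split a supplementary code point (U+10000..U+10FFFF) into its
// UTF-16 high and low surrogate code units.
std::pair<std::uint16_t, std::uint16_t> toSurrogatePair(char32_t cp)
{
    const std::uint32_t offset = static_cast<std::uint32_t>(cp) - 0x10000;
    const std::uint16_t high = static_cast<std::uint16_t>(0xD800 + (offset >> 10));
    const std::uint16_t low = static_cast<std::uint16_t>(0xDC00 + (offset & 0x3FF));
    return {high, low};
}
```

U+20000 maps to the pair D840 DC00 and U+2FFFF maps to D87F DFFF, which matches the two character classes used for the surrogate pair.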

Note: This regexp (isHan) does not match CJK Symbols and Punctuation (U+3000 - U+303F), Hiragana (U+3041 - U+309F), or Katakana (U+30A0 - U+30FF).
- "CJK Symbols and Punctuation":http://www.unicode.org/charts/PDF/U3000.pdf
- "Hiragana":http://www.unicode.org/charts/PDF/U3040.pdf
- "Katakana":http://www.unicode.org/charts/PDF/U30A0.pdf
If you would like to match them as well, add their ranges to the regexp.
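As a rough illustration of what adding those ranges means, the following plain C++ sketch accepts Han, Hiragana, Katakana, and CJK punctuation together. The names `inRange`, `isJapaneseCodePoint`, and `isJapaneseText` are hypothetical, the ranges are the approximate ones quoted in this post, and the input is assumed to be UTF-32.

```cpp
#include <string>

// Inclusive range check on a single code point.
bool inRange(char32_t cp, char32_t first, char32_t last)
{
    return cp >= first && cp <= last;
}

// Han ranges plus the three extra blocks from the note above.
bool isJapaneseCodePoint(char32_t cp)
{
    return inRange(cp, 0x3400, 0x9FFF)     // Han, BMP (rough)
        || inRange(cp, 0xF900, 0xFAFF)     // CJK Compatibility Ideographs
        || inRange(cp, 0x20000, 0x2FFFF)   // Han, plane 2 (rough)
        || inRange(cp, 0x3000, 0x303F)     // CJK Symbols and Punctuation
        || inRange(cp, 0x3041, 0x309F)     // Hiragana
        || inRange(cp, 0x30A0, 0x30FF);    // Katakana
}

// True if the string is non-empty and contains only the characters above.
bool isJapaneseText(const std::u32string &text)
{
    if (text.empty())
        return false;
    for (char32_t cp : text) {
        if (!isJapaneseCodePoint(cp))
            return false;
    }
    return true;
}
```

In the regexp itself, the same effect would come from adding the three extra ranges to the first character class.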
vsorokin last edited by

Thank you for the fast and helpful answer.
--
Vasiliy