QRegExp, Perl and Unicode
-
According to the "documentation for QRegExp":http://developer.qt.nokia.com/doc/qt-4.7/qregexp.html, it "is modeled on Perl's regexp language. It fully supports Unicode."
However, Perl's regexp ("documented here":http://perldoc.perl.org/perlre.html) specifies support for the special escape \N{name} to match a named Unicode character or character sequence, e.g. \N{KELVIN SIGN},
and also for \p{property} and \P{property} to match or not-match numerous Unicode properties "listed here":http://perldoc.perl.org/perluniprops.html#Properties-accessible-through-\p{}-and-\P{}.
I do not see either \p or \N mentioned in the QRegExp doc page. Are they there but not documented? Are these features (named characters and properties) supported in some other way?
-
No. They're simply not there. QRegExp supports a (minimal) subset of Perl's regexps, and it's not even PCRE compatible (for instance, there are no per-quantifier non-greedy operators).
You can work around the lack of \N support by (sigh...) using \x. Unfortunately, not only is there no direct equivalent of the \p escape, but the information provided by QChar is not enough to build a workaround.
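As a rough sketch of the \x workaround (an illustration only, not something QRegExp provides): Python's standard unicodedata module can resolve the character name and build the \xhhhh escape that the QRegExp documentation describes. The helper name qregexp_escape_for_name is made up for this example. Since \xhhhh only covers 0x0000 to 0xFFFF, the sketch rejects anything outside that range.

```python
import unicodedata


def qregexp_escape_for_name(name):
    """Return QRegExp-style hex escapes for a named Unicode character.

    Hypothetical helper: the name and behaviour are assumptions made
    for this sketch, not part of Qt or PyQt.
    """
    # unicodedata.lookup raises KeyError if the name is unknown.
    chars = unicodedata.lookup(name)
    parts = []
    for ch in chars:
        code = ord(ch)
        if code > 0xFFFF:
            # QRegExp documents its hex escape only for 0x0000..0xFFFF.
            raise ValueError("U+%X is outside the range of a 4-digit escape" % code)
        parts.append("\\x%04x" % code)
    return "".join(parts)


# Stands in for Perl's \N{KELVIN SIGN}.
pattern = qregexp_escape_for_name("KELVIN SIGN")
```

The resulting string could then be handed to QRegExp in place of the \N{...} escape; everything else in the pattern still has to stay within QRegExp's own syntax.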
The best I can suggest is to dump QRegExp and use libpcre to do your matches.
-
Because I'm working in PyQt, libpcre is not available. Python's native re module also lacks \p and \N and has other Unicode deficiencies. However, there is a good extension package ("regex":http://pypi.python.org/pypi/regex) with rather complete Unicode support.
The difficulty that I see as a [Py]Qt newbie is in working on the one hand with a QPlainTextEdit and its text cursor objects, and on the other with Python-based regex matching. Constantly crossing between the world of the editor document and the world of Python u"strings" looks like a very fruitful way to create confusion and mistakes. Any comments?
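To make the worry concrete, here is a minimal sketch of one place the two worlds disagree. It assumes Python 3 str objects (indexed by code point) and relies on Qt counting document positions in UTF-16 code units, so any character above U+FFFF shifts every later offset by one. The function names utf16_offset and match_spans_utf16 are invented for this example, and the standard re module stands in for whichever regex package is actually used.

```python
import re


def utf16_offset(text, index):
    """Count the UTF-16 code units occupied by text[:index].

    Characters above U+FFFF take two code units (a surrogate pair).
    """
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:index])


def match_spans_utf16(pattern, text):
    """Yield (start, end) for each match, as UTF-16 code unit offsets.

    Hypothetical helper: 'text' is assumed to be the editor's plain
    text copied into a Python string.
    """
    for m in re.finditer(pattern, text):
        yield utf16_offset(text, m.start()), utf16_offset(text, m.end())


# U+1D11E lies outside the BMP, so the match on "K" sits at
# code point index 4 but starts at UTF-16 offset 5.
spans = list(match_spans_utf16("K", "a\U0001D11Eb K"))
```

If that assumption holds, a span could be selected with cursor.setPosition(start) followed by cursor.setPosition(end, QTextCursor.KeepAnchor). Keeping all the offset conversion in one small function like this is one way to limit the back-and-forth between the document and Python strings.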