QRegularExpression with a character class behaving strangely
-
I'm trying to use QRegularExpression to extract Arabic text (0x600-0x6ff) from a string. Below is the sample code and its output. (I've included duplicate code that picks out Latin lowercase letters, just to convince myself that the method should work.)
@
QRegularExpressionMatchIterator iter;

// Sanity check: the same approach with Latin lowercase letters.
QRegularExpression regex("([\\x61-\\x7a]+)");
qDebug() << "lower case QRegularExpression pattern" << regex.pattern() << regex.isValid();
iter = regex.globalMatch("the cow JUMPED over");
while (iter.hasNext()) {
    QRegularExpressionMatch match = iter.next();
    if (match.isValid()) {
        qDebug() << match.capturedTexts();
    }
}

// The Arabic range, written the same way.
QRegularExpression arabic("([\\x600-\\x6ff]+)");
qDebug() << "arabic regex" << arabic.pattern() << arabic.isValid();
iter = arabic.globalMatch("You say, كَتَبَ إِلَىَّ يَسْتَبْطِئُنِى He wrote ");
while (iter.hasNext()) {
    QRegularExpressionMatch match = iter.next();
    if (match.isValid()) {
        qDebug() << match.capturedTexts();
    }
}
@
Produces this:
@
lower case QRegularExpression pattern "([\x61-\x7a]+)" true
("the", "the")
("cow", "cow")
("over", "over")
arabic regex "([\x600-\x6ff]+)" true
("Yo", "Yo")
("a", "a")
("He", "He")
("o", "o")
("e", "e")
@
I think it might be a clue that if I specify the lowercase regex pattern as ([\x061-\x07a]+), it is reported as invalid.
By the way, QRegExp with the Arabic pattern works OK.
Any ideas how to specify character classes correctly?
Thanks.