QRegularExpression with a character class behaving strangely



  • I'm trying to use QRegularExpression to extract Arabic text (code points 0x600-0x6ff) from a string. Below are the sample code and its output. (I've included equivalent code that picks out Latin lowercase letters, just to convince myself that the method should work.)
    @
    #include <QDebug>
    #include <QRegularExpression>

    int main()
    {
        QRegularExpressionMatchIterator iter;
        // Lowercase Latin letters a..z (0x61..0x7a), as a sanity check.
        QRegularExpression regex("([\\x61-\\x7a]+)");
        qDebug() << "lower case QRegularExpression pattern" << regex.pattern() << regex.isValid();

        iter = regex.globalMatch("the cow JUMPED over");
        while (iter.hasNext()) {
            QRegularExpressionMatch match = iter.next();
            if (match.isValid()) {
                qDebug() << match.capturedTexts();
            }
        }

        // Intended: the Arabic block 0x600..0x6ff.
        QRegularExpression arabic("([\\x600-\\x6ff]+)");
        qDebug() << "arabic regex" << arabic.pattern() << arabic.isValid();
        iter = arabic.globalMatch("You say, كَتَبَ إِلَىَّ يَسْتَبْطِئُنِى He wrote ");
        while (iter.hasNext()) {
            QRegularExpressionMatch match = iter.next();
            if (match.isValid()) {
                qDebug() << match.capturedTexts();
            }
        }
        return 0;
    }
    @

    Produces this:
    @
    lower case QRegularExpression pattern "([\x61-\x7a]+)" true
    ("the", "the")
    ("cow", "cow")
    ("over", "over")
    arabic regex "([\x600-\x6ff]+)" true
    ("Yo", "Yo")
    ("a", "a")
    ("He", "He")
    ("o", "o")
    ("e", "e")
    @
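    The output is consistent with how PCRE, the engine behind QRegularExpression, reads `\xhh`: at most two hex digits follow `\x`. Under that rule `[\x600-\x6ff]` becomes `\x60`, the range `0-\x6f` (U+0030 to U+006F, i.e. `0` through `o`), and a literal `f`. So the class matches digits, uppercase letters, the lowercase letters a to o and some punctuation, but no Arabic. The sketch below reproduces that reading; `inMisparsedClass` and `matchRuns` are hypothetical names, not Qt API, and the sketch scans UTF-8 bytes rather than QString's UTF-16. That difference does not matter here, because every Arabic code point and every UTF-8 byte encoding one lies above 0x6F.

```cpp
#include <string>
#include <vector>

// Assumption: the set PCRE builds from [\x600-\x6ff] when \x reads at most
// two hex digits: \x60, the range 0-\x6f, and the literal 'f'.
// (0x60 and 'f' = 0x66 both already lie inside 0x30..0x6f.)
bool inMisparsedClass(unsigned char c) {
    return c == 0x60 || (c >= 0x30 && c <= 0x6f) || c == 'f';
}

// Hypothetical helper: collect maximal runs of bytes that belong to the class,
// mirroring what globalMatch() with "(...+)" reports for ASCII input.
// Bytes of UTF-8 encoded Arabic letters are all >= 0x80, so they never match.
std::vector<std::string> matchRuns(const std::string& text) {
    std::vector<std::string> runs;
    std::string current;
    for (char ch : text) {
        if (inMisparsedClass(static_cast<unsigned char>(ch))) {
            current += ch;
        } else if (!current.empty()) {
            runs.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        runs.push_back(current);
    }
    return runs;
}
```

    Running `matchRuns` on the sample sentence yields the same five runs as the qDebug output: "Yo", "a", "He", "o", "e".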

    One clue might be that if I write the lowercase pattern as ([\x061-\x07a]+) instead, isValid() returns false.
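    The clue fits the same two-digit rule. Read that way, `[\x061-\x07a]` becomes `\x06`, the range `1-\x07` (U+0031 down to U+0007, which is out of order), and a literal `a`; PCRE rejects a range whose end is below its start, so the pattern is invalid. A minimal sketch of that parsing step, assuming only `\x` escapes, literal characters and `-` ranges appear inside the brackets (`parseClassBody` is a hypothetical helper, not part of Qt or PCRE):

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Range = std::pair<int, int>;  // inclusive [first, second]

// Value of a single hex digit character.
static int hexDigitValue(char c) {
    const unsigned char u = static_cast<unsigned char>(c);
    return std::isdigit(u) ? c - '0' : std::tolower(u) - 'a' + 10;
}

// Read one atom at `pos`: a "\x" escape with at most two hex digits (PCRE's
// \xhh rule) or a single literal character. Advances `pos` past the atom.
static int readAtom(const std::string& s, std::size_t& pos) {
    if (s[pos] == '\\' && pos + 1 < s.size() && s[pos + 1] == 'x') {
        pos += 2;
        int value = 0;
        for (int digits = 0; digits < 2 && pos < s.size() &&
                             std::isxdigit(static_cast<unsigned char>(s[pos]));
             ++digits, ++pos) {
            value = value * 16 + hexDigitValue(s[pos]);
        }
        return value;
    }
    return static_cast<unsigned char>(s[pos++]);
}

// Hypothetical mini-parser for the body of a bracket expression (no [ ]).
// Returns std::nullopt for an out-of-order range such as 1-\x07.
std::optional<std::vector<Range>> parseClassBody(const std::string& body) {
    std::vector<Range> ranges;
    std::size_t pos = 0;
    while (pos < body.size()) {
        const int first = readAtom(body, pos);
        int last = first;
        // A '-' between two atoms makes a range; a trailing '-' is literal.
        if (pos + 1 < body.size() && body[pos] == '-') {
            ++pos;
            last = readAtom(body, pos);
            if (last < first) {
                return std::nullopt;
            }
        }
        ranges.emplace_back(first, last);
    }
    return ranges;
}
```

    With this sketch, the body `\x61-\x7a` parses to the single range 0x61..0x7a, `\x600-\x6ff` parses to 0x60, 0x30..0x6f and 0x66, and `\x061-\x07a` fails.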

    By the way, QRegExp with the same Arabic pattern works OK.
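    QRegExp documents its escape as `\xhhhh` (up to four hex digits), so the same text `\x600-\x6ff` presumably reads as U+0600 to U+06FF there, which would explain why the pattern works with QRegExp. PCRE writes wider code points with braces, `\x{hhh..}`, so for QRegularExpression the Arabic block would be spelled `[\x{0600}-\x{06FF}]` (in a C++ literal, `"([\\x{0600}-\\x{06FF}]+)"`). The hypothetical `decodeHexEscape` below only contrasts the two digit limits; it is not how either library is implemented internally.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: decode a "\x" escape at the start of `s`, reading at
// most `maxDigits` hex digits. Returns {code point, characters consumed}.
// maxDigits = 2 mimics PCRE's \xhh; maxDigits = 4 mimics QRegExp's \xhhhh.
// Precondition: s starts with "\x".
std::pair<int, std::size_t> decodeHexEscape(const std::string& s, int maxDigits) {
    std::size_t pos = 2;  // skip the leading "\x"
    int value = 0;
    for (int digits = 0; digits < maxDigits && pos < s.size(); ++digits, ++pos) {
        const unsigned char c = static_cast<unsigned char>(s[pos]);
        if (!std::isxdigit(c)) {
            break;  // stop at the first non-hex character; it is not consumed
        }
        const int digit = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
        value = value * 16 + digit;
    }
    return {value, pos};
}
```

    With a two-digit limit, `\x600` is U+0060 followed by a literal `0`; with a four-digit limit it is U+0600.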

    Any ideas on how to specify such character classes correctly?

    Thanks.

