
QRegularExpression with a character class behaving strangely



  • I'm trying to use QRegularExpression to extract Arabic text (U+0600 to U+06FF) from a string. Below is the sample code and its output. (I've included a duplicate block that picks out lowercase Latin letters, just to convince myself that the approach should work.)
    @
    QRegularExpressionMatchIterator iter;
    QRegularExpression regex("([\x61-\x7a]+)");
    qDebug() << "lower case QRegularExpression pattern" << regex.pattern() << regex.isValid();

    iter = regex.globalMatch("the cow JUMPED over");
    while (iter.hasNext()) {
        QRegularExpressionMatch match = iter.next();
        if (match.isValid()) {
            qDebug() << match.capturedTexts();
        }
    }
    QRegularExpression arabic("([\x600-\x6ff]+)");
    qDebug() << "arabic regex" << arabic.pattern() << arabic.isValid();
    iter = arabic.globalMatch("You say, كَتَبَ إِلَىَّ يَسْتَبْطِئُنِى He wrote ");
    while (iter.hasNext()) {
        QRegularExpressionMatch match = iter.next();
        if (match.isValid()) {
            qDebug() << match.capturedTexts();
        }
    }
    @

    Produces this:
    @
    lower case QRegularExpression pattern "([\x61-\x7a]+)" true
    ("the", "the")
    ("cow", "cow")
    ("over", "over")
    arabic regex "([\x600-\x6ff]+)" true
    ("Yo", "Yo")
    ("a", "a")
    ("He", "He")
    ("o", "o")
    ("e", "e")
    @

    One possible clue: if I write the lowercase pattern with three hex digits, as ([\x061-\x07a]+), isValid() returns false.
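
One reading that would fit both observations, assuming QRegularExpression follows the PCRE rule that an unbraced `\x` takes at most two hex digits (longer code points being written `\x{...}`): `\x600-\x6ff` would then parse as `\x60`, the range `0` to `\x6f`, and a literal `f`. The sketch below is plain C++ with no Qt involved; `readItem` and `classMembers` are hypothetical helpers written only to model that rule, not anything from the Qt or PCRE APIs.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads one item of a character-class body at `pos`.
// Assumed rule: "\x" takes at most two hex digits unless written "\x{...}".
static std::uint32_t readItem(const std::string& s, std::size_t& pos) {
    if (s.compare(pos, 2, "\\x") != 0)
        return static_cast<unsigned char>(s[pos++]);  // ordinary literal
    pos += 2;  // skip the "\x"
    std::string digits;
    if (pos < s.size() && s[pos] == '{') {
        const std::size_t close = s.find('}', pos);
        if (close == std::string::npos)
            throw std::invalid_argument("unterminated \\x{");
        digits = s.substr(pos + 1, close - pos - 1);
        pos = close + 1;
    } else {
        // Unbraced form: stop after two hex digits, leave the rest as literals.
        while (digits.size() < 2 && pos < s.size() &&
               std::isxdigit(static_cast<unsigned char>(s[pos])))
            digits += s[pos++];
    }
    return digits.empty()
               ? 0
               : static_cast<std::uint32_t>(std::stoul(digits, nullptr, 16));
}

// Hypothetical helper: expands the text between '[' and ']' into the set of
// code points it would match under the rule above.
static std::set<std::uint32_t> classMembers(const std::string& body) {
    std::set<std::uint32_t> out;
    std::size_t pos = 0;
    while (pos < body.size()) {
        const std::uint32_t lo = readItem(body, pos);
        std::uint32_t hi = lo;
        // A '-' followed by another item forms a range.
        if (pos + 1 < body.size() && body[pos] == '-') {
            ++pos;
            hi = readItem(body, pos);
            if (hi < lo)
                throw std::invalid_argument("range out of order");
        }
        for (std::uint32_t c = lo; c <= hi; ++c)
            out.insert(c);
    }
    return out;
}
```

Under this model, `classMembers("\\x600-\\x6ff")` covers only 0x30 to 0x6f, which would explain the "Yo", "a" and "He" matches, while `classMembers("\\x061-\\x07a")` throws because the range `1` to `\x07` runs backwards. If the assumption holds, the braced form `([\x{600}-\x{6ff}]+)` is the pattern to try.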

    By the way, QRegExp works fine with the Arabic pattern.

    Any ideas how to specify character classes correctly?

    Thanks.

