Strange behavior of QString::indexOf (with QRegExp)



  • I have the following test string:
    Abcdefgh/Ijklmn - Opqrstuv__wxyz
    I have the following RegExp:

    QRegExp regExp;
    regExp.setPatternSyntax(QRegExp::RegExp2);
    regExp.setPattern("[_-./ ]");
    

    Then I do the following call to indexOf:

    int matchPosition = myString.indexOf(regExp, 0);

    ...and receive 0, where I expect 9.

    Any ideas where I might have gone wrong?



  • I think you need to escape the hyphen in "[_-./ ]" as follows:

    regExp.setPattern("[_\\-./ ]");
    


  • Seems like QRegExp::escape should help here. Thanks for the pointer!



  • Ok, so QRegExp::escape changes my pattern to
    [_-\./ ]

    Position is still zero.
    BTW, why would you put two backslashes in front of the hyphen?

    EDIT: It works after all, recompile helped.
    EDIT2: Correction: It only works when starting the indexOf at position 1, not at position 0. What's so special about the first character ('A')?



  • @Asperamanca said:

    QRegExp::escape changes my pattern to [_-\./ ]

    So it seems like my guess that it was an escape issue was correct, just the char was wrong :)

    why would you put two backslashes in front of the hyphen?

    You need to escape the special characters for the regular expression, so in effect you want \. to reach the regex engine. But if you write that in a string literal, the backslash escapes the . for the string, not for the regular expression. So you need two backslashes: \\. in the source is seen by the QString as \. (you escape the backslash because the string itself should contain a backslash character).

    I hope you understand the explanation, it's a bit difficult to explain.
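The two levels of escaping described above can be sketched in plain C++. This is a minimal sketch using std::string rather than QString so that it builds without Qt; the function names are hypothetical, and the raw string literal assumes a C++11 compiler, which may not be available in an older Qt 4 setup.

```cpp
#include <cassert>
#include <string>

// Pattern as it should reach the regex engine: [_\-./ ]
// In C++ source, every backslash meant for the regex is written as "\\",
// because the compiler consumes one level of escaping in the literal.
std::string escapedPattern() {
    return "[_\\-./ ]";   // 8 characters, exactly one backslash
}

// The same pattern as a raw string literal (C++11): no doubling needed.
std::string rawPattern() {
    return R"([_\-./ ])";
}

// Sanity check: both spellings yield the same eight characters.
void checkPatterns() {
    assert(escapedPattern() == rawPattern());
    assert(escapedPattern().size() == 8);
}
```

Either string can then be handed to the regex engine; the doubled backslash only exists in the source code, not in the pattern the engine sees.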



  • What version of Qt are you using?

    For Qt 5, it is better to use QRegularExpression instead of QRegExp (motivation)

    What's so special about the first character

    That is strange



  • To make sure, I removed the dot from my regexp pattern. All remaining characters shouldn't need escaping.
    Problem persists, so it doesn't seem to be an escaping issue.

    Edit: Qt 4.8.1



  • It is useful to test your regular expressions on a website that does matching. I use the following one: http://regexr.com/

    On that website the issue is that the - needed to be escaped...

    Edit: That website handles more complex matching than what QRegExp supports, but for basic matching it is useful to test like that (then remember proper escaping).



  • Interesting. Here is a list of characters needing to be escaped, according to my QRegExp sources:

            case '$':
            case '(':
            case ')':
            case '*':
            case '+':
            case '.':
            case '?':
            case '[':
            case '\\':
            case ']':
            case '^':
            case '{':
            case '|':
            case '}':
    

    No hyphen...but I can remove that as well, for the test.

    EDIT: And here we have the explanation:
    Qt bug



  • Reading that report and thinking about it, it makes sense that it is not an actual bug.

    For [a-z] it should not be escaped, since you want to match all characters from a up to character z.

    For [a\-z] you want to match character a, or character hyphen or character z.
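The range-versus-literal distinction can be sketched as follows. This is a minimal sketch that uses std::regex from the C++ standard library as a stand-in for QRegExp so that it runs without Qt; firstMatch and inRange are hypothetical helpers, and the two engines may differ in details. It also hints at why the original pattern [_-./ ] could match 'A': the sequence _-. reads as a range, and if the engine swaps the reversed bounds (an assumption about QRegExp, not verified here), the range covers character codes 46 ('.') to 95 ('_'), which includes 'A' (65).

```cpp
#include <regex>
#include <string>

// Zero-based index of the first match of `pattern` in `text`, or -1 if none.
// std::regex (ECMAScript syntax) stands in for QRegExp here.
int firstMatch(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? static_cast<int>(m.position(0)) : -1;
}

// True if c lies between lo and hi (inclusive), comparing character codes.
// Assumes an ASCII-compatible execution character set.
constexpr bool inRange(char c, char lo, char hi) {
    return lo <= c && c <= hi;
}
```

With these helpers, firstMatch("-", "[a-z]") finds nothing because the hyphen only marks the range, while firstMatch("-", "[a\\-z]") matches at index 0 because the escaped hyphen is a literal member of the set. For the test string from the first post, firstMatch(text, "[_\\-./ ]") lands on the '/' at index 8, counting from zero. The unescaped original pattern is not exercised here, since a standard library implementation may reject a reversed range outright.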

