QRegularExpression unreliable results with RegEx piping (Qt 5.5.1 Windows)



  • I'm trying to use QRegularExpression to reliably find a one- or two-digit year and month in a string. The format inside the string is not always the same, so I need to cover a few variants and gave piping (alternation with |) a chance. So far without much luck. For example:

    QString string1 = "Find my Date 15x01"; // Year 2015, January
    QString string2 = "Find Y15M01 Date"; // Year 2015, January
    QString string3 = "Find y03m11 date"; // Year 2003, November
    QString string4 = "Find Date 1x9"; // Year 2001, September
    QString string5 = "Find year-16_month-09"; // Year 2016, September
    

    And now the RegEx part:

    QRegularExpression re("(\\d+)[xX](\\d+)");
    QRegularExpressionMatch match = re.match(string1);
    

    This works as expected; the first RegEx always works. But if I pipe several patterns together and the match comes from an alternative other than the first, the results are there, but not in the groups I expect and can work with.

    QRegularExpression re("(\\d+)[xX](\\d+)|[yY](\\d+)[mM](\\d+)");
    QRegularExpressionMatch match = re.match(string2);
    

    or

    QRegularExpression re("(\\d+)[xX](\\d+)|[yY](\\d+)[mM](\\d+)|[year-](\\d+)[_month-](\\d+)");
    QRegularExpressionMatch match = re.match(string5);
    

    Is it me or my RegEx? I tested the same in Python and it worked as expected; so far no luck in Qt. If I use just one RegEx, or the first alternative is the one that matches, it works perfectly. But if the matching RegEx is the 2nd, 3rd, ... in the pipe, it is still a match, but there are no groups I can work with. I always expect group 1 to be the year and group 2 the month:

    r_year = match.captured(1);
    r_month = match.captured(2);
    

    Thanks!


  • Qt Champions 2016

    @qDebug
    Hello,
    Well, I can't say for sure why that happens. However, you could use a different regex that has only two capturing groups (with the alternatives moved into non-capturing groups), similar to this:

    \b(?:[yY]|year-)?(\d+)(?:[xX]|[mM]|_month-)(\d+)\b
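For illustration, here is a minimal sketch that runs this two-group pattern over the five sample strings from the question. To keep it runnable without Qt, it uses C++11 std::regex (ECMAScript grammar) in place of QRegularExpression (PCRE); for the constructs involved (\b, \d, non-capturing groups, alternation) the two should behave alike, but that is an assumption worth checking in Qt itself. The helper extractYearMonth is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: finds a year/month pair in `text`.
// std::regex stands in for QRegularExpression so the sketch runs without Qt.
bool extractYearMonth(const std::string& text, std::string& year, std::string& month)
{
    // The two-group pattern from the answer: the prefix and separator
    // alternatives live in non-capturing (?:...) groups, so only two
    // capturing groups exist.
    static const std::regex re(
        R"(\b(?:[yY]|year-)?(\d+)(?:[xX]|[mM]|_month-)(\d+)\b)");
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return false;
    year = m[1].str();   // always the year
    month = m[2].str();  // always the month
    return true;
}
```

Because there are only two capturing groups, the year is always group 1 and the month group 2 whichever prefix or separator matched, which is what match.captured(1) and match.captured(2) would read in Qt.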
    

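    Still, you do seem to have an error in your pattern, namely [year-] and [_month-]: square brackets create a character class that matches a single character, not the literal text year- or _month-.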
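A small sketch of what those brackets do, again using std::regex as a Qt-free stand-in (character classes work the same way in PCRE). The helper names fullMatch and found are made up for the example.

```cpp
#include <regex>
#include <string>

// True if the whole of `text` matches `pattern`.
bool fullMatch(const std::string& text, const std::string& pattern)
{
    return std::regex_match(text, std::regex(pattern));
}

// True if `pattern` matches anywhere in `text`.
bool found(const std::string& text, const std::string& pattern)
{
    return std::regex_search(text, std::regex(pattern));
}
```

For example, fullMatch("e", "[year-]") is true while fullMatch("year-", "[year-]") is false, because [year-] stands for exactly one of y, e, a, r or - (a trailing - is literal). Consequently the bracketed alternative [year-](\d+)[_month-](\d+) finds nothing in "Find year-16_month-09", whereas year-(\d+)_month-(\d+) does.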

    Kind regards.



    Using just one RegEx is not an option in my project, because I want to support adding and removing RegEx patterns. The only way I can think of right now is to store each pattern separately in a QStringList, loop over them with foreach, and check captured results 1 and 2 with if/else.

    But I don't think looping through 20 or so RegEx patterns for maybe 2,000 files is an ideal solution. That could add up to 40,000 or more match attempts.
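One way to keep patterns user-editable without recompiling them for every file is to compile the list once and, per file name, stop at the first pattern that matches. A sketch, assuming every stored pattern follows the convention that group 1 is the year and group 2 the month; it uses std::regex so it runs without Qt, and in Qt the same idea would be a QList<QRegularExpression> built once from the QStringList. All names (YearMonth, compilePatterns, findYearMonth) are hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical result type: captured year and month text.
struct YearMonth {
    std::string year;
    std::string month;
};

// Compile each pattern once, up front, instead of once per file name.
std::vector<std::regex> compilePatterns(const std::vector<std::string>& sources)
{
    std::vector<std::regex> compiled;
    compiled.reserve(sources.size());
    for (const std::string& s : sources)
        compiled.emplace_back(s);  // throws std::regex_error on a bad pattern
    return compiled;
}

// Try the patterns in order and stop at the first one whose groups 1 and 2
// both took part in the match (assumed convention: 1 = year, 2 = month).
bool findYearMonth(const std::vector<std::regex>& patterns,
                   const std::string& name, YearMonth& out)
{
    std::smatch m;
    for (const std::regex& re : patterns) {
        if (std::regex_search(name, m, re) && m.size() > 2
            && m[1].matched && m[2].matched) {
            out.year = m[1].str();
            out.month = m[2].str();
            return true;
        }
    }
    return false;
}
```

Since the loop returns at the first hit, putting the most common formats first keeps the number of attempts per file below the worst case of one per pattern.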


  • Qt Champions 2016

    @qDebug

    Hello,
    Your original regex, once corrected to (\d+)[xX](\d+)|[yY](\d+)[mM](\d+)|year-(\d+)_month-(\d+), checks out and captures everything (with global matching); I've run my simple test here. Unfortunately, it does that in groups 1 to 6, so my guess is that it's not a Qt-specific issue but simply how the PCRE engine works: capturing groups are numbered left to right across the whole pattern, and only the groups of the alternative that actually matched are set.
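QRegularExpression is built on PCRE, which also offers a branch-reset group (?|...): inside it, each alternative starts numbering its groups from the same point, so (?|(\d+)[xX](\d+)|[yY](\d+)[mM](\d+)|year-(\d+)_month-(\d+)) should put the year in group 1 and the month in group 2 whichever branch matches. That is worth testing in Qt directly. std::regex, which the sketch below uses so it runs without Qt, has no branch reset, so the sketch instead walks the group pairs and takes the first one that participated. kAlternatives, firstMatchedPair and extract are hypothetical names.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// The corrected alternation. Capturing groups are numbered left to right
// across the whole pattern:
//   (\d+)[xX](\d+)  |  [yY](\d+)[mM](\d+)  |  year-(\d+)_month-(\d+)
//   groups 1 and 2     groups 3 and 4         groups 5 and 6
const std::regex kAlternatives(
    R"((\d+)[xX](\d+)|[yY](\d+)[mM](\d+)|year-(\d+)_month-(\d+))");

// Only the groups of the alternative that matched are set, so return the
// first (year, month) pair that took part in the match.
bool firstMatchedPair(const std::smatch& m, std::string& year, std::string& month)
{
    for (std::size_t g = 1; g + 1 < m.size(); g += 2) {
        if (m[g].matched && m[g + 1].matched) {
            year = m[g].str();
            month = m[g + 1].str();
            return true;
        }
    }
    return false;
}

// Convenience wrapper: search `text` and read whichever pair matched.
bool extract(const std::string& text, std::string& year, std::string& month)
{
    std::smatch m;
    return std::regex_search(text, m, kAlternatives)
        && firstMatchedPair(m, year, month);
}
```

For "Find Y15M01 Date", groups 1 and 2 are unmatched and the values sit in groups 3 and 4, which is exactly the behaviour described in the question; firstMatchedPair hides that difference from the caller.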

    As a side note, your regular expression takes about twice as many steps as mine. I strongly suspect this is because of the branching introduced by the alternation (|) you use.

    Kind regards.



    You are right; I believe it is my lack of knowledge and not a Qt problem. I guess I can handle most, maybe all, variants in one single RegEx. So far I have only run into one problem: when the year and month are written as a single 4-digit number.

    QString string6 = "Find 0509 or not"; // Year 2005, September
    

    My RegEx works with this example because it picks the last of the 4 digits, the 9. But if the month is 10, 11 or 12, it results in 0, 1 and 2.

    QString string7 = "Find 0511 or not"; // Year 2005, November
    QString string8 = "Find 511 or not"; // Year 2005, November
    QString string9 = "Find 51 or not"; // Year 2005, January
    
    QRegularExpression re("(?:[yY]|[a^])?(\\d{1,2})(?:[xX]|[mM]|[\\d{4}])(\\d{1,2})");
    QRegularExpressionMatch match = re.match(string7);
    

    But that is also a RegEx problem.
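The behaviour described above can be reproduced with a short sketch. It uses the pattern (?:[yY]|[a^])?(\d{1,2})(?:[xX]|[mM]|[\d{4}])(\d{1,2}) with balanced parentheses, and std::regex as a Qt-free stand-in for QRegularExpression. The key point is that [\d{4}] is a character class matching a single character (a digit, {, 4 or }), not "four digits", so it swallows the first month digit. The names kAttempt and attempt are hypothetical.

```cpp
#include <regex>
#include <string>

// The 4-digit attempt with balanced parentheses. [\d{4}] is a character
// class: it matches ONE character that is a digit, '{', '4' or '}'.
// [a^] likewise matches a single 'a' or '^'.
const std::regex kAttempt(
    R"((?:[yY]|[a^])?(\d{1,2})(?:[xX]|[mM]|[\d{4}])(\d{1,2}))");

// Hypothetical helper returning groups 1 and 2 of the first match.
bool attempt(const std::string& text, std::string& year, std::string& month)
{
    std::smatch m;
    if (!std::regex_search(text, m, kAttempt))
        return false;
    year = m[1].str();
    month = m[2].str();
    return true;
}
```

For "Find 0511 or not" this yields year 05 and month 1: \d{1,2} takes 05, the class consumes the first 1, and only the last 1 is left for the month. "Find 511 or not" likewise ends up as year 5, month 1.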

    Thanks.


  • Qt Champions 2016

    @qDebug
    Hello,
    My original suggestion could be modified to (mostly) suit that case as well, like this:

    \b(?:[yY]|year-)?(\d{1,2})(?:[xX]|[mM]|_month-)?(\d{1,2})\b
    

    However, to make it work you have to invert the matching greediness, so you'd use it as follows:

    QRegularExpression rx("\\b(?:[yY]|year-)?(\\d{1,2})(?:[xX]|[mM]|_month-)?(\\d{1,2})\\b", QRegularExpression::InvertedGreedinessOption | QRegularExpression::OptimizeOnFirstUsageOption);
    // ... Use repeatedly as usual ...
    QRegularExpressionMatch match = rx.match(someString);
    

    You can check the results of the pattern here.
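To see why the greediness switch matters, here is a sketch comparing the greedy pattern with a lazy version on the 3-digit case from the question. std::regex (used so the example runs without Qt) has no InvertedGreedinessOption, so the lazy version spells out the inverted quantifiers by hand (?? and {1,2}?), which is what that option does to the greedy pattern according to the Qt documentation. The names kGreedy, kLazy and yearMonth are hypothetical.

```cpp
#include <regex>
#include <string>

// Greedy form of the suggested pattern.
const std::regex kGreedy(
    R"(\b(?:[yY]|year-)?(\d{1,2})(?:[xX]|[mM]|_month-)?(\d{1,2})\b)");

// Same pattern with every quantifier made lazy by hand, mimicking
// QRegularExpression::InvertedGreedinessOption (std::regex has no such flag).
const std::regex kLazy(
    R"(\b(?:[yY]|year-)??(\d{1,2}?)(?:[xX]|[mM]|_month-)??(\d{1,2}?)\b)");

// Returns "year/month" for the first match, or "" when nothing matches.
std::string yearMonth(const std::string& text, const std::regex& re)
{
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return "";
    return m[1].str() + "/" + m[2].str();
}
```

With greedy quantifiers, "511" is read as year 51, month 1. With lazy ones the engine first tries a one-digit year and only grows it when the rest (including the closing \b) cannot match, giving year 5, month 11, as the question expects, while "0511" still comes out as 05/11 and "51" as 5/1.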

    Kind regards.

