
QRegularExpression unreliable results with RegEx piping (Qt 5.5.1 Windows)



  • I am trying to use QRegularExpression to reliably find a one- or two-digit year and month in a string. The format inside the string is not always the same and I need to cover a few variants, so I gave piping (alternation with |) a try. So far without much luck. For example:

    QString string1 = "Find my Date 15x01"; // Year 2015,  January
    QString string2 = "Find Y15M01 Date"; // Year 2015, January
    QString string3 = "Find y03m11 date"; // Year 2003, November
    QString string4 = "Find Date 1x9"; // Year 2001, September
    QString string5 = "Find year-16_month-09"; // Year 2016, September
    

    And now the RegEx part:

    QRegularExpression re("(\\d+)[xX](\\d+)");
    QRegularExpressionMatch match = re.match(string1);
    

    Works as expected. The first RegEx always works! But if I pipe several patterns together and the match comes from any pattern other than the first, the results are there, but not in the groups I expect and can work with.

    QRegularExpression re("(\\d+)[xX](\\d+)|[yY](\\d+)[mM](\\d+)");
    QRegularExpressionMatch match = re.match(string2);
    

    or

    QRegularExpression re("(\\d+)[xX](\\d+)|[yY](\\d+)[mM](\\d+)|[year-](\\d+)[_month-](\\d+)");
    QRegularExpressionMatch match = re.match(string5);
    

    Is it me or my RegEx? I tested the same thing in Python and it worked as expected. So far no luck in Qt. If I use just one RegEx, or if the first RegEx in the pipe is the one that matches, it works perfectly. But if the matching RegEx is the 2nd, 3rd, ... in the pipe, it is still a match, but there are no groups I can work with. I always expect group 1 to be the year and group 2 to be the month.

    r_year = match.captured(1);
    r_month = match.captured(2);
    

    Thanks!


  • Qt Champions 2017

    @qDebug
    Hello,
    Well, I can't say for sure why that might be. However, you could use another regex that has only two groups (without the alternatives), similar to this:

    \b(?:[yY]|year-)?(\d+)(?:[xX]|[mM]|_month-)(\d+)\b
    

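    Still, you do seem to have an error in your pattern, namely: [year-] and [_month-]. Square brackets define a character class that matches a single character, so these do not match the literal text year- and _month-.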

    Kind regards.
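
The character-class problem can be reproduced outside Qt. The following is a minimal sketch that uses std::regex as a stand-in for QRegularExpression (an assumption made so that it compiles without Qt). The helper matchesAnywhere and the two constants are hypothetical names introduced for illustration only.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches anywhere in `text`.
bool matchesAnywhere(const std::string& pattern, const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}

// [year-] is a character class: exactly ONE of 'y', 'e', 'a', 'r' or '-'.
// [_month-] likewise matches exactly one character from its set.
const std::string kClassPattern = R"([year-](\d+)[_month-](\d+))";

// Without the brackets the words "year-" and "_month-" are matched literally.
const std::string kLiteralPattern = R"(year-(\d+)_month-(\d+))";
```

With these definitions, matchesAnywhere(kClassPattern, "Find year-16_month-09") should be false, because after "-16" and the single character "_" the pattern wants a digit but finds "m". The literal version matches the same string, while the bracketed version accepts unrelated input such as "a1n2".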



  • Using just one RegEx is not an option in my project, because I want to be able to add and remove RegEx patterns. The only way I can think of right now is to store each pattern separately in a QStringList, loop over all of them with foreach, and check captured results 1 and 2 with if/else.

    But I don't think it is an ideal solution to loop through 20 or so RegEx patterns for maybe 2,000 files. That could end up in 40,000 or even more match attempts.
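
The loop described above could look roughly like the following sketch. It uses std::regex and std::vector in place of QRegularExpression and QStringList (an assumption, so that it compiles without Qt), and the names compilePatterns and findYearMonth are hypothetical. Each pattern is compiled once up front, so the per-file cost is only the match attempts themselves.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: compile every pattern string exactly once.
std::vector<std::regex> compilePatterns(const std::vector<std::string>& sources) {
    std::vector<std::regex> compiled;
    compiled.reserve(sources.size());
    for (const std::string& source : sources)
        compiled.emplace_back(source);
    return compiled;
}

// Try the patterns in order; the first one that matches wins.
// Every pattern is expected to expose the year as group 1 and the month as group 2.
bool findYearMonth(const std::vector<std::regex>& patterns, const std::string& text,
                   std::string& year, std::string& month) {
    for (const std::regex& re : patterns) {
        std::smatch m;
        if (std::regex_search(text, m, re) && m.size() > 2) {
            year = m[1].str();
            month = m[2].str();
            return true;
        }
    }
    return false;
}
```

Because every pattern stands alone, the year and month always arrive in groups 1 and 2, and patterns can be added or removed without renumbering anything.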


  • Qt Champions 2017

    @qDebug

    Hello,
    Your original, but corrected, regex (\d+)[xX](\d+)|[yY](\d+)[mM](\d+)|year-(\d+)_month-(\d+) checks out and captures everything (with global matching); I've run my simple test here. Unfortunately, it does that in groups 1 to 6, so my guess is that it's not a Qt-specific issue, but how the PCRE engine actually works. Groups are numbered by their opening parenthesis across the whole pattern, so the second alternative fills groups 3 and 4 and the third fills groups 5 and 6.

    As a side note, your regular expression takes about twice as many steps as mine does. I strongly suspect this is because of the branching induced by the alternation that you use.

    Kind regards.
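
One way to live with the 1 to 6 numbering is to walk the group pairs and take the first pair that took part in the match. The sketch below does this with std::regex as a stand-in for QRegularExpression (an assumption, so that it compiles without Qt). The struct YearMonth and the function extractYearMonth are hypothetical names. In Qt, QRegularExpressionMatch::captured(i) returns a null QString for a group that did not capture, which could serve the same purpose as the matched flag used here.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
struct YearMonth {
    bool found = false;
    std::string year;
    std::string month;
};

// Alternative 1 owns groups 1-2, alternative 2 owns 3-4, alternative 3 owns 5-6.
YearMonth extractYearMonth(const std::string& text) {
    static const std::regex re(
        R"((\d+)[xX](\d+)|[yY](\d+)[mM](\d+)|year-(\d+)_month-(\d+))");
    YearMonth result;
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return result;
    // Walk the (year, month) pairs and keep the pair that actually participated.
    for (std::size_t i = 1; i + 1 < m.size(); i += 2) {
        if (m[i].matched) {
            result.found = true;
            result.year = m[i].str();
            result.month = m[i + 1].str();
            break;
        }
    }
    return result;
}
```

For "Find Y15M01 Date" the second alternative matches, so groups 1 and 2 are empty and the year and month come from groups 3 and 4.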



  • You are right, I believe it is my lack of knowledge and not a Qt problem. I guess I can handle most or maybe all variants in one single RegEx. So far I have only run into one problem: when the year and month are written as one 4-digit number.

    QString string6 = "Find 0509 or not"; // Year 2005, September
    

    My RegEx works with this example because it picks the last of the 4 digits, the 9. But if the month is 10, 11 or 12, it results in 0, 1 and 2.

    QString string7 = "Find 0511 or not"; // Year 2005, November
    QString string8 = "Find 511 or not"; // Year 2005, November
    QString string9 = "Find 51 or not"; // Year 2005, January
    
    QRegularExpression re("(?:[yY]|[a^])?(\\d{1,2})(?:[xX]|[mM]|[\\d{4}])(\\d{1,2})");
    QRegularExpressionMatch match = re.match(string7);
    

    But that is also a RegEx problem.

    Thanks.


  • Qt Champions 2017

    @qDebug
    Hello,
    My original suggestion could be modified to (mostly) suit that case as well, like this:

    \b(?:[yY]|year-)?(\d{1,2})(?:[xX]|[mM]|_month-)?(\d{1,2})\b
    

    However, to have it work you have to switch the matching greediness, so that the quantifiers become lazy. So you'd use it as follows:

    QRegularExpression rx("\\b(?:[yY]|year-)?(\\d{1,2})(?:[xX]|[mM]|_month-)?(\\d{1,2})\\b", QRegularExpression::InvertedGreedinessOption | QRegularExpression::OptimizeOnFirstUsageOption);
    // ... Use repeatedly as usual ...
    QRegularExpressionMatch match = rx.match(someString);
    

    You can check the results of the pattern here.

    Kind regards.
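
QRegularExpression::InvertedGreedinessOption makes the ordinary quantifiers lazy. The sketch below gets the same effect by writing each quantifier in its lazy form (`??`, `{1,2}?`), because std::regex, used here as a stand-in for QRegularExpression, has no such switch. That substitution is an assumption of this sketch, and lazyYearMonth is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <utility>

// Same pattern as in the post above, with every quantifier written lazily.
// Returns {year, month}, or two empty strings if nothing matched.
std::pair<std::string, std::string> lazyYearMonth(const std::string& text) {
    static const std::regex re(
        R"(\b(?:[yY]|year-)??(\d{1,2}?)(?:[xX]|[mM]|_month-)??(\d{1,2}?)\b)");
    std::smatch m;
    if (std::regex_search(text, m, re))
        return {m[1].str(), m[2].str()};
    return {"", ""};
}
```

The lazy year group takes as few digits as it can, and the trailing \b forces the month group to take the rest. For "Find 511 or not" this gives year 5 and month 11, whereas greedy quantifiers would take 51 as the year and leave 1 for the month.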

