Solved QRegularExpression unreliable results with RegEx piping (Qt 5.5.1 Windows)
-
I am trying to use QRegularExpression to reliably find a one- or two-digit year and month in a string. The format inside the string is not always the same and I need to cover a few variants, so I gave piping (alternation with |) a chance. So far without much luck. For example:
QString string1 = "Find my Date 15x01"; // Year 2015, January
QString string2 = "Find Y15M01 Date"; // Year 2015, January
QString string3 = "Find y03m11 date"; // Year 2003, November
QString string4 = "Find Date 1x9"; // Year 2001, September
QString string5 = "Find year-16_month-09"; // Year 2016, September
And now the RegEx part:
QRegularExpression re("(\\d+)[xX](\\d+)");
QRegularExpressionMatch match = re.match(string1);
Works as expected. The first RegEx always works! But if I pipe several patterns together and the match does not come from the first one, the results are there, but not in the groups I expect and can work with.
QRegularExpression re("(\\d+)[xX](\\d+)|[yY](\\d+)[mM](\\d+)");
QRegularExpressionMatch match = re.match(string2);
or
QRegularExpression re("(\\d+)[xX](\\d+)|[yY](\\d+)[mM](\\d+)|[year-](\\d+)[_month-](\\d+)");
QRegularExpressionMatch match = re.match(string5);
Is it me or my RegEx? I tested the same thing in Python and it worked as expected. So far no luck in Qt. If I use just one RegEx, or the first RegEx is the one that matches, it works perfectly. But if the matching RegEx is the 2nd, 3rd, ... in the pipe, it is still a match, but there are no groups I can work with. I always expect group 1 to be the year and group 2 the month.
r_year = match.captured(1);
r_month = match.captured(2);
Thanks!
-
@qDebug
Hello,
Well, I can't say for sure why that might be; however, you could use another regex that has only two groups (without the alternatives), similar to this:
\b(?:[yY]|year-)?(\d+)(?:[xX]|[mM]|_month-)(\d+)\b
Still, you do seem to have an error in your pattern, namely: [year-] and [_month-].
Kind regards.
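The bracket error can be reproduced in isolation. The sketch below is not from the thread: it uses std::regex instead of QRegularExpression so that it compiles without Qt (the character-class rule is the same in both), and `findYearMonth` is a hypothetical helper name. Square brackets define a character class, so [year-] matches exactly one character out of y, e, a, r and -, not the text "year-".

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true and fills year/month (groups 1 and 2)
// if `pattern` is found anywhere in `text`.
// Assumption: std::regex stands in for QRegularExpression here.
bool findYearMonth(const std::string& pattern, const std::string& text,
                   std::string& year, std::string& month)
{
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return false;
    year = m[1].str();
    month = m[2].str();
    return true;
}

// Character classes: ONE character from each bracketed set.
const std::string classPattern = "[year-](\\d+)[_month-](\\d+)";
// Literal text: the words "year-" and "_month-" themselves.
const std::string literalPattern = "year-(\\d+)_month-(\\d+)";
```

With these definitions, classPattern finds nothing in "Find year-16_month-09", because after "-16" it can consume only the single character "_" and then needs a digit. It does, however, accept an unrelated string such as "r7h3". literalPattern captures 16 and 09 as intended.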
-
Using just one RegEx is not an option in my project, because I want to integrate the ability to add and remove RegEx patterns. The only way I can think of right now is to store each pattern separately in a QStringList, loop over all of them with foreach, and then check captured results 1 and 2 with if/else.
But I don't think it is an ideal solution to loop through 20 or so RegEx patterns for maybe 2,000 files. That could end up in 40,000 or even more match attempts.
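The pattern-list idea described above might look roughly like this. It is only a sketch under assumptions: std::regex and std::vector stand in for QRegularExpression and QStringList so the code builds without Qt, and `compilePatterns` and `matchFirst` are hypothetical names. Compiling each pattern once and reusing the compiled objects for every file keeps the per-file cost down to the match attempts themselves.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Compile every user-supplied pattern once, up front.
std::vector<std::regex> compilePatterns(const std::vector<std::string>& sources)
{
    std::vector<std::regex> compiled;
    compiled.reserve(sources.size());
    for (const std::string& src : sources)
        compiled.emplace_back(src);
    return compiled;
}

// Try the patterns in order; the first one that matches wins.
// Every pattern is expected to have two groups: 1 = year, 2 = month.
bool matchFirst(const std::vector<std::regex>& patterns, const std::string& text,
                std::pair<std::string, std::string>& result)
{
    for (const std::regex& re : patterns) {
        std::smatch m;
        if (std::regex_search(text, m, re) && m.size() > 2) {
            result = { m[1].str(), m[2].str() };
            return true;
        }
    }
    return false;
}
```

Because each pattern has its own two groups, the year is always group 1 and the month group 2, whichever pattern matched. The loop also stops at the first hit, so 20 patterns is the worst case per file rather than the typical one.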
-
Hello,
Your original, but corrected, regex
(\d+)[xX](\d+)|[yY](\d+)[mM](\d+)|year-(\d+)_month-(\d+)
checks out and captures everything (with global matching); I've run my simple test here. Unfortunately, it does that in groups from 1 to 6, so my guess is that it's not a Qt-specific issue, but how the PCRE engine actually works.
As a side note, your regular expression uses about 2 times the steps mine does. I strongly suspect this is because of the branching induced by the alternatives specifier that you use.
Kind regards.
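Since groups are numbered left to right across the whole pattern, the second alternative fills groups 3 and 4 and the third fills 5 and 6, while groups 1 and 2 stay empty. One way to cope is to pick the first pair that actually took part in the match. The sketch below again uses std::regex as a stand-in for QRegularExpression (an assumption, so that it builds without Qt), and the function names are hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>

// Returns the first pair of capture groups that took part in the match.
bool firstMatchedPair(const std::smatch& m,
                      std::pair<std::string, std::string>& out)
{
    // Group 0 is the whole match; the pairs are (1,2), (3,4), (5,6), ...
    for (std::size_t i = 1; i + 1 < m.size(); i += 2) {
        if (m[i].matched && m[i + 1].matched) {
            out = { m[i].str(), m[i + 1].str() };
            return true;
        }
    }
    return false;
}

bool extractYearMonth(const std::string& text,
                      std::pair<std::string, std::string>& out)
{
    // The corrected three-way alternation from the post above.
    static const std::regex re(
        "(\\d+)[xX](\\d+)|[yY](\\d+)[mM](\\d+)|year-(\\d+)_month-(\\d+)");
    std::smatch m;
    return std::regex_search(text, m, re) && firstMatchedPair(m, out);
}
```

In Qt the same check should be possible by testing whether captured(i) is empty for each pair. PCRE also has a branch-reset group, written (?|...), that restarts group numbering in every alternative; whether that behaves as hoped in Qt 5.5.1 is untested here.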
-
You are right, I believe it is my lack of knowledge and not a Qt problem. I guess I can handle most or maybe all variants in one single RegEx. So far I have only run into one problem: when the year and month are written as one 4-digit number.
QString string6 = "Find 0509 or not"; // Year 2005, September
My RegEx will work with this example, because it picks the last of the 4 digits, the 9. But if the month is 10, 11 or 12, it will result in 0, 1 and 2.
QString string7 = "Find 0511 or not"; // Year 2005, November
QString string8 = "Find 511 or not"; // Year 2005, November
QString string9 = "Find 51 or not"; // Year 2005, January
QRegularExpression re("(?:[yY]|[a^])?(\\d{1,2})(?:[xX]|[mM]|[\\d{4}])(\\d{1,2})");
QRegularExpressionMatch match = re.match(string7);
But that is also a RegEx problem.
Thanks.
-
@qDebug
Hello,
My original suggestion could be modified to (mostly) suit that case as well, like this:
\b(?:[yY]|year-)?(\d{1,2})(?:[xX]|[mM]|_month-)?(\d{1,2})\b
However, to have it work you have to switch the matching greediness. So you'd use it as follows:
QRegularExpression rx("\\b(?:[yY]|year-)?(\\d{1,2})(?:[xX]|[mM]|_month-)?(\\d{1,2})\\b",
                      QRegularExpression::InvertedGreedinessOption | QRegularExpression::OptimizeOnFirstUsageOption);
// ... Use repeatedly as usual ...
QRegularExpressionMatch match = rx.match(someString);
You can check the results of the pattern here.
Kind regards.
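To see why the greediness matters, here is a small stand-alone sketch of the same pattern. It rests on one assumption: std::regex replaces QRegularExpression so the code builds without Qt, and since std::regex has no counterpart to InvertedGreedinessOption, every quantifier is made lazy by hand with a trailing ?. The name `splitYearMonth` is hypothetical.

```cpp
#include <regex>
#include <string>
#include <utility>

// Returns {year, month} as matched, or two empty strings if nothing matches.
std::pair<std::string, std::string> splitYearMonth(const std::string& text)
{
    // Same pattern as above with each quantifier written lazily:
    // "?" becomes "??" and "{1,2}" becomes "{1,2}?".
    static const std::regex re(
        "\\b(?:[yY]|year-)??(\\d{1,2}?)(?:[xX]|[mM]|_month-)??(\\d{1,2}?)\\b");
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return {};
    return { m[1].str(), m[2].str() };
}
```

With lazy matching the year group first tries a single digit and only grows when the rest of the pattern cannot reach the closing \b, so "511" splits into 5 and 11. With greedy matching the year would take "51" and leave only "1" for the month. Note that "0511" and "511" are disambiguated purely by the word boundary, which is presumably why the suggestion says it "mostly" suits this case.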