How to make a Qt regex avoid matching already captured values?
-
I have a very big log file that contains some lines in this format:
2024-04-14 04:56:56.714-0300 24180 20660 UI at64_1 [Ready] W: Retrying to obtain clipboard.
2024-04-14 04:56:56.714-0300 27268 31992 UI at64_4 [Ready] W: Retrying to obtain clipboard.
2024-04-14 04:56:56.714-0300 3696 20128 UI at64_3 [Ready] W: Retrying to obtain clipboard.
2024-04-14 04:56:56.765-0300 3696 20128 UI at64_3 [Ready] W: Retrying to obtain clipboard.
2024-04-14 04:56:56.765-0300 27268 31992 UI at64_4 [Ready] W: Retrying to obtain clipboard.
2024-04-14 04:56:56.765-0300 17532 1384 UI at64_2 [Ready] W: Retrying to obtain clipboard.
These lines always start with a timestamp in the format 2024-04-14 04:56:56.714-0300, but with different values. I know only the numbers that follow -0300 (24180, 27268, 3696, 17532, etc.), and I'm trying to get the at64_... string that corresponds to each number. From the example above I would like to get:
24180, at64_1
27268, at64_4
3696, at64_3
17532, at64_2
I have written this regex:
(24180|27268|3696|17532)(?!.*\1)\s+\d+\s+\w+\s+\K\w+
I have used (?!.*\1) to avoid capturing duplicated values, but when I use the same regex with Qt's QRegularExpression, it still captures duplicated values:
#include <QCoreApplication>
#include <QDebug>
#include <QRegularExpression>

int main(int argc, char* argv[])
{
    QCoreApplication a(argc, argv);

    // Sample log content, one entry per line
    QString content = R"(2024-04-14 04:56:56.714-0300 24180 20660 UI at64_1 [Ready] W: Retrying to obtain clipboard.
2024-04-14 04:56:56.714-0300 27268 31992 UI at64_4 [Ready] W: Retrying to obtain clipboard.
2024-04-14 04:56:56.714-0300 3696 20128 UI at64_3 [Ready] W: Retrying to obtain clipboard.
2024-04-14 04:56:56.765-0300 3696 20128 UI at64_3 [Ready] W: Retrying to obtain clipboard.
2024-04-14 04:56:56.765-0300 27268 31992 UI at64_4 [Ready] W: Retrying to obtain clipboard.
2024-04-14 04:56:56.765-0300 17532 1384 UI at64_2 [Ready] W: Retrying to obtain clipboard.)";

    QRegularExpression re(R"((24180|27268|3696|17532)(?!.*\1)\s+\d+\s+\w+\s+\K\w+)");
    QRegularExpressionMatchIterator match = re.globalMatch(content);
    if (!match.hasNext())
        qDebug() << "failed";

    // Print the known number (group 1) and the text matched after \K (group 0)
    while (match.hasNext()) {
        QRegularExpressionMatch nextMatch = match.next();
        qDebug() << nextMatch.captured(1) << "|" << nextMatch.captured(0);
    }
}
This code prints:
"24180" | "at64_1"
"27268" | "at64_4"
"3696" | "at64_3"
"3696" | "at64_3"
"27268" | "at64_4"
"17532" | "at64_2"
It has captured duplicated values, different from the result seen here: https://regex101.com/r/IFJ1Oy/2
I'm using Qt 6. Is it possible to make QRegularExpression avoid capturing duplicates, or will I need to parse the results and remove the duplicates myself?
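For reference, the fallback mentioned above (removing the duplicates manually) could be sketched roughly as follows. This is only an illustration under some assumptions: firstNamePerId is a hypothetical helper, not part of Qt or of the code above; std::regex stands in for QRegularExpression so the sketch compiles without Qt; the known ids are assumed to be plain digit strings; and the pattern drops the lookahead and \K (std::regex has no \K), reading the name from a second capture group instead. In Qt code, a QSet<QString> could play the role of the std::unordered_set.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): returns one (id, name)
// pair per known id, keeping only the first occurrence found in the log.
// Assumes every id is a plain digit string with no regex metacharacters.
std::vector<std::pair<std::string, std::string>>
firstNamePerId(const std::string& content, const std::vector<std::string>& ids)
{
    assert(!ids.empty() && "at least one known id is required");

    // Build the alternation "24180|27268|..." from the known ids.
    std::string alternation;
    for (const std::string& id : ids) {
        if (!alternation.empty())
            alternation += '|';
        alternation += id;
    }

    // Group 1 is the known id, group 2 is the at64_... name.
    const std::regex re("\\b(" + alternation + ")\\s+\\d+\\s+\\w+\\s+(\\w+)");

    std::unordered_set<std::string> seen;
    std::vector<std::pair<std::string, std::string>> result;

    for (std::sregex_iterator it(content.begin(), content.end(), re), end;
         it != end; ++it) {
        const std::string id = (*it)[1].str();
        // insert().second is false when this id was already recorded.
        if (seen.insert(id).second)
            result.emplace_back(id, (*it)[2].str());
    }
    return result;
}
```

Called with the sample log and the four known ids, this yields each id once, paired with the first at64_... name seen for it, in order of first appearance.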