How to make qt regex avoid matching already captured values?

    Kattia wrote (post #1):

    I have a very big log file that contains some lines in this format:

    2024-04-14 04:56:56.714-0300 24180 20660 UI  at64_1 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.714-0300 27268 31992 UI  at64_4 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.714-0300 3696  20128 UI  at64_3 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.765-0300 3696  20128 UI  at64_3 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.765-0300 27268 31992 UI  at64_4 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.765-0300 17532 1384  UI  at64_2 [Ready] W: Retrying to obtain clipboard.
    

    Every line starts with a timestamp in the format 2024-04-14 04:56:56.714-0300; only the timestamp values differ.

    I only know the numbers that follow -0300 (24180, 27268, 3696, 17532, etc.),
    and I'm trying to get the at64_... string that appears on the same line as each number.

    From the example above I would like to get:

    24180, at64_1

    27268, at64_4

    3696, at64_3

    17532, at64_2

    I have written this regex: (24180|27268|3696|17532)(?!.*\1)\s+\d+\s+\w+\s+\K\w+

    I added (?!.*\1) to avoid capturing duplicate values, but with the same regex in Qt's QRegularExpression the duplicates are still captured:

    #include <QCoreApplication>
    #include <QDebug>
    #include <QRegularExpression>

    int main(int argc, char* argv[])
    {
        QCoreApplication a(argc, argv);

        QString content = R"(2024-04-14 04:56:56.714-0300 24180 20660 UI  at64_1 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.714-0300 27268 31992 UI  at64_4 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.714-0300 3696  20128 UI  at64_3 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.765-0300 3696  20128 UI  at64_3 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.765-0300 27268 31992 UI  at64_4 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.765-0300 17532 1384  UI  at64_2 [Ready] W: Retrying to obtain clipboard.)";

        // Group 1 is the known number; \K drops everything matched so far,
        // so captured(0) holds only the at64_... token.
        QRegularExpression re(R"((24180|27268|3696|17532)(?!.*\1)\s+\d+\s+\w+\s+\K\w+)");
        QRegularExpressionMatchIterator match = re.globalMatch(content);
        if (!match.hasNext())
            qDebug() << "failed";

        while (match.hasNext())
        {
            QRegularExpressionMatch nextMatch = match.next();
            qDebug() << nextMatch.captured(1) << "|" << nextMatch.captured(0);
        }
    }
    

    This code prints:

    "24180" | "at64_1"
    "27268" | "at64_4"
    "3696"  | "at64_3"
    "3696"  | "at64_3"
    "27268" | "at64_4"
    "17532" | "at64_2"
    

    The output contains duplicates, unlike the result shown here: https://regex101.com/r/IFJ1Oy/2

    I'm using Qt 6. Is it possible to make QRegularExpression skip the duplicates, or do I need to parse the matches and remove the duplicates myself?

      Christian Ehrlicher (Lifetime Qt Champion) wrote (post #2):

      @Kattia You have to do it yourself.

      By the way, why use a regex here at all? Simply split each line on whitespace and use an index-based lookup. That is much easier to read, and it is clearer what is going on.
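
A minimal sketch of the split-and-lookup idea, written with only the C++ standard library so it runs without Qt. The helper name `extractNames` and the assumption that the known number is always the third whitespace-separated field and the at64_... token the sixth are illustrative, based on the sample lines above. It keeps the first line seen for each number.

```cpp
#include <sstream>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Qt): returns (number, name) pairs for the
// wanted numbers, keeping only the first line seen for each number.
// Assumed field layout, taken from the sample log lines:
//   date  time+offset  number  number2  category  name  ...
std::vector<std::pair<std::string, std::string>>
extractNames(const std::string& log,
             const std::unordered_set<std::string>& wanted)
{
    std::vector<std::pair<std::string, std::string>> result;
    std::unordered_set<std::string> seen;  // numbers already reported

    std::istringstream lines(log);
    std::string line;
    while (std::getline(lines, line)) {
        // operator>> skips runs of whitespace, so the double spaces
        // in the log do not produce empty fields.
        std::istringstream fields(line);
        std::string date, time, number, number2, category, name;
        if (!(fields >> date >> time >> number >> number2 >> category >> name))
            continue;  // line has fewer than six fields

        if (wanted.count(number) == 0)
            continue;  // not one of the known numbers

        // insert() reports whether the number was new; skip repeats.
        if (seen.insert(number).second)
            result.emplace_back(number, name);
    }
    return result;
}
```

Called on the six sample lines with the numbers 24180, 27268, 3696 and 17532, this returns four pairs in order of first appearance: (24180, at64_1), (27268, at64_4), (3696, at64_3), (17532, at64_2). In Qt code the same approach would likely use QString::split with Qt::SkipEmptyParts and a QSet to track the numbers already seen.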

        Tassos wrote (post #3):

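        @Kattia I think passing these options to re will give you the desired results :) By default, . does not match a newline, so the lookahead (?!.*\1) only checks the rest of the current line. With DotMatchesEverythingOption it scans the rest of the input, so only the last occurrence of each number is matched.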

          QRegularExpression re(R"((24180|27268|3696|17532)(?!.*\1)\s+\d+\s+\w+\s+\K\w+)", QRegularExpression::DotMatchesEverythingOption|QRegularExpression::MultilineOption);
        