
How to make qt regex avoid matching already captured values?

    Kattia wrote (#1):

    I have a very big log file that contains some lines in this format:

    2024-04-14 04:56:56.714-0300 24180 20660 UI  at64_1 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.714-0300 27268 31992 UI  at64_4 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.714-0300 3696  20128 UI  at64_3 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.765-0300 3696  20128 UI  at64_3 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.765-0300 27268 31992 UI  at64_4 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.765-0300 17532 1384  UI  at64_2 [Ready] W: Retrying to obtain clipboard.
    

    These lines always start in the format 2024-04-14 04:56:56.714-0300, but with different timestamps.

    I only know the numbers after -0300 (24180, 27268, 3696, 17532, etc.),
    and I'm trying to get the at64_... string on the same line as each number.

    From the example above I would like to get:

    24180, at64_1

    27268, at64_4

    3696, at64_3

    17532, at64_2

    I have written this regex: (24180|27268|3696|17532)(?!.*\1)\s+\d+\s+\w+\s+\K\w+

    I used the negative lookahead (?!.*\1) to skip duplicated values, but with the same regex in Qt's QRegularExpression, the duplicates are still captured:

    #include <QCoreApplication>
    #include <QDebug>
    #include <QRegularExpression>
    #include <QString>
    
    int main(int argc, char* argv[])
    {
        QCoreApplication a(argc, argv);
    
        QString content = R"(2024-04-14 04:56:56.714-0300 24180 20660 UI  at64_1 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.714-0300 27268 31992 UI  at64_4 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.714-0300 3696  20128 UI  at64_3 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.765-0300 3696  20128 UI  at64_3 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.765-0300 27268 31992 UI  at64_4 [Ready] W: Retrying to obtain clipboard.
    2024-04-14 04:56:56.765-0300 17532 1384  UI  at64_2 [Ready] W: Retrying to obtain clipboard.)";
    
        QRegularExpression re(R"((24180|27268|3696|17532)(?!.*\1)\s+\d+\s+\w+\s+\K\w+)");
        QRegularExpressionMatchIterator match = re.globalMatch(content);
        if (!match.hasNext())
            qDebug() << "failed";
    
        while (match.hasNext())
        {
            QRegularExpressionMatch nextMatch = match.next();
            // captured(1) is the number; \K resets the match start, so captured(0) is the at64_... name
            qDebug() << nextMatch.captured(1) << "|" << nextMatch.captured(0);
        }
    }
    

    This code prints:

    "24180" | "at64_1"
    "27268" | "at64_4"
    "3696"  | "at64_3"
    "3696"  | "at64_3"
    "27268" | "at64_4"
    "17532" | "at64_2"
    

    It captures the duplicates, unlike the result shown here: https://regex101.com/r/IFJ1Oy/2

    I'm using Qt 6. Is it possible to make QRegularExpression avoid capturing duplicates, or will I have to parse the results and remove the duplicates myself?
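If you do remove the duplicates yourself, a post-pass over the (number, name) pairs collected in the while loop is enough. A minimal sketch, written in standard C++ so it compiles without Qt; the Match alias and the dedupeMatches helper are hypothetical names, and it keeps the first occurrence of each number:

```cpp
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical type: one (number, at64_... name) pair per regex match.
using Match = std::pair<std::string, std::string>;

// Keeps the first pair seen for each number and drops later repeats,
// preserving the order in which numbers first appear.
std::vector<Match> dedupeMatches(const std::vector<Match>& matches)
{
    std::unordered_set<std::string> seen;
    std::vector<Match> unique;
    for (const Match& m : matches) {
        // insert(...).second is true only the first time a number is inserted
        if (seen.insert(m.first).second)
            unique.push_back(m);
    }
    return unique;
}
```

In the loop above you could push {nextMatch.captured(1).toStdString(), nextMatch.captured(0).toStdString()} into a std::vector<Match> and pass it in; with Qt containers, QSet<QString> plays the role of std::unordered_set.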


      Christian Ehrlicher (Lifetime Qt Champion) wrote (#2):

      @Kattia You have to do it yourself.

      btw: Why use a regex here at all? Simply split each line by whitespace and use index-based lookup; that is much easier to read and to understand what's going on.
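A minimal sketch of the split-and-index approach, assuming the field layout of the sample lines (date, time with offset, the number, a second number, "UI", the at64_... name) separated by whitespace. It reads line by line from any std::istream, so a very big file can be streamed instead of loaded into one string. The function name extractNames and the field indices 2 and 5 are assumptions taken from the sample; it uses the standard library rather than QString::split so it compiles without Qt:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Assumed layout (from the sample lines), whitespace-separated:
// [0] date  [1] time+offset  [2] number  [3] second number  [4] "UI"  [5] at64_... name
std::vector<std::pair<std::string, std::string>>
extractNames(std::istream& in, const std::unordered_set<std::string>& wanted)
{
    std::vector<std::pair<std::string, std::string>> result;
    std::unordered_set<std::string> seen;   // numbers already reported
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<std::string> f;
        std::string token;
        // operator>> skips any run of spaces, so "3696  20128" splits cleanly
        while (f.size() < 6 && fields >> token)
            f.push_back(token);
        if (f.size() < 6)
            continue;                        // not a line in the expected format
        const std::string& number = f[2];
        // Whole-field comparison: 3696 will not match inside e.g. 36960
        if (wanted.count(number) && seen.insert(number).second)
            result.emplace_back(number, f[5]);
    }
    return result;
}
```

For a file on disk you would pass a std::ifstream instead of a string stream. In Qt, QString::split(QRegularExpression(R"(\s+)"), Qt::SkipEmptyParts) gives the same fields for a single line.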


        Tassos wrote (#3):

        @Kattia I think passing these options to re will give you the desired results :)

          QRegularExpression re(R"((24180|27268|3696|17532)(?!.*\1)\s+\d+\s+\w+\s+\K\w+)", QRegularExpression::DotMatchesEverythingOption|QRegularExpression::MultilineOption);
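Why the options should help, as far as I understand PCRE2 (the engine behind QRegularExpression in Qt 6): by default . does not match a newline, so (?!.*\1) only looks at the rest of the current line and never sees the repeat on a later line. With DotMatchesEverythingOption, .* can run to the end of the text, so a number is rejected whenever it appears again anywhere after it. If I trace the sample correctly, the match that survives for each number is its last occurrence, so the results come out as 24180, 3696, 27268, 17532 rather than in first-seen order. MultilineOption only changes what ^ and $ match, which this pattern does not use. On a very big log, each candidate's lookahead may also scan all the remaining text. The sketch below is a hypothetical single-pass version of the same "keep the last occurrence" rule in standard C++; the helper name lastOccurrences and the field layout are assumptions from the sample, and unlike the regex it compares whole fields, so digits inside a longer number do not count as a repeat:

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: keeps the LAST line for each wanted number and returns
// the results ordered by where those last lines appear, in one pass.
std::vector<std::pair<std::string, std::string>>
lastOccurrences(std::istream& in, const std::unordered_set<std::string>& wanted)
{
    // number -> (index of the line it was last seen on, at64_... name)
    std::unordered_map<std::string, std::pair<std::size_t, std::string>> last;
    std::string line;
    for (std::size_t index = 0; std::getline(in, line); ++index) {
        std::istringstream fields(line);
        std::string date, timeAndOffset, number, otherNumber, tag, name;
        // Assumed layout from the sample: date time number number UI name ...
        if (!(fields >> date >> timeAndOffset >> number >> otherNumber >> tag >> name))
            continue;
        if (wanted.count(number))
            last[number] = {index, name};   // later lines overwrite earlier ones
    }
    // Sort by line index so the output follows the order of the last occurrences
    std::vector<std::pair<std::size_t, std::pair<std::string, std::string>>> ordered;
    for (const auto& [number, entry] : last)
        ordered.emplace_back(entry.first, std::make_pair(number, entry.second));
    std::sort(ordered.begin(), ordered.end());
    std::vector<std::pair<std::string, std::string>> result;
    for (const auto& item : ordered)
        result.push_back(item.second);
    return result;
}
```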
        