Need help with QRegularExpression for strings and comments

    Sikarjan wrote (#3):

    I already tried that, but it looks like the syntax here is a bit different. Regex101 suggests this pattern:

    ((?<!\\)([\"']))|(\/\*)|(\*\/)
    

    But this one is not working either.

      mrjj (Lifetime Qt Champion) wrote (#4):

      @Sikarjan
      And do you escape it CORRECTLY?

        Sikarjan wrote (#5):

        I believe so.

        QRegExp("/\\*")
        

        Is a valid expression.

          kshegunov (Moderators) wrote (#6):

          Here's one for comments: QRegularExpression("\\/\\*+\\s*(?!<\\*)(.*?)(?:\\**)\\s*\\*\\/").
          Here's one for strings: QRegularExpression("\"(.+?)(?:\"\\s*)??(?<!(?<!\\\\)\\\\)\"")

          However, this is a really iffy use of regular expressions; they can't cover all possible cases. For example, the proposed string-matching expression doesn't handle:

          "string"  "concatenated string"
          

          It will also fail to properly match strings containing \\\" . The "real" solution is to have a proper parser.

          Read and abide by the Qt Code of Conduct

            Sikarjan wrote (#7):

            @kshegunov

            Maybe I need to give you some more background. I am working on a code highlighter for PHP. I started with the highlighter example and redid the multi-line section. In PHP, a string can span several lines.

            "I am some text
            in a multi line string";
            

            It can be in single or double quotes. If the string starts with one kind of quote, the other kind will not end it. This is why your suggestions would not work in my case.

            "<a href='../test.php'>see the \"Test\" page</a>";
            

            This would be one string and should all be highlighted in green (in my case).

            I had everything above working with the code below and this regex:

            quoteExpressions = QRegularExpression(R"**(?<!\\)([\"']**");

            But there is another case. Something like

            glob('images/*.jpg');
            

            is also possible. If I handle the quotes and the comments in two separate sections, the code above will be interpreted as the beginning of a string and then changed to a comment. Therefore I tried to combine all multi-line cases in one "function", see below. I believe my code should work if I get the regex to work. Unfortunately, I do not understand the regex with the R"**... syntax. There is probably a better way to do what I want, but this is the best I could come up with.

               multiLineCommentFormat.setForeground(Qt::gray);
               multiLineQuoteFormat.setForeground(Qt::darkGreen);
               quoteExpressions = QRegularExpression(R"**(?<!\\)([\"']|(/\\*)|(\\*/))**");
            }
            
            void Highlighter::highlightBlock(const QString &text)
            {
                setCurrentBlockState(0);
            
                if (previousBlockState() <= 0){
                    QRegularExpressionMatchIterator quoteMatch = quoteExpressions.globalMatch(text);
            
                    while(quoteMatch.hasNext()){
                        QRegularExpressionMatch match = quoteMatch.next();
                        int quoteStart = match.capturedStart();
                        int quoteLength = 0;
                        bool foundNextQuote = false;
                        QString lastQuote = match.captured();
                        int blockState = 3;
                        if(lastQuote == "'"){
                            blockState = 2;
                        }else if(lastQuote == "/*"){
                            blockState = 1;
                            lastQuote = "*/";
                        }
            
                        while(quoteMatch.hasNext()){
                            match = quoteMatch.next();
                            if(match.captured() == lastQuote){
                                quoteLength = match.capturedStart() - quoteStart;
                                foundNextQuote = true;
                                break;
                            }
                        }
            
                        if(!foundNextQuote){
                            setCurrentBlockState(blockState);
                            quoteLength = text.length() - quoteStart;
                        }
                        setFormat(quoteStart, quoteLength+1, blockState == 1 ? multiLineCommentFormat:multiLineQuoteFormat);
                    }
                }else{
                    QRegularExpressionMatchIterator quoteMatch = quoteExpressions.globalMatch(text);
                    QString lastQuote = "\"";
                    if(previousBlockState() == 1)
                        lastQuote = "*/";
                    else if(previousBlockState() == 2)
                        lastQuote = "'";
            
                    bool foundNextQuote = false;
                    while(quoteMatch.hasNext()){
                        QRegularExpressionMatch match = quoteMatch.next();
                        if(match.captured() == lastQuote){
                            setFormat(0, match.capturedStart()+1, previousBlockState() == 1 ? multiLineCommentFormat:multiLineQuoteFormat);
                            foundNextQuote = true;
                            break;
                        }
                    }
            
                    if(!quoteMatch.hasNext() && !foundNextQuote){
                        setCurrentBlockState(previousBlockState());
                        setFormat(0, text.length(), previousBlockState() == 1 ? multiLineCommentFormat:multiLineQuoteFormat);
                    }
            
                    while(quoteMatch.hasNext()){
                        QRegularExpressionMatch match = quoteMatch.next();
                        int quoteStart = match.capturedStart();
                        int quoteLength = 0;
                        bool foundNextQuote = false;
                        QString lastQuote = match.captured();
                        int blockState = 3;
                        if(lastQuote == "'"){
                            blockState = 2;
                        }else if(lastQuote == "/*"){
                            blockState = 1;
                            lastQuote = "*/";
                        }
            
                        while(quoteMatch.hasNext()){
                            match = quoteMatch.next();
                            if(match.captured() == lastQuote){
                                quoteLength = match.capturedStart() - quoteStart;
                                foundNextQuote = true;
                                break;
                            }
                        }
            
                        if(!foundNextQuote){
                            setCurrentBlockState(blockState);
                            quoteLength = text.length() - quoteStart;
                        }
                        setFormat(quoteStart, quoteLength+1, blockState == 1 ? multiLineCommentFormat:multiLineQuoteFormat);
                    }
                }
            }
            
              kshegunov (Moderators) wrote (#8):

              @Sikarjan said in Need help with QRegularExpression for strings and comments:

              It could be in single or double quotes.

              This is rather irrelevant; the regex can be trivially modified to allow for single quotes.

              I am working on a code highlighter for PHP.

              Sorry to break it to you, but then you're definitely on a slippery slope. You need a proper parser (or rather, a tokenizer); you won't be able to make it work reliably with regular expressions alone. It should be a simple matter, as you can use PHP's own language API to get the tokenization directly. If that is not an option, you can write your own; it's not a very hard thing to do.
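As a rough idea of what "write your own" could look like for the string and comment case, here is a minimal sketch in plain C++ (no Qt). All names (`State`, `Span`, `scanLine`) are made up for illustration. It walks one line character by character and returns the state at the end of the line, which is the same kind of value the posted highlighter passes around with setCurrentBlockState() and previousBlockState(). It only knows about /* */ comments and the two quote styles, and it assumes a backslash inside a string escapes the next character; `//` and `#` comments, heredocs and everything else are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Scanner state carried from one line to the next (hypothetical names).
enum class State { Code, Comment, SingleQuoted, DoubleQuoted };

// One range of a line that lies inside a comment or a string.
struct Span {
    std::size_t start;
    std::size_t length;
    State kind;
};

// Scans one line starting in state `in`, appends the comment/string spans it
// finds, and returns the state in effect at the end of the line.
State scanLine(const std::string& line, State in, std::vector<Span>& spans) {
    State state = in;
    std::size_t spanStart = 0;  // a construct continued from the previous line starts at column 0
    std::size_t i = 0;
    const std::size_t n = line.size();
    while (i < n) {
        const char c = line[i];
        const bool hasNext = i + 1 < n;
        if (state == State::Code) {
            if (c == '/' && hasNext && line[i + 1] == '*') {
                state = State::Comment;
                spanStart = i;
                i += 2;  // consume both characters so "/*/" does not close itself
            } else if (c == '"' || c == '\'') {
                state = (c == '"') ? State::DoubleQuoted : State::SingleQuoted;
                spanStart = i;
                ++i;
            } else {
                ++i;
            }
        } else if (state == State::Comment) {
            if (c == '*' && hasNext && line[i + 1] == '/') {
                i += 2;
                spans.push_back({spanStart, i - spanStart, state});
                state = State::Code;
            } else {
                ++i;  // quotes inside a comment are ignored
            }
        } else {
            const char quote = (state == State::DoubleQuoted) ? '"' : '\'';
            if (c == '\\') {
                i += 2;  // skip the escaped character (may step past the end)
            } else if (c == quote) {
                ++i;
                spans.push_back({spanStart, i - spanStart, state});
                state = State::Code;
            } else {
                ++i;  // the other quote kind and "/*" are ordinary text here
            }
        }
    }
    // Still inside a comment or string: the span runs to the end of the line.
    if (state != State::Code && spanStart < n) {
        spans.push_back({spanStart, n - spanStart, state});
    }
    return state;
}
```

In a QSyntaxHighlighter subclass, the returned state would presumably be mapped to an integer for setCurrentBlockState(), and each span would become one setFormat(start, length, format) call. Because a quote only opens a string in the `Code` state and `/*` only opens a comment in the `Code` state, a line such as glob('images/*.jpg'); comes out as a single string span.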

                VRonin wrote (#9):

                It has a steep learning curve, but boost::spirit can be an option for a proper parser.

                "La mort n'est rien, mais vivre vaincu et sans gloire, c'est mourir tous les jours"
                ~Napoleon Bonaparte

                On a crusade to banish setIndexWidget() from the holy land of Qt

                  Sikarjan wrote (#10):

                  @kshegunov said in Need help with QRegularExpression for strings and comments:

                  It should be a simple matter as you can also directly use PHP's own language API to get the tokenization directly.

                  That sounds very simple indeed but I don't understand a word. Do you happen to have a link, which is a good entry point for that topic? I only have some PHP background and I am not a programmer by training. So my skills are very, very limited.

                  Thanks for the help, though!

                    Sikarjan wrote (#11):

                    Hi,

                    I got my code working with the following regex:

                    quoteExpressions = QRegularExpression("(?<!\\\\)([\"'])|(\\/\\*)|(\\*\\/)");
                    

                    Thanks @mrjj for making me recheck it again.

                    I am still interested in a parser solution as well, but so far I have not been able to find anything that helps me understand the two posts about it.
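For anyone puzzling over the escaping, the sketch below compares spellings of the pattern as plain std::string values, without running any regex engine. The variable names are invented for the example. It shows that the working escaped literal above and a raw string literal produce the same pattern text, and what the earlier R"**(...)**" attempt would contain if it was compiled exactly as posted: in a raw literal, `**` is only a delimiter and the `(` directly after it belongs to the literal syntax, so the pattern text would start at `?<!` without its opening parenthesis. Backslashes in a raw literal are also not halved, so `\\*` stays two backslashes followed by `*`.

```cpp
#include <cassert>
#include <string>

// The pattern text as the regex engine should see it, written as a raw
// string literal with the delimiter "rx". Everything between R"rx( and )rx"
// is taken verbatim.
const std::string rawPattern = R"rx((?<!\\)(["'])|(\/\*)|(\*\/))rx";

// The same pattern as an ordinary literal: every backslash is doubled and the
// double quote is escaped, as in the working QRegularExpression call.
const std::string escapedPattern = "(?<!\\\\)([\"'])|(\\/\\*)|(\\*\\/)";

// The literal from the earlier attempt, copied as posted. Here "**" acts as
// the raw-string delimiter, so the first "(" is not part of the pattern.
const std::string earlierAttempt = R"**(?<!\\)([\"']|(/\\*)|(\\*/))**";

// Checks the two observations described above.
void checkLiterals() {
    assert(rawPattern == escapedPattern);   // both spell the same pattern text
    assert(earlierAttempt.front() == '?');  // the opening parenthesis is gone
}
```

If that reading is right, the earlier literal was never the intended pattern, which would explain why only the escaped version matched.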

                      kshegunov (Moderators) wrote (#12):

                      To tokenize something basically means to split it into some kind of atomic units, e.g. string literals, identifiers, number literals, parentheses and so on. Start with Wikipedia. Also, as I said, you already have that in PHP:
                      http://php.net/manual/en/function.token-get-all.php
                      http://php.net/manual/en/function.token-name.php
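To make "atomic units" concrete, here is a toy tokenizer in plain C++. It is only a sketch with invented names (`TokenKind`, `Token`, `tokenize`), not a reimplementation of PHP's token_get_all: it knows four kinds of token, treats `$` as a possible first character of an identifier (an assumption made for PHP variables), and ignores comments, operators longer than one character and anything else a real tokenizer has to deal with.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// The kinds of unit this toy tokenizer distinguishes (hypothetical names).
enum class TokenKind { Identifier, Number, String, Punctuation };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits `src` into tokens, skipping whitespace between them.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    const std::size_t n = src.size();
    std::size_t i = 0;
    auto isWordChar = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) != 0 || ch == '_';
    };
    while (i < n) {
        const char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;
            continue;
        }
        const std::size_t start = i;
        TokenKind kind = TokenKind::Punctuation;
        if (std::isdigit(static_cast<unsigned char>(c))) {
            while (i < n && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            kind = TokenKind::Number;
        } else if (isWordChar(c) || c == '$') {
            ++i;
            while (i < n && isWordChar(src[i])) ++i;
            kind = TokenKind::Identifier;
        } else if (c == '"' || c == '\'') {
            ++i;  // opening quote
            while (i < n && src[i] != c) {
                i += (src[i] == '\\') ? 2 : 1;  // a backslash escapes one character
            }
            i = std::min(i + 1, n);  // include the closing quote if there is one
            kind = TokenKind::String;
        } else {
            ++i;  // any other single character
        }
        tokens.push_back({kind, src.substr(start, i - start)});
    }
    return tokens;
}
```

Called on glob('images/*.jpg'); this produces five tokens: the identifier glob, an opening parenthesis, the string 'images/*.jpg' as one unit, a closing parenthesis and a semicolon. A highlighter can then pick a format per token kind instead of searching for individual quote characters.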

                        Sikarjan wrote (#13):

                        @kshegunov I believe I get the idea. What I am unsure about is how the parser would work. How would I call it? Would it rescan the entire file with every keystroke?
                        The problem with a PHP file is that it can contain HTML, CSS and JavaScript parts, which should have their own highlighting and auto-completion.

                          kshegunov (Moderators) wrote (#14):

                          That's not a problem for PHP (from its point of view). If you look at the list of tokens, you'll see that it doesn't care about any HTML, JavaScript or CSS. It just reads the stuff outside <?php and ?> and prints it to the standard stream (the T_INLINE_HTML token); it doesn't care what that stuff contains. So to highlight any one of those languages you will need another tokenizer that recognizes it.
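A first step in that direction could be to cut the file into PHP and non-PHP regions and hand each region to its own tokenizer. The sketch below does this in plain C++ with invented names (`Region`, `splitRegions`). It is deliberately naive: it only looks for the literal tags <?php and ?>, so short tags are ignored and a ?> inside a PHP string or comment would be mistaken for the end of the region.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One stretch of the file: either PHP code (tags included) or everything else.
struct Region {
    bool isPhp;
    std::string text;
};

// Splits `src` at the literal tags "<?php" and "?>" (hypothetical helper).
std::vector<Region> splitRegions(const std::string& src) {
    const std::string openTag = "<?php";
    const std::string closeTag = "?>";
    std::vector<Region> regions;
    std::size_t pos = 0;
    bool inPhp = false;
    while (pos < src.size()) {
        const std::size_t hit = src.find(inPhp ? closeTag : openTag, pos);
        std::size_t end = src.size();  // no further tag: the region runs to the end
        if (hit != std::string::npos) {
            // A PHP region keeps its closing tag; an outer region stops
            // just before the opening tag.
            end = inPhp ? hit + closeTag.size() : hit;
        }
        if (end > pos) {
            regions.push_back({inPhp, src.substr(pos, end - pos)});
        }
        if (hit == std::string::npos) {
            break;
        }
        pos = end;
        inPhp = !inPhp;
    }
    return regions;
}
```

For the input <p>Hi</p><?php echo 'x'; ?><b>bye</b> this yields three regions: the leading markup, the PHP block including both tags, and the trailing markup. The non-PHP regions correspond to what the post above calls T_INLINE_HTML.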
