QRegExp - does not return all printable characters.

Solved (General and Desktop)

    Anonymous_Banned275 wrote (#1):

    QRegExp rx("^\p[ -~]*$");
    The above code does not work in C++.
    I am not sure about the syntax.

    This code works fine:
    QRegExp rx("(\d+)(\s*)(cm|inch(es)?)");

    I also cannot find the meaning of "\p" in the Qt docs.

      SGaist (Lifetime Qt Champion) wrote (#2):

      Hi,

      QRegExp has been deprecated in Qt 5 and removed from Qt Core in Qt 6 (it only survives in the Qt 5 Core Compatibility module).

      Please use QRegularExpression.

      You can use the QRegularExpression Example to design and validate your regex.

      Interested in AI ? www.idiap.ch
      Please read the Qt Code of Conduct - https://forum.qt.io/topic/113070/qt-code-of-conduct

        Anonymous_Banned275 wrote (#3):

        @SGaist Success !

        This simple code "matches" the first word ("Waiting..") in the 'line' string, but only its LAST character.
        Time to read more and add stuff...
        Thanks

         QRegularExpression re("[ -~] ");
         QRegularExpressionMatch match = re.match(line);
         qDebug() << "Regulsr expression ...test match ";
         if (match.hasMatch()) {
             qDebug() << "Regulsr expression ...match.hasMatch()";
             QString matched = match.captured(0);
             qDebug() << "Matched " << matched;
             // ...
         }
         else
         {
             qDebug() << "NO MATCH  ";
         }
         // ...
        
          JonB wrote (#4):

          @AnneRanch said in QRegExp - does not return all printable characters.:

          QRegularExpression re("[ -~] ");

          This does not match a word. It matches a space/hyphen/tilde followed by a space. [EDIT No, see next answer below.] So match.captured(0) will return precisely 2 characters. [EDIT This is still true.]

          QRegExp rx("^\p[ -~]*$");
          QRegExp rx("(\d+)(\s*)(cm|inch(es)?)");

          Both of these --- and it would be the same for QRegularExpression --- would require each \ in the literal strings to be written as \\, as per C++ rules for string literals.
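
For illustration, a small sketch of that escaping rule using standard C++ std::regex instead of Qt. This is an assumption made so the snippet compiles without Qt; QRegularExpression receives exactly the same pattern string from the compiler, so the string-literal rules are identical. The names kEscaped, kRaw, matches_whole and is_printable_ascii are hypothetical. The "\p" from the first post is simply dropped here: in PCRE-style engines \p introduces a Unicode property such as \p{L}, so as far as I know it does not belong in front of [ -~] at all.

```cpp
#include <regex>
#include <string>

// In a normal C++ string literal, "\\d" is needed to hand the regex engine
// the two characters \d. Writing "\d" is an unknown C++ escape sequence:
// compilers typically warn, and the backslash never reaches the regex.
const std::string kEscaped = "(\\d+)(\\s*)(cm|inch(es)?)";

// A raw string literal (C++11) passes every backslash through unchanged,
// so the pattern can be written exactly as regex documentation shows it.
const std::string kRaw = R"((\d+)(\s*)(cm|inch(es)?))";

// Hypothetical helper: true if the whole line matches the given pattern.
bool matches_whole(const std::string& pattern, const std::string& line) {
    return std::regex_match(line, std::regex(pattern));
}

// "^[ -~]*$": the line consists only of printable ASCII characters
// (space through tilde). No backslashes, so nothing needs escaping.
bool is_printable_ascii(const std::string& line) {
    return std::regex_match(line, std::regex("^[ -~]*$"));
}
```

With Qt the same idea applies directly, e.g. QRegularExpression re(R"((\d+)(\s*)(cm|inch(es)?))"); avoids doubling every backslash.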

            Anonymous_Banned275 wrote (#5):

            @JonB I cannot argue this - the symbols inside [ ] were posted on another forum and advertised as "match all printable characters". Obviously they do not.
            Unfortunately the match is "character + space" - as you said. So I get the last character of the first word and a space - which is a little hard to actually "see".

              JonB wrote (#6):

              @AnneRanch said in QRegExp - does not return all printable characters.:

              I cannot argue this - the symbols inside [ ] were posted on another forum and advertised as " match all printable characters ".

              And I cannot argue with this! Let me be the first to admit when I have made a mistake :) I read too quickly, and quite forgot that - (hyphen) is one of the few characters with "special significance" when appearing inside the [...] one-of-characters entity. There (and only there) it represents a "character range", from the character before it (space here) to the character after it (tilde here), which does indeed match all printable (ASCII) characters.

              So re("[ -~] ") does indeed match "any printable character followed by a space". That still means precisely 2 characters rather than a whole word: typically the last character of a "word" followed by a space. However, it will also match 2 spaces, which is presumably not desired.
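
To see those range semantics concretely, a minimal sketch with std::regex standing in for QRegularExpression (an assumption; for plain ASCII input like this the bracket range behaves the same way). first_match is a hypothetical helper that mimics match.captured(0) after a single match.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring of `text` matching
// `pattern`, or an empty string when there is no match.
std::string first_match(const std::string& pattern, const std::string& text) {
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern)))
        return m.str(0);  // whole match, like captured(0)
    return "";
}
```

For example, first_match("[ -~] ", "Waiting to connect") returns "g ": one printable character plus the space after it, never a whole word. On "  Agent" it returns two spaces, and ESC (0x1B) is outside the space-to-tilde range, so "[ -~]" alone does not match it.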

              There are several ways to pick out a "word" if that is what you want to do instead. The simplest would probably be (in your C++ code):

              QRegularExpression re("\\w+");
              

              but it all depends "which" word you might want in a line, what to do about the other characters in the line, etc.
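
A sketch of the "first word" case, again with std::regex as a stand-in and a hypothetical first_word helper. A single search only ever returns the first match, which is consistent with the "Waiting" and "0" results in the log posted later in this thread.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: the first run of word characters in `line`
// ([A-Za-z0-9_] in std::regex's default ECMAScript grammar), i.e. the
// first match of "\\w+". Returns an empty string if there is none.
std::string first_word(const std::string& line) {
    static const std::regex word("\\w+");  // note the doubled backslash
    std::smatch m;
    return std::regex_search(line, m, word) ? m.str(0) : std::string();
}
```

On a line that starts with a colour escape code such as ESC[0;94m, the first "word" is the digit 0 inside the code, not bluetooth.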

                KroMignon wrote (#7):

                @AnneRanch said in QRegExp - does not return all printable characters.:

                I cannot argue this - the symbols inside [ ] were posted on another forum and advertised as "match all printable characters". Obviously they do not.
                Unfortunately the match is "character + space" - as you said. So I get the last character of the first word and a space - which is a little hard to actually "see".

                With QRegExp, the dash symbol in [] is used to define a range of characters (cf. Qt documentation https://doc.qt.io/qt-5/qregexp.html#sets-of-characters).

                I think the right way would be to use a backslash to escape the dash:

                QRegularExpression re("[ \\-~] ");
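
A small check of the two bracket expressions, using std::regex as a stand-in (its default ECMAScript grammar treats \- inside [...] as a literal hyphen, as PCRE does; the helper names are hypothetical). Placing the hyphen first or last in the set, e.g. [- ~], is the other common way to make it literal.

```cpp
#include <regex>
#include <string>

// "[ -~]": the hyphen is a range operator, so this is one character
// anywhere from space (0x20) to tilde (0x7E), i.e. printable ASCII.
bool in_range_class(char c) {
    return std::regex_match(std::string(1, c), std::regex("[ -~]"));
}

// "[ \\-~]": the C++ literal hands the engine [ \-~]; the escaped hyphen is
// a literal, so the set holds exactly three characters: space, '-' and '~'.
bool in_literal_class(char c) {
    return std::regex_match(std::string(1, c), std::regex("[ \\-~]"));
}
```

So the escaped form answers a different question (is it one of those three characters?) than the original range did.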
                

                It is an old maxim of mine that when you have excluded the impossible, whatever remains, however improbable, must be the truth. (Sherlock Holmes)

                  JonB wrote (#8):
                  This post is deleted!

                    KroMignon wrote (#9):

                    @JonB said in QRegExp - does not return all printable characters.:

                    I don't think that's right: I believe that would match a space or any character between \ and ~!

                    I have to admit that I don't understand what the goal of this regex is.
                    My understanding was that he wants to match only the 3 characters space, dash and tilde.
                    But maybe I am wrong and the goal was to get any character between space and tilde (both included) followed by a space.
                    Then the regex should be QRegularExpression re("[ -~] ");
                    But this also doesn't make sense to me.

                      JonB wrote (#10):

                      @KroMignon
                      Hi Kro. I had to delete my response you have quoted after I had written it! In my day, it used to be the case that to place a literal - inside a [...] character range you had to put it as the first or last character, so that it could not be interpreted as a "range" operator, which is what I was writing up. However, it seems that since those dim, dark UNIX System V.0 days you can now escape a literal - via \- so you can have it anywhere in the characters, just as you wrote. Hence I deleted my response to your response!

                      But maybe I am wrong and the goal was to get any character between space and tilde (both included) followed by a space.

                      That is exactly what the OP must have got from the source she took it from. However as pointed out that would include space-followed-by-space, which is unlikely to be desirable/robust. And at best it would have picked out the last character of a word followed by a space, which still seems an "unusual" thing to want. I have since suggested \w+ as probably the most canonical way of picking out "a word".

                        Anonymous_Banned275 wrote (#11):

                        I hope this may help this discussion.
                        I am posting the "raw" file - all three lines.
                        I have added a silly loop to see if I can match more than the FIRST word / character in each line.

                        Stream file 
                        Stream file  "Waiting to connect to bluetoothd...\u001B[0;94m[bluetooth]\u001B[0m#                                                                              \u001B[0;94m[bluetooth]\u001B[0m#                         Agent registered"
                        Regulsr expression ...test match 
                        Regulsr expression ...match.hasMatch()
                        Matched  "Waiting"
                        Matched  ""
                        Matched  ""
                        Matched  ""
                        Matched  ""
                        linea:  "Waiting to connect to bluetoothd...\u001B[0;94m[bluetooth]\u001B[0m#                                                                              \u001B[0;94m[bluetooth]\u001B[0m#                         Agent registered"
                        Stream file 
                        Stream file  "\u001B[0;94m[bluetooth]\u001B[0m# "
                        Regulsr expression ...test match 
                        Regulsr expression ...match.hasMatch()
                        Matched  "0"
                        Matched  ""
                        Matched  ""
                        Matched  ""
                        Matched  ""
                        linea:  "\u001B[0;94m[bluetooth]\u001B[0m# "
                        Overload QString
                         File bt_utility_library.cpp 
                        function  ProcessCommand 
                        @ line   162 
                        
                        

                        Again, this is code under construction, posted in the hope that it helps.

                        I have used
                        QRegularExpression re("\w+"); //matches first word / character

                        Also, to clarify:
                        at this point of learning to use QRegularExpression I just want to extract ALL words from the FIRST line.

                        I'll tackle deleting the control characters AKA matching only printable characters later.

                          Anonymous_Banned275 wrote (#12):

                          Guess what I found - a nice interactive tool:

                          https://regex101.com/

                            JonB wrote (#13):

                            @AnneRanch said in QRegExp - does not return all printable characters.:

                            QRegularExpression re("\w+"); //matches first word / character

                            Exactly as I wrote in response to your code earlier, if this is what you have written in your C++ source code it will not work.

                            I explained about the \ in C/C++ source code and said you would need to have:

                            QRegularExpression re("\\w+");
                            

                            Is that what you have?

                            at this point of learning to use QRegularExpression I just want to extract ALL words from the FIRST line.

                            If you want to match/extract multiple matches you must change to calling QRegularExpression::globalMatch(). This is described in https://doc.qt.io/qt-5/qregularexpression.html#global-matching, and the example there is exactly what you are asking for:

                            Global matching is useful to find all the occurrences of a given regular expression inside a subject string. Suppose that we want to extract all the words from a given string, where a word is a substring matching the pattern \w+.

                            QRegularExpression re("(\\w+)");
                            QRegularExpressionMatchIterator i = re.globalMatch("the quick fox");
                            
                            QStringList words;
                            while (i.hasNext()) {
                                QRegularExpressionMatch match = i.next();
                                QString word = match.captured(1);
                                words << word;
                            }
                            // words contains "the", "quick", "fox"
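
For comparison with the documentation example above, a sketch of the same global-matching loop in standard C++, with std::sregex_iterator standing in for QRegularExpressionMatchIterator (an assumption for illustration; all_words is a hypothetical helper). Run on the second bluetoothctl line from the log, it shows why a single match on that line returned "0": the colour escape codes contain "words" of their own.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: every match of "\\w+" in `line`, in order.
// The iterator loop plays the role of globalMatch() + hasNext()/next().
std::vector<std::string> all_words(const std::string& line) {
    static const std::regex word("\\w+");
    std::vector<std::string> words;
    for (auto it = std::sregex_iterator(line.begin(), line.end(), word);
         it != std::sregex_iterator(); ++it)
        words.push_back(it->str());  // same as captured(0)
    return words;
}
```

On ESC[0;94m[bluetooth]ESC[0m# this yields 0, 94m, bluetooth and 0m, so the escape codes need handling before (or instead of) plain word matching.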
                            
                              Anonymous_Banned275 wrote (#14):

                              Update / conclusion (??)
                              The attached TEST code works as expected using the ORIGINAL "range" pattern. Using "global match " solved it. It matches EACH individual character in the string - not the entire word.
                              Note there are NO "escape characters" in the pattern.
                              It also "prints" control characters - which is NOT desirable - so I will work on that.
                              It is probably slow, but it does not matter in my app.

                              #ifdef BYPASS
                                            terminal output 
                                              bluetoothctl
                                              Agent registered
                                              [bluetooth]#
                              
                                             string to analyze / match 
                                              linea:  "Waiting to connect to bluetoothd...\u001B[0;94m[bluetooth]\u001B[0m#                                                                              \u001B[0;94m[bluetooth]\u001B[0m#                         Agent registered"
                                              linea:  "\u001B[0;94m[bluetooth]\u001B[0m# "
                              #endif
                              
                                              // QRegularExpression re("(\\w+)");   // works OK
                                              QRegularExpression re("([ -z])");      // original range
                                              QString pattern = re.pattern();        // the pattern string, i.e. "([ -z])"
                                              qDebug()<<"Pattern function " << pattern;
                                              TRACE_TextEdit->addItem("TRACE pattern ");
                                              TRACE_TextEdit->addItem(pattern);
                                              QRegularExpressionMatchIterator i = re.globalMatch(line);
                                              QString text = "Match ";
                                              QStringList words;
                                              while (i.hasNext()) {
                                                  QRegularExpressionMatch match = i.next();
                                                  QString word = match.captured(1);
                                                  text += word;
                                                  TRACE_TextEdit->addItem(text );
                                                  words << word;
                                                  qDebug()<<" Global match (words)  " << words;
                                              }
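
One possible way to handle the control characters, sketched with std::regex as a stand-in and hypothetical helper names. It assumes the only escape sequences in the bluetoothctl output are colour codes of the form ESC[...m, which is what the sample lines above contain. Simply deleting everything outside [ -~] is not enough, because it removes ESC but leaves the printable "[0;94m" part of each code behind.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes ANSI colour sequences such as ESC[0;94m and
// ESC[0m (ESC = 0x1B). Assumes only the "ESC [ digits/semicolons m" form.
std::string strip_ansi_colours(const std::string& s) {
    static const std::regex sgr(R"(\x1B\[[0-9;]*m)");
    return std::regex_replace(s, sgr, "");
}

// Hypothetical helper: drops every character outside printable ASCII,
// i.e. anything not in the range [ -~].
std::string keep_printable(const std::string& s) {
    static const std::regex non_printable("[^ -~]+");
    return std::regex_replace(s, non_printable, "");
}
```

Applying strip_ansi_colours first and keep_printable second turns ESC[0;94m[bluetooth]ESC[0m# into "[bluetooth]# ", after which a word or range pattern no longer picks up the colour codes.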
                              
                                JonB wrote (#15):

                                @AnneRanch said in QRegExp - does not return all printable characters.:

                                It matches EACH individual character in the string - not the entire word.

                                That is because you have chosen to use re("([ -z])") instead of the re("(\\w+)") suggested. It is the + symbol which makes it match a whole word/multiple characters instead of one character at a time. Did you really want to match each individual character separately? Up to you.
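
A quick sketch of the difference the + makes, counting global matches with std::sregex_iterator as a stand-in for globalMatch() (an assumption; count_matches is a hypothetical helper).

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: number of non-overlapping matches of `pattern` in
// `text`, i.e. how many times a globalMatch() loop would run.
std::size_t count_matches(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    return static_cast<std::size_t>(std::distance(
        std::sregex_iterator(text.begin(), text.end(), re),
        std::sregex_iterator()));
}
```

On "the quick fox", ([ -z]) matches 13 times (one per character, spaces included) and (\w+) matches 3 times (one per word). Adding + to the range, ([ -z]+), gives a single match covering the whole string, spaces and all, which is why \w+ is the usual choice for splitting into words.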
