QRegExp - does not return all printable characters.

  • JonB wrote:

    @AnneRanch said in QRegExp - does not return all printable characters.:

    QRegularExpression re("[ -~] ");

    This does not match a word. It matches a space/hyphen/tilde followed by a space. [EDIT No, see next answer below.] So match.captured(0) will return precisely 2 characters. [EDIT This is still true.]

    QRegExp rx("^\p[ -~]*$");
    QRegExp rx("(\d+)(\s*)(cm|inch(es)?)");

    Both of these (and it would be the same for QRegularExpression) would require each \ in the string literal to be written as \\, as per the C++ rules for string literals.
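To illustrate the backslash rule, here is a minimal sketch. It uses std::regex instead of QRegExp/QRegularExpression so that it builds without Qt; the escaping rule comes from the C++ compiler, not the regex engine, so it applies to the Qt classes in the same way. The function names are made up for this example.

```cpp
#include <regex>
#include <string>

// Escaped form: the compiler turns each "\\" into a single backslash,
// so the regex engine receives (\d+)(\s*)(cm|inch(es)?).
bool isLengthEscaped(const std::string& text) {
    static const std::regex rx("(\\d+)(\\s*)(cm|inch(es)?)");
    return std::regex_match(text, rx);
}

// Raw string literal (C++11): backslashes are passed through unchanged,
// so the pattern is written exactly as the regex engine sees it.
bool isLengthRaw(const std::string& text) {
    static const std::regex rx(R"((\d+)(\s*)(cm|inch(es)?))");
    return std::regex_match(text, rx);
}
```

Both functions accept inputs such as "12 cm" or "3inches". With a single backslash in an ordinary literal ("\d"), compilers typically warn about an unknown escape sequence and the backslash does not reach the regex engine.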

    Anonymous_Banned275 wrote (#5):

    @JonB I cannot argue this - the symbols inside [ ] were posted on another forum and advertised as " match all printable characters ". Obviously they do not.
    Unfortunately the match is "character space " - as you said. So I get the last character of the first word and a space - which is a little hard to actually "see".

      JonB wrote (#6):

      @AnneRanch said in QRegExp - does not return all printable characters.:

      I cannot argue this - the symbols inside [ ] were posted on another forum and advertised as " match all printable characters ".

      And I cannot argue with this! Let me be the first to admit when I have made a mistake :) I read too quickly, and quite forgot that - (hyphen) is the one character with "special significance" when appearing inside the [...] one-of-characters entity. There (and only there) it represents a "character range", from the character before it (space here) to the character after it (tilde here), which does indeed match all printable (ASCII) characters.

      So re("[ -~] ") does indeed match "any printable character followed by a space". That still does mean precisely 2 characters rather than a whole word, which would have to be the last character of a "word" followed by a space. However, it will also match 2 spaces, which is presumably not desired.

      There are several ways to pick out a "word" if that is what you want to do instead. The simplest would probably be (in your C++ code):

      QRegularExpression re("\\w+");
      

      but it all depends "which" word you might want in a line, what to do about the other characters in the line, etc.
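As a rough illustration of the difference between the two patterns, the sketch below returns only the first match, which is what a single match() call gives you. It is written with std::regex so it compiles without Qt (an assumption of this example, as is the helper name firstMatch); for plain ASCII input these two patterns should behave the same way in QRegularExpression.

```cpp
#include <regex>
#include <string>

// Returns the text of the first match of `pattern` in `text`,
// or an empty string if the pattern does not match anywhere.
std::string firstMatch(const std::string& text, const std::string& pattern) {
    const std::regex rx(pattern);
    std::smatch m;
    return std::regex_search(text, m, rx) ? m.str(0) : std::string();
}
```

For the input "Waiting to connect", the pattern "[ -~] " yields "g " (two characters), while "\\w+" yields "Waiting". For an input that starts with two spaces, "[ -~] " yields those two spaces.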

        KroMignon wrote (#7):

        @AnneRanch said in QRegExp - does not return all printable characters.:

        I cannot argue this - the symbols inside [ ] were posted on another forum and advertised as " match all printable characters ". Obviously they do not.
        Unfortunately the match is "character space " - as you said. So I get the last character of the first word and a space - which is a little hard to actually "see".

        With QRegExp, the dash symbol in [] is used to define a range of characters (cf. Qt documentation https://doc.qt.io/qt-5/qregexp.html#sets-of-characters).

        I think the right way would be to use a backslash as the escape character:

        QRegularExpression re("[ \\-~] ");
        

        It is an old maxim of mine that when you have excluded the impossible, whatever remains, however improbable, must be the truth. (Sherlock Holmes)

          JonB wrote (#8):
          This post is deleted!

            KroMignon wrote (#9):

            @JonB said in QRegExp - does not return all printable characters.:

            I don't think that's right: I believe that would match a space or any character between \ and ~!

            I have to admit that I did not understand the goal of this regex.
            My understanding was that the goal was to match only 3 characters: space, dash and tilde.
            But maybe I am wrong and the goal was to get any character between space and tilde (both included) followed by a space.
            So the regex should be QRegularExpression re("[ -~] ");
            But this also does not make sense to me.

              JonB wrote (#10):

              @KroMignon
              Hi Kro. I had to delete my response you have quoted after I had written it! In my day, it used to be the case that to place a literal - inside a [...] character range you had to put it as the first or last character, so that it could not be interpreted as a "range" operator, which is what I was writing up. However, it seems that since those dim, dark UNIX System V.0 days you can now escape a literal - via \- so you can have it anywhere in the characters, just as you wrote. Hence I deleted my response to your response!
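The ways of writing a literal hyphen mentioned above can be compared with a small sketch. It uses std::regex with its default ECMAScript grammar, not QRegExp or QRegularExpression, so treat it as an approximation of how the Qt classes behave; the helper name onlyFromClass is invented for this example.

```cpp
#include <regex>
#include <string>

// True if every character of `text` belongs to the bracket expression
// `charClass`, for example "[ \\-~]" (an empty `text` also returns true).
bool onlyFromClass(const std::string& text, const std::string& charClass) {
    const std::regex rx(charClass + "*");
    return std::regex_match(text, rx);
}
```

With the hyphen escaped ("[ \\-~]"), placed last ("[ ~-]") or placed first ("[- ~]"), the class should accept only space, hyphen and tilde. With the hyphen unescaped in the middle ("[ -~]"), it is a range and accepts every printable ASCII character.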

              But maybe I am wrong and the goal was to get any character between space and tilde (both included) followed by a space.

              That is exactly what the OP must have got from the source she took it from. However as pointed out that would include space-followed-by-space, which is unlikely to be desirable/robust. And at best it would have picked out the last character of a word followed by a space, which still seems an "unusual" thing to want. I have since suggested \w+ as probably the most canonical way of picking out "a word".

                Anonymous_Banned275 wrote (#11):

                I hope this may help this discussion.
                I am posting the "raw" file - all three lines.
                I have added a silly loop to see if I can match more than the FIRST word / character in each line.

                Stream file 
                Stream file  "Waiting to connect to bluetoothd...\u001B[0;94m[bluetooth]\u001B[0m#                                                                              \u001B[0;94m[bluetooth]\u001B[0m#                         Agent registered"
                Regulsr expression ...test match 
                Regulsr expression ...match.hasMatch()
                Matched  "Waiting"
                Matched  ""
                Matched  ""
                Matched  ""
                Matched  ""
                linea:  "Waiting to connect to bluetoothd...\u001B[0;94m[bluetooth]\u001B[0m#                                                                              \u001B[0;94m[bluetooth]\u001B[0m#                         Agent registered"
                Stream file 
                Stream file  "\u001B[0;94m[bluetooth]\u001B[0m# "
                Regulsr expression ...test match 
                Regulsr expression ...match.hasMatch()
                Matched  "0"
                Matched  ""
                Matched  ""
                Matched  ""
                Matched  ""
                linea:  "\u001B[0;94m[bluetooth]\u001B[0m# "
                Overload QString
                 File bt_utility_library.cpp 
                function  ProcessCommand 
                @ line   162 
                
                

                Again this is code under construction and posted in hope to help.

                I have used
                QRegularExpression re("\w+"); //matches first word / character

                Also to clarify
                at this point of learning to use QRegularExpression I just want to extract ALL words from the FIRST line.

                I'll tackle deleting the control characters AKA match only printable characters later.

                  Anonymous_Banned275 wrote (#12):

                  Guess what I found - nice interactive tool

                  https://regex101.com/

                    JonB wrote (#13):

                    @AnneRanch said in QRegExp - does not return all printable characters.:

                    QRegularExpression re("\w+"); //matches first word / character

                    Exactly as I wrote in response to your code earlier, if this is what you have written in your C++ source code it will not work.

                    I explained about the \ in C/C++ source code and said you would need to have:

                    QRegularExpression re("\\w+");
                    

                    Is that what you have?

                    at this point of learning using QRegulaExpression I just what to extract ALL words from the FIRST line.

                    If you want to extract multiple matches you must change to calling QRegularExpression::globalMatch(). This is described in https://doc.qt.io/qt-5/qregularexpression.html#global-matching, and the example there is exactly what you are asking for:

                    Global matching is useful to find all the occurrences of a given regular expression inside a subject string. Suppose that we want to extract all the words from a given string, where a word is a substring matching the pattern \w+.

                    QRegularExpression re("(\\w+)");
                    QRegularExpressionMatchIterator i = re.globalMatch("the quick fox");
                    
                    QStringList words;
                    while (i.hasNext()) {
                        QRegularExpressionMatch match = i.next();
                        QString word = match.captured(1);
                        words << word;
                    }
                    // words contains "the", "quick", "fox"
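For anyone who wants to try the same idea without a Qt build, here is a sketch of the equivalent loop using std::regex and std::sregex_iterator. This is a substitute for globalMatch(), not the Qt API, and the function name allWords is invented for the example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every non-overlapping match of \w+ in `text`, in order of appearance.
std::vector<std::string> allWords(const std::string& text) {
    static const std::regex rx("\\w+");
    std::vector<std::string> words;
    // A default-constructed sregex_iterator marks the end of the matches.
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it) {
        words.push_back(it->str(0));
    }
    return words;
}
```

Applied to "Waiting to connect to bluetoothd..." this gives "Waiting", "to", "connect", "to", "bluetoothd". Note that \w+ also picks up fragments of the terminal colour codes: in the second line of the log, the first match is "0", taken from "[0;94m".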
                    
                      Anonymous_Banned275 wrote (#14):

                      Update / conclusion (??)
                      The attached TEST code works as expected using the ORIGINAL "range" pattern. Using "global match " solved it. It matches EACH individual character in the string - not the entire word.
                      Note there are NO "escape characters" in the pattern.
                      It also "prints" control characters - which is NOT desirable - so I will work on that.
                      It is probably slow, but it does not matter in my app.

                      #ifdef BYPASS
                                    terminal output 
                                      bluetoothctl
                                      Agent registered
                                      [bluetooth]#
                      
                                     string to analyze / match 
                                      linea:  "Waiting to connect to bluetoothd...\u001B[0;94m[bluetooth]\u001B[0m#                                                                              \u001B[0;94m[bluetooth]\u001B[0m#                         Agent registered"
                                      linea:  "\u001B[0;94m[bluetooth]\u001B[0m# "
                      #endif
                      
                      // QRegularExpression re("(\\w+)");    // whole words: works OK
                      QRegularExpression re("([ -z])");      // "original" range pattern (here space through 'z'): one character per match
                      QString pattern = re.pattern();        // pattern == "([ -z])"
                      qDebug() << "Pattern function " << pattern;
                      TRACE_TextEdit->addItem("TRACE pattern ");
                      TRACE_TextEdit->addItem(pattern);
                      // globalMatch() iterates over every match in `line`, not just the first one
                      QRegularExpressionMatchIterator i = re.globalMatch(line);
                      QString text = "Match ";
                      QStringList words;
                      while (i.hasNext()) {
                          QRegularExpressionMatch match = i.next();
                          QString word = match.captured(1);  // text of capture group 1
                          text += word;
                          TRACE_TextEdit->addItem(text);
                          words << word;
                          qDebug() << " Global match (words)  " << words;
                      }
                      
                        JonB wrote (#15):

                        @AnneRanch said in QRegExp - does not return all printable characters.:

                        It matches EACH individual character in the string - not the entire word.

                        That is because you have chosen to use re("([ -z])") instead of the suggested re("(\\w+)"). It is the + quantifier that makes the pattern match a whole word (one or more characters) instead of a single character. Did you really want to match each individual character separately? Up to you.
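The effect of the + quantifier can be seen by counting matches. The sketch below again uses std::regex in place of QRegularExpression so that it runs without Qt, and the helper countMatches is a hypothetical name chosen for this example.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Counts the non-overlapping matches of `pattern` in `text`.
std::ptrdiff_t countMatches(const std::string& text, const std::string& pattern) {
    const std::regex rx(pattern);
    return std::distance(std::sregex_iterator(text.begin(), text.end(), rx),
                         std::sregex_iterator());
}
```

For the five-character input "ab cd", "[ -z]" produces five matches (one per character, the space included), "[ -z]+" produces a single match covering the whole string, and "\\w+" produces two matches, "ab" and "cd".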
