using reqular expression wrong

  • C Chris Kawa
    22 Jun 2022, 13:29

    But wouldn't that be doing the work twice? It's easier to just enhance the expression to match the unwanted stuff. I don't know the format of those control characters but I'm sure you can define them as a regexp e.g. if you want to remove \u0001 and the likes it would be something like "\\\\u[\\d]{4}" ( \ followed by letter u followed by 4 digits).

    Anonymous_Banned275
    wrote on 22 Jun 2022, 15:16 last edited by
    #7

    @Chris-Kawa ...doing it twice is OK, and using "exclusive or" would remove the need to know the control codes or to work out the expression (I am basically too lazy to do that...)

      VRonin
      wrote on 22 Jun 2022, 15:36 last edited by VRonin
      #8

      Try this

      qDebug() <<"stream raw line  \n " << line ;
      QString sanitisedLine;
      // Range-based for over globalMatch() requires Qt 6.0 or later.
      for (const QRegularExpressionMatch &match : QRegularExpression("[a-zA-Z_][a-zA-Z_0-9]*").globalMatch(line))
          sanitisedLine.append(match.captured(0)); // captured(0) is the whole match
      qDebug() <<"QRegularExpression applied  \n " << sanitisedLine;
      

      "La mort n'est rien, mais vivre vaincu et sans gloire, c'est mourir tous les jours"
      ~Napoleon Bonaparte

      On a crusade to banish setIndexWidget() from the holy land of Qt

      • J JonB
        22 Jun 2022, 15:13

        @AnneRanch

        \u0001\u001B[1;39m\u0002export
        \u0001\u001B[0m\u0002Print environment variables
        

        In the two examples you gave it appears the "ANSI escape sequence" is enclosed in \u0001 ... \u0002 in both cases. If this is always the case then it's very easy, something like:

        line.remove(QRegularExpression("\\001[^\\002]*\\002"));
        

        ought to do it.

        However, if that is not always the case you would have to write a regular expression to match (so as to remove) all these "ANSI escape sequences". Which are something like:

        <ESC> [ ... <letter>
        

        at least in the cases you show. But you would have to go through and find lots of examples of these in the output you want to parse, as I believe there may be a variety of sequences other than the two you show so far.
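
A minimal Qt-free sketch of the first idea above, assuming every escape sequence really is wrapped in a \u0001 ... \u0002 pair. It uses std::regex instead of QRegularExpression so that it compiles without Qt, which is why the escapes are spelled \x01 and \x02 rather than in the octal form shown above. The function name `stripBracketed` is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: delete every span that starts with byte 0x01 and
// ends at the next byte 0x02, including the two delimiter bytes.
std::string stripBracketed(const std::string &line)
{
    // "\\x01" in C++ source is the 4-character regex escape \x01,
    // which the regex engine then reads as the single byte 0x01.
    static const std::regex bracketed("\\x01[^\\x02]*\\x02");
    return std::regex_replace(line, bracketed, "");
}
```

Called on the two sample lines from the thread, this would leave just `export` and `Print environment variables`. It does nothing for escape sequences that are not wrapped in the \u0001 ... \u0002 pair.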

        Chris Kawa
        Lifetime Qt Champion
        wrote on 22 Jun 2022, 16:30 last edited by
        #9

        @JonB With a small caveat that \ is an escape sequence both in C++ and in regexp, so to have an actual \ character matched you need 4 of those, so "\\\\0001[^\\\\0002]*\\\\0002". Yeah, the trouble we make for ourselves as an industry :P

          JonB
          wrote on 22 Jun 2022, 16:35 last edited by JonB
          #10

          @Chris-Kawa
          I'm intending to pass \001 & \002 like that to regular expression. Then let it handle it. Which I think it will treat as number-character. Now that you make me think about that I'm wondering where I got that idea from....?

          You are going to pass \\0001. What do you think that is going to do/be parsed as in reg exp?

          Let's be clear: the OP's output like:

          \u0001\u001B
          

          is representing ASCII-char-1 and ASCII-char-27 (i.e. "Escape") bytes in that output, are we agreed?

          Maybe modern reg exps even accept \u0001 as a (Unicode??) character entity, I don't know?

            Chris Kawa
            Lifetime Qt Champion
            wrote on 22 Jun 2022, 16:40 last edited by
            #11

            @JonB Ah, fair enough. I thought \u0001 is an actual string (6 characters) and not a single character.

              JonB
              wrote on 22 Jun 2022, 16:46 last edited by JonB
              #12

              @Chris-Kawa
              No, these are byte representations. Like:

              \u0001\u001B[1;39m\u0002export
              

              From the past, the OP is obtaining from something like the output of a program running, or intended to run, in a terminal.

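              I happen to know that there is an ANSI terminal escape sequence like:

              Esc [ attribute ; attribute m
              

              which is SGR ("Select Graphic Rendition"): it sets text attributes such as bold and colour rather than moving the cursor (cursor positioning is Esc [ row ; column H). \u001B == 27 decimal == Escape char.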

              All this stuff can be found in table at https://en.wikipedia.org/wiki/ANSI_escape_code#CSIsection
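
The CSI sequences in that table share one shape: Esc, `[`, optional parameter bytes (0x30 to 0x3F), optional intermediate bytes (0x20 to 0x2F), and a single final byte (0x40 to 0x7E). A hedged sketch of a pattern built from that shape, written with std::regex so it compiles without Qt; the name `stripCsi` is hypothetical, and other escape families (for example OSC sequences) are deliberately not covered.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: remove ANSI CSI sequences (ESC [ ... final byte).
std::string stripCsi(const std::string &text)
{
    // \x1B \[    : the two-byte CSI introducer
    // [0-?]*     : parameter bytes 0x30-0x3F (digits, ':', ';', '<', '=', '>', '?')
    // [ -/]*     : intermediate bytes 0x20-0x2F
    // [@-~]      : exactly one final byte 0x40-0x7E (e.g. 'm' or 'H')
    static const std::regex csi("\\x1B\\[[0-?]*[ -/]*[@-~]");
    return std::regex_replace(text, csi, "");
}
```

Note that this leaves the \u0001 and \u0002 wrapper bytes in place; they would need a separate removal step.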

                JonB
                wrote on 22 Jun 2022, 16:57 last edited by JonB
                #13

                @Chris-Kawa
                You raise a good question though. I'm not sure whether QRegularExpression will interpret my \001 as I intended.

                How would you write the QRegularExpression to include matching characters like ASCII-1 or ASCII-27? I haven't kept up with how to represent that in reg exps nowadays. Maybe it's actually \u0001 & \u001B, is that a single (Unicode?) char sequence recognised in QRegularExpression??

                UPDATE
                I just looked on https://regex101.com/ and it does say

                \ddd

                Matches the 8-bit character with the given octal value.

                so I think my original dim recollection for using \001 & \002 may have been right/OK after all :)
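
The two levels of escaping discussed in the last few posts can be checked without Qt. This is only a sketch: it uses std::regex, whose default ECMAScript grammar spells the escape as \x01 instead of the octal \001 that PCRE accepts, and the names below are made up for illustration.

```cpp
#include <regex>
#include <string>

// Pattern text as the regex engine receives it: backslash, 'x', '0', '1'.
// The regex engine itself turns this into "match byte 0x01".
const std::string escapedPattern = "\\x01";

// Here the C++ compiler has already turned \001 into one byte, so the
// regex engine receives a single literal character to match.
const std::string rawPattern = "\001";

// Hypothetical helper: true if `pattern` matches somewhere in `subject`.
bool containsMatch(const std::string &subject, const std::string &pattern)
{
    return std::regex_search(subject, std::regex(pattern));
}
```

Both patterns match a subject containing the byte 0x01. Neither matches the six visible characters `\u0001` that qDebug() prints, because those are only a display form of that single byte.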

                  Anonymous_Banned275
                  wrote on 22 Jun 2022, 17:16 last edited by
                  #14

                  @VRonin

                  I am missing something here , I do not understand the error .

                  6ec658f0-4a0b-4ee7-8125-28777a12747f-image.png

                  I need to read-up on QRegularExpressionMatch - but I think you are on right track...

                  Would you kindly explain in few words what the code is doing ?
                  I think that would help me...

                    JonB
                    wrote on 22 Jun 2022, 17:24 last edited by JonB
                    #15

                    @AnneRanch

                    I am missing something here , I do not understand the error .

                    https://doc.qt.io/qt-6/qregularexpressionmatchiterator.html#details

                    Starting with Qt 6.0, it is also possible to simply use the result of QRegularExpression::globalMatch in a range-based for loop, for instance like this:
                    ...
                    for (const QRegularExpressionMatch &match : re.globalMatch(subject)) {

                    Are you using Qt6 or Qt5?
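
For anyone unsure what a global-match loop does, here is a Qt-free analogue using std::sregex_iterator with the same identifier pattern as VRonin's snippet. It is a sketch only: the function name is hypothetical, and in Qt 5 the corresponding loop would use QRegularExpressionMatchIterator with hasNext()/next(), since the documentation quoted above says the range-based for form starts with Qt 6.0.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return every identifier-like token found in `line`.
std::vector<std::string> identifiers(const std::string &line)
{
    static const std::regex word("[a-zA-Z_][a-zA-Z_0-9]*");
    std::vector<std::string> found;
    // A default-constructed sregex_iterator is the end-of-sequence marker.
    for (std::sregex_iterator it(line.begin(), line.end(), word), end; it != end; ++it)
        found.push_back(it->str()); // str() with no argument is the whole match (group 0)
    return found;
}
```

Applied to the sample line "\u0001\u001B[1;39m\u0002Menu main:\u0001\u001B[0m\u0002", this collects `m`, `Menu`, `main`, `m`: the final `m` of each escape sequence is itself a letter, so a word-based pattern picks it up as a token.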

                      Anonymous_Banned275
                      wrote on 22 Jun 2022, 22:33 last edited by Anonymous_Banned275
                      #16

                      I hope this post does not distract from the discussion.

                      1. I believe the whole concept of "searching for individual ASCII characters" was misleading. I have been there before, and matching "words" with \w should have made more sense from the start.

                      2. The code snippet is "work in progress", hence has some stuff not really needed at this point.

                      3. As seen, I can retrieve the "word" list, but I am stumped on how to get a QString rather than a list:

                      SOLVED
                      QString test = match.captured();
                      qDebug() <<"match name from ( list ) " << test;

                      Code

                      line = stream.readLine();
                      qDebug() <<"stream raw line  \n " << line ;

                      // Extract the words: \w+ matches runs of letters, digits and underscores.
                      QRegularExpression re("(\\w+)");
                      QRegularExpressionMatchIterator i = re.globalMatch(line);
                      while (i.hasNext()) {
                          QRegularExpressionMatch match = i.next();
                          qDebug() <<"match     " << match ;

                          // THIS SORT OF WORKS: a list holding the whole match followed by each capture group
                          qDebug() <<"match   list  " << match.capturedTexts();

                          // HOW TO GET AN INDIVIDUAL QSTRING HERE?
                          // (The original attempt passed an uninitialised QString pointer to captured();
                          // see SOLVED above for the working call.)
                      }
                      
                      
                      

                      Output

                      Stream file 
                      Stream file ArrayIndex  0
                      stream raw line  
                        "\u0001\u001B[1;39m\u0002Menu main:\u0001\u001B[0m\u0002"
                      match      QRegularExpressionMatch(Valid, has match: 0:(3, 4, "1"), 1:(3, 4, "1"))
                      match   list   ("1", "1")
                      match      QRegularExpressionMatch(Valid, has match: 0:(5, 8, "39m"), 1:(5, 8, "39m"))
                      match   list   ("39m", "39m")
                      match      QRegularExpressionMatch(Valid, has match: 0:(9, 13, "Menu"), 1:(9, 13, "Menu"))
                      **match   list   ("Menu", "Menu")**
                      match      QRegularExpressionMatch(Valid, has match: 0:(14, 18, "main"), 1:(14, 18, "main"))
                      **match   list   ("main", "main")**
                      match      QRegularExpressionMatch(Valid, has match: 0:(22, 24, "0m"), 1:(22, 24, "0m"))
                      match   list   ("0m", "0m")
                      QRegularExpression remove ascii applied  
                        "\u0001\u001B[1;39\u0002 :\u0001\u001B[0\u0002"
                      single character DONE 
                      
                        Anonymous_Banned275
                        wrote on 23 Jun 2022, 16:08 last edited by
                        #17

                        I am trying to simplify the process

                        This regular expression works and removes all control codes

                        QString result = inString.remove(QRegularExpression("[^\\w\\d ]+"));
                        qDebug() <<"QRegularExpression remove ascii applied \n " << result;

                        This regular expression DOES NOT WORK
                        I get a run-time error

                        QString::replace: invalid QRegularExpression object

                        It is supposed to remove all control codes

                        result  = inString.remove(QRegularExpression("[^\\u0000-\\u007F]+"));
                                qDebug() <<"QRegularExpression remove ascii applied  \n " << result;
                        

                        return result;

                          Christian Ehrlicher
                          Lifetime Qt Champion
                          wrote on 23 Jun 2022, 16:56 last edited by
                          #18

                          @AnneRanch said in using reqular expression wrong:

                          This regular expression DOES NOT WORK

                          Because \u0000 and \u007F are not valid for pcre -> https://www.regular-expressions.info/unicode.html#codepoint

                          Qt Online Installer direct download: https://download.qt.io/official_releases/online_installers/
                          Visit the Qt Academy at https://academy.qt.io/catalog

                            JonB
                            wrote on 23 Jun 2022, 17:00 last edited by JonB
                            #19

                            @AnneRanch
                            As @Christian-Ehrlicher has said.

                            That should be QRegularExpression("[^\\000-\\177]+")

                            However it will not do what you intend. It will remove all ASCII characters, as the comment said, and return an empty string.

                            I suspect you are wanting to try:

                            result  = inString.remove(QRegularExpression("[^\\000-\\037]+"));
                            

                            which will remove just the characters you have which are non-ASCII-printable control characters.
                            Your \u0001\u001B[1;39m\u0002export should result in [1;39mexport.

                              Anonymous_Banned275
                              wrote on 23 Jun 2022, 17:50 last edited by
                              #20

                              I am not sure linking to other forums is OK, but here is part of it

                              I am trying to port the Java code to C++ and this reference claims that
                              the "control characters" are identified as "[^\u0000-\u007F]"

                              and that is my objective: "remove" all control characters.

                              And this removes ASCII, not control characters:

                              QString result = inString.remove(QRegularExpression("[^\\000-\\037]+"));

                              and that has been my issue since I started this - removing control characters using this expression "[^\\000-\\037]+"

                              I think I am not using "remove" and plain "match the expression" correctly.

                              https://stackoverflow.com/questions/24229262/match-non-printable-non-ascii-characters-and-remove-from-text
                              public static string RemoveTroublesomeCharacters(string inString)
                              {
                              if (inString == null)
                              {
                              return null;
                              }

                              else
                              {
                                  char ch;
                                  Regex regex = new Regex(@"[^\u0000-\u007F]", RegexOptions.IgnoreCase);
                                  Match charMatch = regex.Match(inString);
                              
                                JonB
                                wrote on 23 Jun 2022, 17:55 last edited by JonB
                                #21

                                @AnneRanch
                                That code you are trying to use is for regular expressions understood by .NET. They are not identical to those used by Qt.

                                And this removes ascii , not control characters>

                                QString result = inString.remove(QRegularExpression("[^\\000-\\037]+"));

                                Just remove the ^ I wrote (I forgot you were removing rather than retaining). Should be:

                                QString result = inString.remove(QRegularExpression("[\\000-\\037]+"));
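
The effect of that one `^` is easy to see side by side. A Qt-free sketch using std::regex, with the range written in hex (\x00-\x1F) because std::regex's default grammar does not take the octal form; both function names are made up for illustration.

```cpp
#include <regex>
#include <string>

// Removes runs of ASCII control characters 0x00-0x1F (the intended behaviour).
std::string removeControls(const std::string &s)
{
    static const std::regex controls("[\\x00-\\x1F]+");
    return std::regex_replace(s, controls, "");
}

// Removes runs of everything EXCEPT 0x00-0x1F, so only the control
// characters survive (the accidental behaviour of the negated class).
std::string keepOnlyControls(const std::string &s)
{
    static const std::regex notControls("[^\\x00-\\x1F]+");
    return std::regex_replace(s, notControls, "");
}
```

For the sample \u0001\u001B[1;39m\u0002export, the first function returns `[1;39mexport` and the second returns only the three control bytes. The visible remainder of the escape sequence (`[1;39m`) is untouched by either, which is why a pattern for the full ANSI sequence is still needed for a clean result.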
                                
                                  VRonin
                                  wrote on 24 Jun 2022, 09:08 last edited by
                                  #22

                                  @AnneRanch said in using reqular expression wrong:

                                  THIS SORT OF WORKS
                                  qDebug() <<"match list " << match.capturedTexts();

                                  HOW TO GET INDIVIDUAL QSTRING HERE

                                  match.captured(0);

                                    JonB
                                    wrote on 24 Jun 2022, 09:17 last edited by JonB
                                    #23

                                    @VRonin
                                    If the OP ever returns to look at the answers to this question, it would be a shame if she did not first try the simple

                                    QString result = inString.remove(QRegularExpression("[\\000-\\037]+"));
                                    

                                    at least to see if that is acceptable to her, compared to other more complex regular expression solutions....

                                    [I have said that none proposed so far will be perfect, she would have to deal properly with removing just the ANSI escape sequences if she wants it to be really right.]

                                      ChrisW67
                                      wrote on 24 Jun 2022, 11:21 last edited by
                                      #24

                                      @AnneRanch said in using reqular expression wrong:

                                      I am trying to port the Java code to C++ and this reference claims that
                                      the "controls characters " are identified as "[^\u0000-\u007F]"

                                      Well, that reference is wrong. This is the Unicode basic Latin page, covering code points from 0 through 127 decimal, which were specifically designed to be identical to ASCII codes. You will see that only the first 32 code points (0x0000 through 0x001F) and last code point (0x007f, Del) are non-printables: the remainder are printable characters. There are other non-printables outside this range also.

                                      and that is my objective "remove" all control characters.
                                      And this removes ascii , not control characters>
                                      QString result = inString.remove(QRegularExpression("[^\\000-\\037]+"));
                                      and that has been my issue since I started this - remove control characters using this expression "[^\\000-\\037]+"));

                                      The regular expression matches any run of characters that is not in the range 0 to 31 decimal. You ask Qt to remove any character that the pattern matches: it does, leaving only those things in the control character block. You want the opposite of that.

                                      It turns out that the documented regular expression dialect allows the POSIX character classes which can make life easier:

                                      #include <QCoreApplication>
                                      #include <QString>
                                      #include <QRegularExpression>
                                      #include <QDebug>
                                      
                                      int main(int argc, char **argv) {
                                              QCoreApplication app(argc, argv);
                                      
                                              QString testString("ABC\tabc\177DEF-def\n\007");
                                      
                                              // The following removes everything outside 0-31, i.e. the ASCII
                                              // printables and DEL (your broken result)
                                              QString temp(testString);
                                              temp.remove(QRegularExpression("[^\\000-\\037]+"));
                                              qDebug() << testString << "==>" << temp;
                                      
                                              // The following removes the ASCII control characters (0-31 and DEL),
                                              // leaving only the ASCII printables
                                              temp = testString;
                                              temp.remove(QRegularExpression("[\\000-\\037\\177]+"));
                                              qDebug() << testString << "==>" << temp;
                                      
                                              // Following uses a POSIX character class to remove control characters
                                              // (which include TAB and NL).
                                              temp = testString;
                                              temp.remove(QRegularExpression("[[:cntrl:]]+"));
                                              qDebug() << testString << "==>" << temp;
                                      
                                              return 0;
                                      }
                                      

                                      Output:

                                      "ABC\tabc\u007FDEF-def\n\u0007" ==> "\t\n\u0007"
                                      "ABC\tabc\u007FDEF-def\n\u0007" ==> "ABCabcDEF-def"
                                      "ABC\tabc\u007FDEF-def\n\u0007" ==> "ABCabcDEF-def"
                                      
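For comparison, the same POSIX class also works with the standard library's `std::regex`, which may help when checking the logic outside a Qt project. This is only a sketch: the function names are hypothetical, and it assumes the default "C" locale, where `[:cntrl:]` covers 0 through 31 and DEL for plain `char` input.

```cpp
#include <regex>
#include <string>

// Removes every run of characters matched by the POSIX class [:cntrl:].
std::string removeCntrl(const std::string &in) {
    static const std::regex controls("[[:cntrl:]]+");
    return std::regex_replace(in, controls, "");
}

// Removes the complement instead, i.e. everything that is NOT a control
// character. This mirrors the negated-class mistake discussed above.
std::string removeNonCntrl(const std::string &in) {
    static const std::regex nonControls("[^[:cntrl:]]+");
    return std::regex_replace(in, nonControls, "");
}
```

The only difference between the two functions is the `^` inside the brackets, which flips what gets removed and what survives.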
                                        J Offline
                                        J Offline
                                        JonB
                                        wrote on 24 Jun 2022, 11:27 last edited by
                                        #25

                                        @ChrisW67 said in using reqular expression wrong:

                                        You want the opposite of that.

                                        I did reply earlier:

                                        Just remove the ^ I wrote (I forgot you were removing rather than retaining). Should be:

                                        QString result = inString.remove(QRegularExpression("[\\000-\\037]+"));
                                        
                                        • A Offline
                                          A Offline
                                          Anonymous_Banned275
                                          wrote on 24 Jun 2022, 15:16 last edited by Anonymous_Banned275
                                          #26
                                          1. JonB, please get off your horse - this is a discussion and we all have differences of opinion, which is what discussions are for.
                                            (You remind me of a "study group" I had years ago, where certain cultures insisted that "we all have to have the same opinion and agree ... then we can go home".)
                                          2. I did state I am porting from Java, hence the source (I used) is different...
                                            (I realize things get missed, misread, etc.)
                                          3. There are two concepts (to get the job done) so far:
                                            identify all ASCII characters
                                            remove all control characters

                                          Here is the code :

                                          #ifdef BYPASS
                                                
                                                  QRegularExpression re("[^\\w\\d (:/<>) ]+");
                                                  QString result  = inString.remove(re); // keeps only word characters, digits, spaces and ( : / < > )
                                                  qDebug() <<"remove all controls \n    " << result;
                                                  return result;
                                          #endif
                                                  
                                                  QString result = inString.remove(QRegularExpression("[\\000-\\037]+"));
                                                  qDebug() <<"remove all controls \n    " << result;
                                                  return result;
                                          

                                          They both leave some unwanted characters. Those are easy to remove after the
                                          "regular expression" is done.
                                          4. It looks as if "match" is OK, but too complex for what I want to accomplish.

                                          5. As the original title said - I was using the concept wrong - I did not pay attention to what the expression actually does: identifying or deleting stuff.
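The two concepts from point 3 can be compared side by side without Qt. The sketch below is an approximation under stated assumptions: `keepListed` imitates the keep-list class `[^\w\d (:/<>) ]` and `dropC0` imitates the remove-list class `[\000-\037]`, both for plain ASCII `char` input. Both function names are invented for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Keep-list: drop everything except word characters (letters, digits, '_'),
// spaces and the characters ( : / < > ). Imitates [^\w\d (:/<>) ].
std::string keepListed(std::string s) {
    auto unwanted = [](unsigned char c) {
        const bool word = std::isalnum(c) != 0 || c == '_';
        const bool extra =
            std::string(" (:/<>)").find(static_cast<char>(c)) != std::string::npos;
        return !(word || extra);
    };
    s.erase(std::remove_if(s.begin(), s.end(), unwanted), s.end());
    return s;
}

// Remove-list: drop only the code points 0 through 31. Imitates [\000-\037].
std::string dropC0(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return c <= 0x1F; }),
            s.end());
    return s;
}
```

For the test string used earlier in the thread, `keepListed` also drops the hyphen, while `dropC0` leaves DEL (octal 177) in place. That is one way both variants can leave a result that still needs a second pass.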

                                          I really appreciate everybody's input; it has been educational.

                                          Cheers

