
using reqular expression wrong

31 Posts 6 Posters 3.8k Views 2 Watching
This topic has been deleted. Only users with topic management privileges can see it.
    Anonymous_Banned275 wrote (post #20):

    I am not sure linking to other forums is OK, but here is part of it.

    I am trying to port the Java code to C++, and this reference claims that
    the "control characters" are identified by "[^\u0000-\u007F]".

    My objective is to remove all control characters.

    But this removes the ASCII characters, not the control characters:

    QString result = inString.remove(QRegularExpression("[^\\000-\\037]+"));

    That has been my issue since I started this: removing control characters using the expression "[^\000-\037]+".

    I think I am not using "remove" and plain "match the expression" correctly.

    https://stackoverflow.com/questions/24229262/match-non-printable-non-ascii-characters-and-remove-from-text
    public static string RemoveTroublesomeCharacters(string inString)
    {
        if (inString == null)
        {
            return null;
        }
        else
        {
            char ch;
            Regex regex = new Regex(@"[^\u0000-\u007F]", RegexOptions.IgnoreCase);
            Match charMatch = regex.Match(inString);
    
      JonB wrote (post #21):

      @AnneRanch
      That code you are trying to use is written for .NET regular expressions. Their syntax is not identical to the Perl-compatible (PCRE) dialect used by Qt's QRegularExpression.

      But this removes the ASCII characters, not the control characters:

      QString result = inString.remove(QRegularExpression("[^\\000-\\037]+"));

      Just remove the ^ I wrote (I forgot you were removing rather than retaining). It should be:

      QString result = inString.remove(QRegularExpression("[\\000-\\037]+"));
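For comparison, the .NET class "[^\u0000-\u007F]" from the Stack Overflow snippet matches every character outside U+0000 to U+007F, i.e. non-ASCII characters. Removing its matches therefore keeps all of ASCII, control characters included. Below is a minimal sketch in standard C++ using std::wregex (ECMAScript grammar, which accepts \uXXXX escapes much as .NET does). It is swapped in so the example builds without Qt, and the helper names are hypothetical. QRegularExpression (PCRE) does not accept \u by default; the PCRE spelling would be \x{...}, e.g. "[^\\x{0000}-\\x{007F}]".

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes runs of characters outside U+0000..U+007F,
// i.e. what the .NET class quoted in the thread matches (non-ASCII text).
std::wstring strip_non_ascii(const std::wstring& in)
{
    static const std::wregex non_ascii(L"[^\\u0000-\\u007F]+");
    return std::regex_replace(in, non_ascii, L"");
}

// Hypothetical helper: removes runs of C0 control characters (U+0000..U+001F)
// and DEL (U+007F), which is what this thread is actually trying to do.
std::wstring strip_controls(const std::wstring& in)
{
    static const std::wregex controls(L"[\\u0000-\\u001F\\u007F]+");
    return std::regex_replace(in, controls, L"");
}
```

On an input such as L"caf\u00e9\tok", strip_non_ascii keeps the tab and drops the accented letter, while strip_controls keeps the accented letter and drops the tab.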
      
        Anonymous_Banned275 wrote:

        I hope this post does not distract from the discussion.

        1. I believe the whole idea of "searching for individual ASCII characters" was misleading. I have been there before, and using "words" (\w) should have made more sense from the start.

        2. The code snippet is "work in progress", so it contains some stuff not really needed at this point.

        3. As seen, I can retrieve the "word" list, but I am stumped on how to get a QString, not a list:

        SOLVED
        QString test = match.captured();
        qDebug() <<"match name from ( list ) " << test;

        Code

        line = stream.readLine();
        //qDebug() <<"Stream raw line  ";
        qDebug() << "stream raw line  \n " << line;

        // extracts the words
        QRegularExpression re("(\\w+)");
        QString subject(line);
        QString *capture_name; //  = "                            ";
        QRegularExpressionMatchIterator i = re.globalMatch(subject);
        while (i.hasNext()) {
            QRegularExpressionMatch match = i.next();
            //  qDebug() <<"match (next)     " << i.next() ;
            qDebug() << "match     " << match;

            // THIS SORT OF WORKS
            qDebug() << "match   list  " << match.capturedTexts();

            // HOW TO GET INDIVIDUAL QSTRING HERE ?????
            //     qDebug() <<"match  name ( from  list )  " << match.captured(*capture_name);
        }

        Output

        Stream file 
        Stream file ArrayIndex  0
        stream raw line  
          "\u0001\u001B[1;39m\u0002Menu main:\u0001\u001B[0m\u0002"
        match      QRegularExpressionMatch(Valid, has match: 0:(3, 4, "1"), 1:(3, 4, "1"))
        match   list   ("1", "1")
        match      QRegularExpressionMatch(Valid, has match: 0:(5, 8, "39m"), 1:(5, 8, "39m"))
        match   list   ("39m", "39m")
        match      QRegularExpressionMatch(Valid, has match: 0:(9, 13, "Menu"), 1:(9, 13, "Menu"))
        match   list   ("Menu", "Menu")
        match      QRegularExpressionMatch(Valid, has match: 0:(14, 18, "main"), 1:(14, 18, "main"))
        match   list   ("main", "main")
        match      QRegularExpressionMatch(Valid, has match: 0:(22, 24, "0m"), 1:(22, 24, "0m"))
        match   list   ("0m", "0m")
        QRegularExpression remove ascii applied  
          "\u0001\u001B[1;39\u0002 :\u0001\u001B[0\u0002"
        single character DONE 
        
        VRonin wrote (post #22):

        @AnneRanch said in using reqular expression wrong:

        THIS SORT OF WORKS
        qDebug() <<"match list " << match.capturedTexts();

        HOW TO GET INDIVIDUAL QSTRING HERE

        match.captured(0);
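match.captured(0) returns a single QString holding the whole match; captured(1) would return the first parenthesised group. With the pattern "(\\w+)" both hold the same text, which is why capturedTexts() printed every word twice. A minimal sketch of the same loop with the standard library's std::regex (swapped in so it builds without Qt; extract_words is a hypothetical name):

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper mirroring the globalMatch() loop with std::regex.
// str(0) is the whole match and str(1) the first (...) group; for this
// pattern they hold the same text, like captured(0) and captured(1) in Qt.
std::vector<std::string> extract_words(const std::string& line)
{
    static const std::regex re("(\\w+)");
    std::vector<std::string> words;
    for (std::sregex_iterator it(line.begin(), line.end(), re), end; it != end; ++it) {
        const std::smatch& match = *it;
        words.push_back(match.str(1));  // one std::string per match, not a list
    }
    return words;
}
```

Called on the raw line from the output above, "\x01\x1b[1;39m\x02Menu main:\x01\x1b[0m\x02", it returns "1", "39m", "Menu", "main" and "0m", the same words the Qt loop printed.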

        "Death is nothing, but to live defeated and without glory is to die every day."
        ~Napoleon Bonaparte

        On a crusade to banish setIndexWidget() from the holy land of Qt

          JonB wrote (post #23):

          @VRonin
          If the OP ever returns to look at the answers to this question, it would be a shame if she did not first try the simple

          QString result = inString.remove(QRegularExpression("[\\000-\\037]+"));
          

          at least to see whether that is acceptable to her, compared to the other, more complex regular expression solutions.

          [As I have said, none of the solutions proposed so far will be perfect; she would have to deal properly with removing just the ANSI escape sequences if she wants it to be really right.]
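One way to remove "just the ANSI escape sequences" is to match their structure rather than individual characters. This is a minimal sketch, assuming every sequence is a CSI sequence (ESC, '[', parameter bytes 0x30 to 0x3F, intermediate bytes 0x20 to 0x2F, one final byte 0x40 to 0x7E), which covers the \u001B[1;39m and \u001B[0m seen in this thread, and assuming the \u0001/\u0002 bytes around them should also go. It uses std::regex instead of QRegularExpression so it builds without Qt; strip_ansi is a hypothetical name. The pattern only uses \xhh escapes, which PCRE also understands, so it should carry over to QRegularExpression unchanged.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes ANSI CSI escape sequences
// (ESC '[' + parameter bytes 0x30-0x3F + intermediate bytes 0x20-0x2F
//  + one final byte 0x40-0x7E), plus any stray \x01 / \x02 bytes.
std::string strip_ansi(const std::string& in)
{
    static const std::regex csi_or_marker(
        "\\x1B\\[[0-?]*[ -/]*[@-~]"   // CSI sequence, e.g. ESC[1;39m or ESC[0m
        "|[\\x01\\x02]");             // the \x01 / \x02 bytes wrapping them
    return std::regex_replace(in, csi_or_marker, "");
}
```

For example, strip_ansi("\x01\x1b[1;39m\x02Menu main:\x01\x1b[0m\x02") gives "Menu main:", with no leftover "[1;39" fragments.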

            ChrisW67 wrote (post #24):

            @AnneRanch said in using reqular expression wrong:

            I am trying to port the Java code to C++, and this reference claims that
            the "control characters" are identified by "[^\u0000-\u007F]"

            Well, that reference is wrong. The Unicode Basic Latin block covers code points 0 through 127 decimal, which were deliberately made identical to the ASCII codes. Only the first 32 code points (0x0000 through 0x001F) and the last one (0x007F, DEL) are non-printable; the rest are printable characters. So "[^\u0000-\u007F]" matches everything outside that block, i.e. non-ASCII characters, not control characters. There are other non-printables outside this range as well.
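The count in that paragraph is easy to check: in the default "C" locale, std::iscntrl() from <cctype> should classify exactly 0x00 to 0x1F plus 0x7F as control characters, i.e. 32 + 1 = 33 of the 128 Basic Latin code points. A small sketch (count_ascii_controls is a hypothetical helper):

```cpp
#include <cctype>

// Hypothetical helper: counts how many of the 128 ASCII / Basic Latin code points
// std::iscntrl() classifies as control characters (default "C" locale).
int count_ascii_controls()
{
    int count = 0;
    for (int c = 0; c <= 127; ++c) {
        if (std::iscntrl(c)) {
            ++count;
        }
    }
    return count;  // expected: 32 (0x00..0x1F) + 1 (0x7F, DEL)
}
```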

            My objective is to remove all control characters.
            But this removes the ASCII characters, not the control characters:
            QString result = inString.remove(QRegularExpression("[^\\000-\\037]+"));
            That has been my issue since I started this: removing control characters using the expression "[^\000-\037]+".

            The regular expression matches any run of characters that is not in the range 0 to 31 decimal. You ask Qt to remove any character that the pattern matches: it does, leaving only those things in the control character block. You want the opposite of that.

            It turns out that the documented regular expression dialect supports POSIX character classes, which can make life easier:

            #include <QCoreApplication>
            #include <QString>
            #include <QRegularExpression>
            #include <QDebug>
            
            int main(int argc, char **argv) {
                    QCoreApplication app(argc, argv);
            
                    QString testString("ABC\tabc\177DEF-def\n\007");
            
                    // following removes all the ASCII printables (i.e. your broken result)
                    QString temp(testString);
                    temp.remove(QRegularExpression("[^\\000-\\037]+"));
                    qDebug() << testString << "==>" << temp;
            
                    // following removes all except the ASCII printables
                    temp = testString;
                    temp.remove(QRegularExpression("[\\000-\\037\\177]+"));
                    qDebug() << testString << "==>" << temp;
            
                    // Following uses a POSIX character class to remove control characters
                    // (which include TAB and NL).
                    temp = testString;
                    temp.remove(QRegularExpression("[[:cntrl:]]+"));
                    qDebug() << testString << "==>" << temp;
            
                    return 0;
            }
            

            Output:

            "ABC\tabc\u007FDEF-def\n\u0007" ==> "\t\n\u0007"
            "ABC\tabc\u007FDEF-def\n\u0007" ==> "ABCabcDEF-def"
            "ABC\tabc\u007FDEF-def\n\u0007" ==> "ABCabcDEF-def"
            
              JonB wrote (post #25):

              @ChrisW67 said in using reqular expression wrong:

              You want the opposite of that.

              I did reply earlier:

              Just remove the ^ I wrote (I forgot you were removing rather than retaining). Should be:

              QString result = inString.remove(QRegularExpression("[\\000-\\037]+"));
              
                Anonymous_Banned275 wrote (post #26):
                1. JonB, please get off your high horse. This is a discussion, and we all have differences of opinion, which is what discussions are for.
                  (You remind me of a "study group" I had years ago where certain cultures insisted that "we all have to have the same opinion and agree ... then we can go home".)
                2. I did state that I am porting from Java, hence the source I used is different...
                  (I realize things get missed, misread, etc.)
                3. So far there are two approaches to getting the job done:
                  identify all ASCII characters
                  remove all control characters

                Here is the code:

                #ifdef BYPASS
                        QRegularExpression re("[^\\w\\d (:/<>) ]+");
                        QString result = inString.remove(re); // keep only word characters, spaces and ( ) : / < >
                        qDebug() << "remove all controls \n    " << result;
                        return result;
                #endif

                        QString result = inString.remove(QRegularExpression("[\\000-\\037]+"));
                        qDebug() << "remove all controls \n    " << result;
                        return result;
                

                Both of them leave some unwanted characters, which are easy to remove after the
                regular expression has been applied.
                4. It looks as if "match" is OK, but it is too complex for what I want to accomplish.

                5. As the original title said, I was using the concept wrong: I did not pay attention to what the actual expression does, identifying stuff versus deleting it.

                I really appreciate everybody's input; it has been educational.

                Cheers

                  JonB wrote (post #27):

                  @AnneRanch said in using reqular expression wrong:

                  JonB, please get off your high horse. This is a discussion, and we all have differences of opinion, which is what discussions are for.

                  What are you talking about? I gave you the code you need to remove all the control characters. That's all. And as usual I got abuse back. I know you are rude to everybody, but any reason to single me out? :) Oh, and I just saw that you use what I suggested and are still cross with me!

                    Anonymous_Banned275 wrote (post #28):

                    @JonB OK, let's get serious. Your posts are great technically, but you just cannot say it without making comments such as "if he comes back ..."
                    or "I told you so ..." etc.
                    I realize that each of us has a different way of expressing things, and that is perfectly OK.
                    My gut feeling is that, as I am not a native English speaker, I am not used to this sentence structure:

                    "... YOU can do it this way, I ALREADY TOLD YOU SO."

                    In my native language I would say
                    "... do it this way."

                    Cheers

                      JonB wrote (post #29):

                      @AnneRanch
                      Ah, OK. The trouble is we seem to tell you stuff and you often seem to ignore it and not act on it. That can be frustrating. But I will (try to) bear in mind what you say when replying in your threads :)

                        Anonymous_Banned275 wrote (post #30):

                        SOLVED
                        use QString "replace" instead...

                        I need more help writing the actual expression:

                        QRegularExpression re("[\\000-\\037[1;139m]+");

                        This works, BUT it deletes EVERY occurrence of "m".

                        I would like to delete ONLY this string: "[1;139m"
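The reason every "m" disappears is that square brackets make a character class: "[\000-\037[1;139m]" means "any ONE character that is a control character or one of [ 1 ; 3 9 m", and the + then removes whole runs of those characters wherever they occur. To delete only the literal text "[1;139m", keep it out of the class and escape the "[". A minimal sketch contrasting the two, using std::regex (swapped in so it builds without Qt; both function names are hypothetical):

```cpp
#include <regex>
#include <string>

// The pattern from the post: inside [...] every character is only a member of a SET,
// so this deletes any run of control characters, '[', '1', ';', '3', '9' or 'm'.
std::string remove_with_class(const std::string& in)
{
    static const std::regex char_class("[\\x00-\\x1F[1;139m]+");
    return std::regex_replace(in, char_class, "");
}

// Escaping the '[' turns the pattern into the literal sequence "[1;139m",
// so only that exact text is deleted.
std::string remove_sequence(const std::string& in)
{
    static const std::regex sequence("\\[1;139m");
    return std::regex_replace(in, sequence, "");
}
```

On "A[1;139mB m" the class version leaves "AB " (the final "m" is gone too), while the escaped version leaves "AB m". In Qt the literal version would be QRegularExpression("\\[1;139m"), or QRegularExpression(QRegularExpression::escape("[1;139m")) to escape it automatically; for a fixed string, QString::remove("[1;139m") needs no regular expression at all.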

                        PS
                        Can anybody recommend a good source of "regular expression examples in C++"?
                        I am finding too many "tutorials" and would like to know what the group recommends.

                        This one does not really explain things, it just looks pretty (IMHO)...

                        https://www.softwaretestinghelp.com/regex-in-cpp/

                          JonB wrote (post #31):

                          @AnneRanch
                          It gets harder to write the regular expression for that case.

                          In all the examples you have shown so far, like

                          stream raw line  
                            "\u0001\u001B[1;39m\u0002export          \u0001\u001B[0m\u0002Print environment variables"
                          

                          they all look like

                          \u0001...\u0002
                          

                          That means they have an ASCII-1 at the start and an ASCII-2 at the end. If all your cases look like this, then:

                          line.remove(QRegularExpression("\\001[^\\002]*\\002"));
                          

                          should get rid of just what you want, and leave no "artefact bits".
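For reference, the same idea with std::regex (swapped in so it builds without Qt; remove_marked_spans is a hypothetical name). The markers are written as hex escapes \x01 and \x02 because the ECMAScript grammar used by std::regex does not treat \001 as an octal escape the way PCRE does:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes every span that starts with ASCII-1, continues with
// any characters that are not ASCII-2, and ends with ASCII-2 (same idea as
// "\\001[^\\002]*\\002" in QRegularExpression).
std::string remove_marked_spans(const std::string& line)
{
    static const std::regex marked("\\x01[^\\x02]*\\x02");
    return std::regex_replace(line, marked, "");
}
```

Applied to a line built like the raw output above (marker span, "export  ", marker span, "Print environment variables"), it leaves "export  Print environment variables" with no "[1;39" or "[0" fragments.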
