QRegExp - does not return all printable characters.
-
QRegExp rx("^\p[ -~]*$");
The above code does not work in C++.
I am not sure about the syntax. This code works fine:
QRegExp rx("(\d+)(\s*)(cm|inch(es)?)");
I also cannot find the meaning of "\p" in the Qt docs.
-
Hi,
QRegExp has been deprecated in Qt 5 and removed in Qt 6.
Please use QRegularExpression.
You can use the QRegularExpression Example to design and validate your regex.
-
@SGaist Success!
This simple code "matches" the first word (#Waiting..) in the 'line' string, but only its LAST character.
Time to read more and add stuff...
Thanks
QRegularExpression re("[ -~] ");
QRegularExpressionMatch match = re.match(line);
qDebug() << "Regulsr expression ...test match ";
if (match.hasMatch()) {
    qDebug() << "Regulsr expression ...match.hasMatch()";
    QString matched = match.captured(0);
    qDebug() << "Matched " << matched;
    // ...
} else {
    qDebug() << "NO MATCH ";
}
...
-
@AnneRanch said in QRegExp - does not return all printable characters.:
QRegularExpression re("[ -~] ");
This does not match a word.
It matches a space/hyphen/tilde followed by a space. [EDIT No, see next answer below.] So match.captured(0) will return precisely 2 characters. [EDIT This is still true.]
QRegExp rx("^\p[ -~]*$");
QRegExp rx("(\d+)(\s*)(cm|inch(es)?)");
Both of these (and it would be the same for QRegularExpression) would require each \ in the literal strings to be written as \\, as per C++ rules for literal string characters.
-
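On the backslash point in the reply above, here is a minimal sketch of the two usual ways to get a single backslash through to the regex engine from C++ source. It uses std::regex from the standard library rather than QRegExp or QRegularExpression, purely so it compiles without Qt; the string-literal rule it shows belongs to the C++ language and does not depend on the regex class. The function names are hypothetical, made up for illustration.

```cpp
#include <regex>
#include <string>

// Pattern written with doubled backslashes: the compiler turns each "\\"
// into one backslash, so the regex engine receives \d and \s.
inline std::string escapedPattern() {
    return "(\\d+)(\\s*)(cm|inch(es)?)";
}

// The same pattern as a raw string literal (C++11 and later), where
// backslashes are passed through unchanged.
inline std::string rawPattern() {
    return R"((\d+)(\s*)(cm|inch(es)?))";
}

// Hypothetical helper: true if the whole input is a number followed by a unit.
inline bool matchesLength(const std::string &input) {
    static const std::regex re(escapedPattern());
    return std::regex_match(input, re);
}
```

Both functions return the same character sequence, so either spelling can be handed to a regex constructor; the raw form is often easier to read when a pattern contains many backslashes.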
@JonB I cannot argue this. The symbols inside [ ] were posted on another forum and advertised as "match all printable characters". Obviously they do not.
Unfortunately the match is "character space", as you said. So I get the last character of the first word and a space, which is a little hard to actually "see".
-
@AnneRanch said in QRegExp - does not return all printable characters.:
I cannot argue this - the symbols inside [ ] were posted on another forum and advertised as "match all printable characters".
And I cannot argue with this! Let me be the first to admit when I have made a mistake :) I read too quickly, and quite forgot that - (hyphen) has "special significance" when appearing inside the [...] one-of-characters entity. There (and only there) it represents a "character range", from the character before it (space here) to the character after it (tilde here), which does indeed match all printable (ASCII) characters.
So re("[ -~] ") does indeed match "any printable character followed by a space". That still means precisely 2 characters rather than a whole word: the last character of a "word" followed by a space. However, it will also match 2 spaces, which is presumably not desired.
There are several ways to pick out a "word" if that is what you want to do instead. The simplest would probably be (in your C++ code):
QRegularExpression re("\\w+");
but it all depends "which" word you might want in a line, what to do about the other characters in the line, etc.
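To make the range behaviour described above concrete, here is a small sketch. It is written against std::regex so that it builds without Qt, on the assumption that the bracket range [ -~] means the same thing there as in QRegularExpression for plain ASCII input. The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// True if every character of s lies in the range ' ' (0x20) to '~' (0x7E),
// i.e. the string contains only printable ASCII characters and spaces.
inline bool isPrintableAscii(const std::string &s) {
    static const std::regex re("^[ -~]*$");
    return std::regex_match(s, re);
}

// Returns the first match of "[ -~] ": one character from the range
// followed by one space, so the result is always 2 characters (or empty).
inline std::string firstCharThenSpace(const std::string &s) {
    static const std::regex re("[ -~] ");
    std::smatch m;
    return std::regex_search(s, m, re) ? m.str(0) : std::string();
}
```

With the input "Waiting to connect", firstCharThenSpace gives "g " (the last letter of the first word plus the space), which is the hard-to-see result reported earlier in the thread.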
-
@AnneRanch said in QRegExp - does not return all printable characters.:
I cannot argue this - the symbols inside [ ] were posted on another forum and advertised as "match all printable characters". Obviously they do not.
Unfortunately the match is "character space" - as you said. So I get the last character of the first word and a space - which is a little hard to actually "see".
With QRegExp, the dash symbol in [] is used to define a range of characters (cf. Qt documentation https://doc.qt.io/qt-5/qregexp.html#sets-of-characters).
I think the right way would be to use a backslash as an escape sequence:
QRegularExpression re("[ \\-~] ");
-
@JonB said in QRegExp - does not return all printable characters.:
I don't think that's right: I believe that would match a space or any character between \ and ~!
I have to admit that I don't understand what the goal of this regex is.
My understanding was that the OP wants to match only 3 characters: space, dash and tilde.
But maybe I am wrong and the goal was to get any character between space and tilde (both included) followed by a space.
So the regex should be
QRegularExpression re("[ -~] ");
But this also doesn't make sense to me.
-
@KroMignon
Hi Kro. I had to delete my response you have quoted after I had written it! In my day, it used to be the case that to place a literal - inside a [...] character range you had to put it as the first or last character, so that it could not be interpreted as a "range" operator, which is what I was writing up. However, it seems that since those dim, dark UNIX System V.0 days you can now escape a literal - via \- so you can have it anywhere in the characters, just as you wrote. Hence I deleted my response to your response!
But maybe I am wrong and the goal was to get any character between space and tilde (both included) followed by a space.
That is exactly what the OP must have got from the source she took it from. However, as pointed out, that would include space-followed-by-space, which is unlikely to be desirable/robust. And at best it would have picked out the last character of a word followed by a space, which still seems an "unusual" thing to want. I have since suggested \w+ as probably the most canonical way of picking out "a word".
-
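As a small illustration of the literal-hyphen discussion above, the sketch below uses the older convention of placing the hyphen last in the class. It is written with std::regex so it compiles without Qt; the escaped \- form mentioned in the thread is not exercised here, and the function name is hypothetical.

```cpp
#include <regex>
#include <string>

// Character class holding exactly three literals: space, tilde, hyphen.
// With '-' placed last it has no right-hand neighbour, so it cannot be
// read as a range operator.
inline bool isSpaceTildeOrHyphen(char c) {
    static const std::regex re("[ ~-]");
    return std::regex_match(std::string(1, c), re);
}
```

Unlike [ -~], this class rejects letters and digits, which is the difference the two readings of the original pattern come down to.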
I hope this may help this discussion.
I am posting the "raw" file - all three lines.
I have added a silly loop to see if I can match more than the FIRST word / character in each line.
Stream file
Stream file
"Waiting to connect to bluetoothd...\u001B[0;94m[bluetooth]\u001B[0m# \u001B[0;94m[bluetooth]\u001B[0m# Agent registered"
Regulsr expression ...test match
Regulsr expression ...match.hasMatch()
Matched "Waiting"
Matched ""
Matched ""
Matched ""
Matched ""
linea: "Waiting to connect to bluetoothd...\u001B[0;94m[bluetooth]\u001B[0m# \u001B[0;94m[bluetooth]\u001B[0m# Agent registered"
Stream file
Stream file
"\u001B[0;94m[bluetooth]\u001B[0m# "
Regulsr expression ...test match
Regulsr expression ...match.hasMatch()
Matched "0"
Matched ""
Matched ""
Matched ""
Matched ""
linea: "\u001B[0;94m[bluetooth]\u001B[0m# "
Overload QString File bt_utility_library.cpp function ProcessCommand @ line 162
Again, this is code under construction and posted in the hope that it helps.
I have used
QRegularExpression re("\w+"); //matches first word / character
Also, to clarify: at this point of learning to use QRegularExpression I just want to extract ALL words from the FIRST line.
I'll tackle deleting the control characters, AKA matching only printable characters, later.
-
Guess what I found - nice interactive tool
-
@AnneRanch said in QRegExp - does not return all printable characters.:
QRegularExpression re("\w+"); //matches first word / character
Exactly as I wrote in response to your code earlier, if this is what you have written in your C++ source code it will not work.
I explained about the \ in C/C++ source code and said you would need to have:
QRegularExpression re("\\w+");
Is that what you have?
at this point of learning to use QRegularExpression I just want to extract ALL words from the FIRST line.
If you want to match/extract multiple matches you must change to calling QRegularExpression::globalMatch(). This is described in https://doc.qt.io/qt-5/qregularexpression.html#global-matching, and the example there is exactly what you are asking for:
Global matching is useful to find all the occurrences of a given regular expression inside a subject string. Suppose that we want to extract all the words from a given string, where a word is a substring matching the pattern \w+.
QRegularExpression re("(\\w+)");
QRegularExpressionMatchIterator i = re.globalMatch("the quick fox");
QStringList words;
while (i.hasNext()) {
    QRegularExpressionMatch match = i.next();
    QString word = match.captured(1);
    words << word;
}
// words contains "the", "quick", "fox"
-
Update / conclusion (??)
The attached TEST code works as expected using the ORIGINAL "range" pattern. Using "global match " solved it. It matches EACH individual character in the string - not the entire word.
Note there are NO "escape characters" in the pattern.
It also "prints" control characters - which is NOT desirable - so I will work on that.
It is probably slow, but it does not matter in my app.
#ifdef BYPASS
terminal outoput
bluetoothctl
Agent registered
[bluetooth]#
string to analyze / match
linea: "Waiting to connect to bluetoothd...\u001B[0;94m[bluetooth]\u001B[0m# \u001B[0;94m[bluetooth]\u001B[0m# Agent registered"
linea: "\u001B[0;94m[bluetooth]\u001B[0m# "
#endif
// QRegularExpression re("(\\w+)"); works OK
QRegularExpression re("([ -z])"); // original range
QString pattern = re.pattern(); // pattern == "([ -z])"
qDebug() << "Pattern function " << pattern;
TRACE_TextEdit->addItem("TRACE pattern ");
TRACE_TextEdit->addItem(pattern);
QRegularExpressionMatchIterator i = re.globalMatch(line);
QString text = "Match ";
QStringList words;
while (i.hasNext()) {
    QRegularExpressionMatch match = i.next();
    QString word = match.captured(1);
    text += word;
    TRACE_TextEdit->addItem(text);
    words << word;
    qDebug() << " Global match (words) " << words;
}
-
@AnneRanch said in QRegExp - does not return all printable characters.:
It matches EACH individual character in the string - not the entire word.
That is because you have chosen to use re("([ -z])") instead of the re("(\\w+)") suggested. It is the + symbol which causes it to match a whole word/multiple characters instead of one character at a time. Did you really want to match each individual character separately? Up to you.
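The effect of the + quantifier described above can be seen by collecting every match with and without it. This is only a sketch: it uses std::sregex_iterator from the standard library as a stand-in for looping over QRegularExpression::globalMatch(), so that it compiles without Qt, and allMatches is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every non-overlapping match of `pattern` in `text`, in order,
// similar in spirit to iterating over a global match in Qt.
inline std::vector<std::string> allMatches(const std::string &text,
                                           const std::string &pattern) {
    const std::regex re(pattern);
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

For the text "ab c", the pattern "[ -z]" yields four one-character matches, while "[ -z]+" yields the single match "ab c"; "\\w+" on "the quick fox" yields the three words.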