Regular expressio to match "word"...
-
This is embarrassing and frustrating at the same time.
I cannot find the thread I posted a while back, and I am also missing the "Regular expression" test tool.
Anyway - the attached code retrieves a capital "w", which happens to be the first ASCII character in the "result" source.
I recall something about C requiring an "escape character '...
I am trying to "match" all words in source.
result = SubRegularExpressionExt("[/\w/]", result,1);
-
@AnneRanch
\w in a regular expression matches a (single) "word" character. \w+ would match multiple word characters, e.g. a whole word.
Of course, if you want to put a literal \ into a C string you must double it, like "\\w+".
-
A picture is worth a thousand words,
and the result is: a single "M" was "matched".
The expression is defined as "[\w+]" and passed as such to the function.
Where is the problem ?
-
The problem lies in the definition of your pattern: no need for [ and ], and why do you have .+ ?
If you want to match a single word, just use QString checkMatch( "\\w" ); or QString checkMatch( R"x(\w)x" );. What you wrote is actually matching each character of single words. You might want to try to validate your regex first, e.g. using an online validator like this one (by the way, the default regex engine in C++ is ECMAScript if I'm not mistaken).
-
@AnneRanch said in Regular expressio to match "word"...:
Where is the problem ?
I said to use \w+ but you chose to use [\w.+] or [(\w)+] or [\w+] (with appropriate doublings of \ in a C string).
\w matches a single "word character". + requires one or more of these (consecutively), i.e. a whole word. That's my \w+.
But as soon as you use [...] in a regular expression that means "any one of the characters inside the brackets". So your [\w.+] means: any single word character or a dot or a plus sign, just one of any of these.
@JohanSolo said in Regular expressio to match "word"...:
If you want to match a single word, just use
QString checkMatch( "\\w" );
I do not agree with this. That will match a single word-character. A single whole word will require QString checkMatch( "\\w+" );.
You might want to try to validate your regex first, e.g. using an online validator like this one (by the way, the default regex engine in C++ is ECMAScript if I'm not mistaken).
As @JohanSolo says, you might want to play with regexes at https://regex101.com/ while you develop them. Actually Qt does not use the "ECMAScript" variant, it uses "PCRE". Just leave the FLAVOR shown on the left-hand side of that page at its default value, which is PCRE2 (PHP >= 7.3). If copying something which works there back to Qt for a C string, don't forget to double any \ characters to \\.
-
@JohanSolo said in Regular expressio to match "word"...:
If you want to match a single word, just use
QString checkMatch( "\\w" );
I do not agree with this. That will match a single word-character. A single whole word will require QString checkMatch( "\\w+" );.
Yes, my bad. Thanks for pointing this out.
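To make the difference between \w, \w+ and [\w+] concrete, here is a minimal sketch. It uses std::regex from the standard library rather than Qt's QRegularExpression, only so that it compiles without Qt; std::regex defaults to the ECMAScript grammar while Qt uses PCRE, but for these simple patterns on ASCII text the two should behave the same. The helper name allMatches and the sample text are made up for illustration and do not come from the thread.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): collect every match of
// `pattern` in `text`, in order of appearance.
std::vector<std::string> allMatches(const std::string &pattern,
                                    const std::string &text)
{
    const std::regex re(pattern);  // ECMAScript grammar by default
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        found.push_back(it->str());  // text of the whole match
    }
    return found;
}
```

For the sample text "Mary had 2+2", allMatches("\\w", ...) returns nine one-character matches starting with "M", allMatches("\\w+", ...) returns the four matches "Mary", "had", "2", "2", and allMatches("[\\w+]", ...) returns ten one-character matches, because inside the brackets the + is just a literal plus sign and no longer means "one or more".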
-
OK, looks like I went "full circle".
That was by going through a few "this is how you do it" replies that did NOT really explain WHY it is done that way,
such as the initial "w+".
I was under the erroneous belief that "w" means "word" and not a single character. The real "word" is "w+".
I am not fool enough to expect this forum to teach me how to interpret "[" and "(" in an expression - I can RTFM, if I can find one which explains concepts and not just "this is how this is done ...".
CASE SOLVED
-
@AnneRanch said in Regular expressio to match "word"...:
I was under the erroneous belief that "w" means "word" and not a single character.
@JonB said in Regular expressio to match "word"...:
@AnneRanch
\w in a regular expression matches a (single) "word" character. \w+ would match multiple word characters, e.g. a whole word.
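
The other point raised in this thread, doubling the backslash in a C or C++ string, can be sketched as follows. The function names doubledPattern and rawPattern are hypothetical and exist only for this illustration. A single backslash before w in an ordinary string literal is not a standard escape sequence, and compilers typically warn about it, so the sketch only shows the two spellings that do deliver \w+ to the regex engine.

```cpp
#include <string>

// Doubled backslash: the compiler turns "\\" into one '\' character,
// so the resulting string holds the three characters \ w +
std::string doubledPattern() { return "\\w+"; }

// Raw string literal (C++11 and later): backslashes are taken literally,
// so nothing needs doubling. The thread's R"x(\w)x" uses the same idea
// with "x" as an optional delimiter.
std::string rawPattern() { return R"(\w+)"; }
```

Both functions return the same three-character string, so either spelling can be passed on to a QString or a regex constructor; the raw form is usually easier to compare with what was tested on regex101.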