[Solved] RegExp word boundary metacharacter \b causing problems.
-
Hi Everyone!
I am trying to partially parse a Rich Text Format (RTF) file. Specifically, I would like to identify the bold control word "\b".
A control word in the RTF format has the following form, with a few exceptions (see the "MSDN Article":http://msdn.microsoft.com/en-us/library/aa140284):
<LetterSequence><Delimiter>
<LetterSequence>: lowercase characters (a-z), although this is not completely respected by MS
<Delimiter>: a single space, or a nonalphabetic and nonnumeric character

This is my code:
@void rtf_strip(const QString &rtf)
{
    QSet<int> boldOnSet;
    int k = 0;
    boldOnSet.clear();
    QRegExp re_boldOn("\b[^a-zA-Z0-9]",
                      Qt::CaseSensitive, QRegExp::RegExp);
    while (k != -1) {
        k = rtf.indexOf(re_boldOn, k + 1);
        qDebug() << re_boldOn.cap();
        boldOnSet.insert(k);
    }
    qDebug() << boldOnSet.size();
}
@
I have tried the following regular expressions to find the occurrences of the control word, with no luck:
"\b[^a-zA-Z0-9]" <-- in Notepad++ this search runs well, but in Qt I'm having some difficulties.
"\\b[^a-zA-Z0-9]"
Putting brackets around the b or the slashes would make them get escaped.
My output is basically matching a single character of [^a-zA-Z0-9].
I have also tried using QRegExp::RegExp2 as the pattern syntax.
Any ideas?
Note: this is my first post on the forum so I will still need to get acquainted with the formatting goodies
-
Well, remember that strings in C++ also need escaping. So, if you take the regexp string that works in Notepad++ and escape that exact string for use in C++, then you should have something that works. And yes, that means that if you want to match a literal \ in a regexp, you need 4(!) of them in your C++ string. So much for readability ;-)
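To make the two escaping layers concrete, here is a minimal sketch. It uses std::regex from the standard library rather than QRegExp so that it compiles without Qt, and the function name findBoldOn is only a hypothetical helper for illustration. The pattern string itself is the same one you would pass to QRegExp.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the offset of every RTF bold-on control
// word "\b" that is followed by a character that is not a letter or digit.
std::vector<std::size_t> findBoldOn(const std::string &rtf)
{
    // Text to match:       \b       (a backslash, then the letter b)
    // Regex source:        \\b      (backslash escaped for the regex engine)
    // C++ string literal:  "\\\\b"  (each of those backslashes escaped again)
    static const std::regex boldOn("\\\\b[^a-zA-Z0-9]");

    std::vector<std::size_t> offsets;
    // Iterate over all non-overlapping matches; each match consumes the
    // delimiter character that follows the control word.
    for (std::sregex_iterator it(rtf.begin(), rtf.end(), boldOn), end;
         it != end; ++it) {
        offsets.push_back(static_cast<std::size_t>(it->position()));
    }
    return offsets;
}
```

With the input text {\rtf1 \b bold\b0 plain \bullet} this should report a single hit at offset 7: "\b0" and "\bullet" are skipped because the character after the b is a digit or a letter. If your compiler supports C++11 raw string literals, R"(\\b[^a-zA-Z0-9])" spells the same pattern with only the regex-level escaping.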
-
Thanks for the reply. And yup, you are right. I was generating the word boundary anchor \b in the regex because I didn't properly escape for C++.
Searching for "\b"
--> Regex engine needs escaping --> "\\b"
--> C++ needs escaping --> "\\\\b"
I won't forget to try extra escapes next time.
Thanks!