[Solved] RegExp word boundary metacharacter \b causing problems.

  • Hi Everyone!

    I am trying to partially parse a Rich Text Format (RTF) file. Specifically, I would like to identify the bold control word "\b".

    A control word in the RTF format has the following form, with a few exceptions (see the "MSDN Article"):

    \<LetterSequence><Delimiter>

    <LetterSequence>: lowercase letters (a-z), although Microsoft does not respect this completely
    <Delimiter>: a single space, or a nonalphabetic, nonnumeric character

    This is my code:

    #include <QDebug>
    #include <QRegExp>
    #include <QSet>
    #include <QString>

    void rtf_strip(const QString &rtf)
    {
        QSet<int> boldOnSet;
        int k = 0;

        // Intended to match the control word \b followed by a delimiter.
        QRegExp re_boldOn("\\b[^a-zA-Z0-9]",
                          Qt::CaseSensitive, QRegExp::RegExp);
        while (k != -1) {
            k = rtf.indexOf(re_boldOn, k + 1);
            qDebug() << re_boldOn.cap();
            qDebug() << boldOnSet.size();
        }
    }

    I have tried the following regular expression to find the occurrences of the control word, with no luck:

    "\\b[^a-zA-Z0-9]" <-- in Notepad++ this search runs well, but in Qt I'm having some difficulties.

    Putting brackets around the b or the backslashes just makes them get escaped.

    My output is basically matching a single character of [^a-zA-Z0-9].

    I have also tried using QRegExp::RegExp2 as the pattern syntax.

    Any ideas?

    Note: this is my first post on the forum so I will still need to get acquainted with the formatting goodies

  • Well, remember that string literals in C++ also need escaping. So if you take the regexp that works in Notepad++ and escape that exact string for use in C++, you should have something that works. And yes, that means that if you want to match a literal \ in a regexp, you need 4(!) of them in your C++ string. So much for readability ;-)
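
    To make the two escaping layers concrete, here is a minimal sketch. It uses std::regex instead of QRegExp, purely as an assumption so that it compiles without Qt; both engines treat \b as a word-boundary anchor. The helper name countMatches is hypothetical.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts the non-overlapping matches of `pattern` in `text`.
int countMatches(const std::string &text, const std::string &pattern)
{
    const std::regex re(pattern);
    return static_cast<int>(std::distance(
        std::sregex_iterator(text.begin(), text.end(), re),
        std::sregex_iterator()));
}

// Two backslashes in the C++ source: the regex engine receives \b,
// which is the word-boundary anchor, not a literal backslash.
const std::string anchorPattern = "\\b[^a-zA-Z0-9]";

// Four backslashes in the C++ source: the regex engine receives \\b,
// which is an escaped (literal) backslash followed by the letter 'b'.
const std::string literalPattern = "\\\\b[^a-zA-Z0-9]";
```

    For a sample string whose real content is {\rtf1 plain \b bold\b0 plain}, countMatches with literalPattern finds exactly one match (the "\b " before "bold"; "\b0" is skipped because '0' is not a delimiter), while anchorPattern matches every non-alphanumeric character that directly follows a letter or digit.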

  • Thanks for the reply. And yup, you are right. I was generating the anchor \b in regex because I didn't properly escape for C++.

    Searching for "\b"
    --> Regex engine needs escaping --> "\\b"
    --> C++ needs escaping --> "\\\\b"

    I won't forget to try extra escapes next time.
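
    For anyone who finds this later, a rough sketch of the position-collecting loop with the escaping sorted out. It assumes a C++11 compiler and uses std::regex rather than QRegExp so it builds without Qt; the function name boldOnPositions is made up for illustration. A raw string literal removes the C++ layer of escaping, so only the regex-level escape remains.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for the loop in rtf_strip(): returns the offset of
// every "\b" control word that is followed by a delimiter character.
std::vector<std::size_t> boldOnPositions(const std::string &rtf)
{
    // Raw string literal: the regex engine sees \\b[^a-zA-Z0-9] verbatim,
    // i.e. a literal backslash, then 'b', then one non-alphanumeric character.
    static const std::regex boldOn(R"(\\b[^a-zA-Z0-9])");

    std::vector<std::size_t> positions;
    for (std::sregex_iterator it(rtf.begin(), rtf.end(), boldOn), end; it != end; ++it)
        positions.push_back(static_cast<std::size_t>(it->position()));
    return positions;
}
```

    Note that this only finds "\b" when a delimiter follows it, so a "\b" at the very end of the input, or one followed by a numeric parameter such as "\b0", is not reported.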


