[Solved] RegExp word boundary metacharacter \b causing problems.

    paucoma wrote (post #1):

    Hi Everyone!

    I am trying to partially parse a Rich Text Format (RTF) file. Specifically, I would like to find every occurrence of the bold control word "\b".

    A control word in the RTF format has the following form (with a few exceptions), see "MSDN Article":http://msdn.microsoft.com/en-us/library/aa140284

    <LetterSequence><Delimiter>

    <LetterSequence>: lowercase characters (a-z), although this is not completely respected by MS
    <Delimiter> : a single space, or any character that is neither a letter nor a digit
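To make the format above concrete, here is a minimal sketch in plain C++ (no Qt) that reads the letter sequence of a control word starting at a given offset. The function name `controlWordAt` is hypothetical, the sketch assumes the control word is introduced by a backslash, and it deliberately does not parse a numeric parameter such as the 0 in "\b0".

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the letter sequence of the control word that
// starts at rtf[pos] (which must be a backslash), or an empty string if no
// control word starts there.
std::string controlWordAt(const std::string &rtf, std::size_t pos)
{
    if (pos >= rtf.size() || rtf[pos] != '\\')
        return {};

    std::size_t end = pos + 1;
    // <LetterSequence>: consume letters. Upper case is tolerated because
    // real files do not always stick to lower case.
    while (end < rtf.size() && std::isalpha(static_cast<unsigned char>(rtf[end])))
        ++end;

    // Whatever comes next (space, digit, other symbol, or end of input)
    // acts as the delimiter and is not part of the name.
    return rtf.substr(pos + 1, end - pos - 1);
}
```

With this, `controlWordAt("{\\b bold\\b0}", 1)` yields `b`, while `controlWordAt("\\bin x", 0)` yields `bin`, so comparing the result with "b" keeps longer control words from being mistaken for bold.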

    This is my code:

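    @
    #include <QDebug>
    #include <QRegExp>
    #include <QSet>
    #include <QString>

    void rtf_strip(const QString &rtf)
    {
        QSet<int> boldOnSet;  // offsets of every match

        // Pattern exactly as first tried; this is the string the question is about.
        QRegExp re_boldOn("\b[^a-zA-Z0-9]",
                          Qt::CaseSensitive, QRegExp::RegExp);

        // Start at offset 0 and stop before recording the -1 "no match" result.
        int k = rtf.indexOf(re_boldOn, 0);
        while (k != -1) {
            qDebug() << re_boldOn.cap();
            boldOnSet.insert(k);
            k = rtf.indexOf(re_boldOn, k + 1);
        }
        qDebug() << boldOnSet.size();
    }
    @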
    I have tried the following regular expressions to find the occurrences of the control word, with no luck:

    "\b[^a-zA-Z0-9]" <-- in notepad++ this search runs well but in Qt I'm having some difficulties.
    "\\b[^a-zA-Z0-9]"

    Putting brackets around the b or around the slashes would cause them to be escaped.

    The output is basically a match on a single character from [^a-zA-Z0-9].

    I have also tried using QRegExp::RegExp2 as the pattern syntax.

    Any ideas?

    Note: this is my first post on the forum so I will still need to get acquainted with the formatting goodies

      andre wrote (post #2):

      Well, remember that strings in C++ also need escaping. So if you take the regexp string that works in Notepad++ and escape that exact string for use in C++, you should have something that works. And yes, that means that if you want to match a literal \ in a regexp, you need 4(!) of them in your C++ string. So much for readability ;-)
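As an illustration of the advice above, here is a minimal sketch that collects the offsets of a literal "\b" followed by a character that is neither a letter nor a digit. It uses the standard library's `std::regex` rather than QRegExp so that it compiles without Qt; that swap is an assumption of this example, not what the thread used, although the C++ string-literal escaping is the same for both. The function name `boldOnOffsets` is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the offset of every "\b" that is followed by a character that is
// neither a letter nor a digit.
std::vector<std::size_t> boldOnOffsets(const std::string &rtf)
{
    // Regex text:  \\b[^a-zA-Z0-9]  (escaped backslash, 'b', character class)
    // C++ literal: each backslash doubled again, giving four in a row.
    // The raw string literal R"(\\b[^a-zA-Z0-9])" spells the same pattern.
    static const std::regex boldOn("\\\\b[^a-zA-Z0-9]");

    std::vector<std::size_t> offsets;
    for (std::sregex_iterator it(rtf.begin(), rtf.end(), boldOn), end; it != end; ++it)
        offsets.push_back(static_cast<std::size_t>(it->position()));
    return offsets;
}
```

For the input `{\b bold\b0}` (written `"{\\b bold\\b0}"` in C++), this returns the single offset 1: the second `\b` is followed by a digit, so it is not matched.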

        paucoma wrote (post #3):

        Thanks for the reply. And yup, you are right. I was generating the word-boundary anchor \b in the regex because I didn't escape properly for C++.

        Searching for the literal text \b:
        --> the regex engine needs the backslash escaped --> \\b
        --> C++ needs each of those backslashes escaped --> "\\\\b"
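The escaping chain above can be checked by looking at what the compiler actually stores for each spelling of the literal. This is only a small sketch; the variable names are made up for the illustration.

```cpp
#include <string>

// What ends up in memory for each spelling of the string literal.
const std::string oneSlash  = "\b";     // 1 char:  backspace (0x08), no backslash at all
const std::string twoSlash  = "\\b";    // 2 chars: \b   -> the regex word-boundary anchor
const std::string fourSlash = "\\\\b";  // 3 chars: \\b  -> regex for a literal backslash + 'b'
```

Only the last one hands the regex engine an escaped backslash, which is why a pattern written with a single or double backslash never finds the control word itself.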

        I won't forget to try extra escapes next time.

        Thanks!
