Qt Forum


    [Solved] RegExp word boundary metacharacter \b causing problems.

    General and Desktop
    • paucoma

      Hi Everyone!

      I am trying to partially parse a Rich Text Format (RTF) file. Specifically, I would like to identify the bold control word "\b".

      A control word in the RTF format has the following form, with a few exceptions (MSDN article: http://msdn.microsoft.com/en-us/library/aa140284):

      \<LetterSequence><Delimiter>

      <LetterSequence>: lowercase letters (a-z), although MS does not respect this completely
      <Delimiter>: a single space, or a character that is neither alphabetic nor numeric
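As a rough illustration of the control-word shape described above, here is a minimal C++ sketch that reads the letter sequence following a backslash. The function name controlWordAt is made up for this example, and it deliberately ignores numeric parameters and the exceptions mentioned above, so it is a sketch under those assumptions rather than a real RTF parser.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the letter sequence of the control word
// that starts at text[pos], or an empty string if text[pos] is not a
// backslash followed by at least one letter.
// Simplified on purpose: numeric parameters and control symbols are ignored.
inline std::string controlWordAt(const std::string &text, std::size_t pos)
{
    if (pos >= text.size() || text[pos] != '\\')
        return std::string();

    // Consume letters until the first delimiter (any non-letter) is reached.
    std::size_t end = pos + 1;
    while (end < text.size() &&
           std::isalpha(static_cast<unsigned char>(text[end])))
        ++end;

    return text.substr(pos + 1, end - pos - 1);
}
```

Because the scan stops at the first non-letter, the bold control word is read as "b" while a longer word such as "blue" is kept whole, which is the distinction the delimiter rule is there to make.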

      This is my code:

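      @
      #include <QDebug>
      #include <QRegExp>
      #include <QSet>
      #include <QString>

      void rtf_strip(const QString &rtf)
      {
          QSet<int> boldOnSet;   // offsets at which the pattern matched
          int k = 0;

          // Pattern exactly as I first wrote it; this is the part in question.
          QRegExp re_boldOn("\b[^a-zA-Z0-9]",
                            Qt::CaseSensitive, QRegExp::RegExp);
          while (k != -1) {
              k = rtf.indexOf(re_boldOn, k + 1);
              qDebug() << re_boldOn.cap();
              boldOnSet.insert(k);   // note: the final -1 is inserted as well
          }
          qDebug() << boldOnSet.size();
      }
      @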
      I have tried the following regular expressions to find the occurrences of the control word, with no luck:

      "\b[^a-zA-Z0-9]" <-- in notepad++ this search runs well but in Qt I'm having some difficulties.
      "\\b[^a-zA-Z0-9]"

      Putting brackets around the b or the backslashes just makes them get escaped.

      My output is basically a match on a single character from [^a-zA-Z0-9].

      I have also tried using QRegExp::RegExp2 as the pattern syntax.

      Any ideas?

      Note: this is my first post on the forum, so I still need to get acquainted with the formatting goodies.

      • andre

        Well, remember that strings in C++ also need escaping. So, if you take the regexp that works in Notepad++ and escape that exact string for use in a C++ string literal, you should have something that works. And yes, that means that if you want to match a literal \ with a regexp, you need 4(!) of them in your C++ string. So much for readability ;-)
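To make the two layers of escaping concrete, here is a small sketch. It uses std::regex from the C++ standard library instead of QRegExp so that it compiles without Qt; that swap, the names kEscaped, kUnderEscaped, kRaw, countMatches and sampleRtf, and the sample text itself are all assumptions made for illustration, not something taken from the original code.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Layers for matching the two characters \b followed by a delimiter:
//   regex as the engine must see it:  \\b[^a-zA-Z0-9]
//   the same regex as a C++ literal:  "\\\\b[^a-zA-Z0-9]"
const std::string kEscaped = "\\\\b[^a-zA-Z0-9]";

// Under-escaped: the engine sees \b[^a-zA-Z0-9], where \b is a word boundary.
const std::string kUnderEscaped = "\\b[^a-zA-Z0-9]";

// A raw string literal removes the C++ layer, leaving only the regex escaping.
const std::string kRaw = R"(\\b[^a-zA-Z0-9])";

// Counts the non-overlapping matches of pattern in text.
inline std::size_t countMatches(const std::string &text, const std::string &pattern)
{
    const std::regex re(pattern);
    return static_cast<std::size_t>(std::distance(
        std::sregex_iterator(text.begin(), text.end(), re),
        std::sregex_iterator()));
}

// Hypothetical sample input. The backslashes of the RTF text are doubled
// here only because this is a C++ literal.
inline std::string sampleRtf()
{
    return "{\\rtf1 plain {\\b bold} \\b0 \\blue255;}";
}
```

With this sample, countMatches(sampleRtf(), kEscaped) should find only the bold control word in "{\b bold}", whereas the under-escaped pattern matches a single non-alphanumeric character at the end of every word, which is the symptom described in the question.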

        • paucoma

          Thanks for the reply. And yup, you are right. I was producing the regex anchor \b (word boundary) because I didn't escape properly for C++.

          Searching for the text "\b"
          --> the regex engine needs it escaped --> \\b
          --> C++ needs that escaped again --> "\\\\b"

          I won't forget to try extra escapes next time.

          Thanks!
