Problem with regular expression

baysmith

'#' is a non-word character, so the engine finds a word boundary between 'C' and '#' in 'C#', but none after the '#'. Without the word boundaries the expression matches 'C#'.

@
QRegExp exp("([a-gA-G][#bd]?)");
@

Why are you using word boundaries?
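The boundary behaviour described above can be reproduced outside Qt. The following is a minimal sketch using `std::regex` as a stand-in for QRegExp; both treat `\b` as a boundary between a word and a non-word character. The helper name `firstMatch` and the `\b`-wrapped pattern in the usage note are assumptions for illustration, since the original expression is not quoted in this thread.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring of `text` matched by
// `pattern`, or an empty string if there is no match.
std::string firstMatch(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);  // ECMAScript grammar by default
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

Called on the text `C# minor`, the pattern `\b[a-gA-G][#bd]?\b` yields only `C`, because no boundary exists between `#` and the following space, while the pattern `[a-gA-G][#bd]?` yields `C#`.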

willypuzzle

I need to parse a web page to find musical chords, and '#' is usually used to denote a sharp (or diesis) note.
For example, C# is a common symbol I may find.

willypuzzle

My expression has to catch A# or C# only if they are "isolated" from the rest of the page content.

giesbert

So you could require whitespace before and after:

@
QRegExp exp("(\\s[a-gA-G][#bd]?\\s)");
@

willypuzzle

Yes, but then the RegExp engine catches the spaces too, and in my application this is not good.

giesbert

Come on, be a bit creative:

@
QRegExp exp("\\s([a-gA-G][#bd]?)\\s");
@

Moving the brackets would do it, right?

willypuzzle

Sorry, it continues to catch the spaces at the boundaries of the expression.

goetz

You don't want the matched text, but the captures. Have a look at "QRegExp::cap()":http://doc.qt.nokia.com/latest/qregexp.html#cap and the sample usage in the "Capturing Text":http://doc.qt.nokia.com/latest/qregexp.html#capturing-text section of the docs:

@
QRegExp exp("\\s([a-gA-G][#bd]?)\\s");
QString test("You like the chord C# very well!");
int pos = 0;
// cap(1) holds the chord without the surrounding whitespace
while ((pos = exp.indexIn(test, pos)) != -1) {
    qDebug() << "found '" + exp.cap(1) + "'";
    pos += exp.matchedLength();
}
@
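For readers without a Qt build at hand, the same capture idea can be sketched with `std::regex` from the standard library. This is a stand-in for the QRegExp loop above, not a Qt API; the function name `chordsBetweenSpaces` is made up for the example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects capture group 1 (the chord without the
// surrounding whitespace) for every match of \s([a-gA-G][#bd]?)\s.
std::vector<std::string> chordsBetweenSpaces(const std::string& text) {
    static const std::regex re(R"(\s([a-gA-G][#bd]?)\s)");
    std::vector<std::string> chords;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        chords.push_back((*it)[1].str());  // group 1 only, not the whole match
    }
    return chords;
}
```

One limitation applies to this sketch and to the loop above alike: the trailing `\s` is consumed by each match, so when two chords are separated by a single space only the first one is found.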

willypuzzle

I'm using QRegExp with the QTextDocument::find function and QTextCursor.

goetz

I doubt that this will be possible with regular expressions alone. You can check the trailing boundary without matching it by using a "positive lookahead assertion":http://doc.qt.nokia.com/latest/qregexp.html#assertions, but unfortunately there is no similar "look behind" assertion. There is a Jira issue open with a suggestion for this ("QTBUG-2371":http://bugreports.qt.nokia.com/browse/QTBUG-2371); you can vote for it.
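A possible workaround, sketched here with `std::regex` (whose default grammar, like QRegExp, offers lookahead but no look-behind): consume the leading delimiter in a non-capturing group, check the trailing one with a lookahead, and take the position and length from capture group 1 instead of from the whole match. The names `ChordSpan` and `findIsolatedChords` are hypothetical. QRegExp documents `(?:...)`, `(?=...)` and `QRegExp::pos(int)`, so a similar pattern may carry over, but that is untested here.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Position and length of one chord inside the searched text.
struct ChordSpan {
    std::size_t pos;
    std::size_t len;
};

// Hypothetical helper: finds chords delimited by whitespace or by the
// start/end of the text and reports only the span of the chord itself.
std::vector<ChordSpan> findIsolatedChords(const std::string& text) {
    // (?:^|\s)  leading delimiter: consumed, but outside group 1
    // (...)     group 1: the chord
    // (?=\s|$)  trailing delimiter: checked by lookahead, not consumed
    static const std::regex re(R"((?:^|\s)([a-gA-G][#bd]?)(?=\s|$))");
    std::vector<ChordSpan> spans;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        const std::ssub_match& chord = (*it)[1];
        spans.push_back({static_cast<std::size_t>(chord.first - text.begin()),
                         static_cast<std::size_t>(chord.length())});
    }
    return spans;
}
```

Because the trailing whitespace is not consumed, chords separated by a single space are all found. With a cursor-based search that selects the whole match, the selection would presumably still need to be narrowed to the span of group 1.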