Can QChar functions work with QStrings?
-
But I'm still not sure if it'll work. I've read the docs, but they say that isLetter() works only for UCS-4 encoded characters, whereas each character of a QString corresponds to one UTF-16 code unit. In that case, I'm not sure whether my code could end up behaving unpredictably.
-
@jefazo92 said in Can QChar functions work with QStrings ?:
works only for UCS-4 encoded characters
You read the wrong documentation - you're using QChar::isLetter(), not the static QChar::isLetter(uint).
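To make the difference between a UTF-16 code unit and a UCS-4 code point concrete, here is a minimal sketch in standard C++ (no Qt). It relies only on the UTF-16 encoding rules; the helper names isHighSurrogateUnit, isLowSurrogateUnit, combineSurrogates and countCodePoints are hypothetical and made up for this illustration, they are not Qt functions. Characters up to U+FFFF fit in a single 16-bit code unit, which is what one element of a QString holds; only characters beyond that need two code units and hence a 32-bit value to be looked at as a whole.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only; these are not Qt functions.

// True if the 16-bit code unit is the first half of a surrogate pair.
constexpr bool isHighSurrogateUnit(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// True if the 16-bit code unit is the second half of a surrogate pair.
constexpr bool isLowSurrogateUnit(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Combine a valid surrogate pair into one 32-bit (UCS-4) code point.
inline char32_t combineSurrogates(char16_t high, char16_t low)
{
    assert(isHighSurrogateUnit(high) && isLowSurrogateUnit(low));
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}

// Count code points (not code units) in a UTF-16 string.
inline std::size_t countCodePoints(const std::u16string &s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // A well-formed pair occupies two code units but is one code point.
        if (isHighSurrogateUnit(s[i]) && i + 1 < s.size() && isLowSurrogateUnit(s[i + 1]))
            ++i;
        ++n;
    }
    return n;
}
```

For a string such as u"Hello", the number of code units and the number of code points are the same, so checking each element on its own loses nothing. The two counts only diverge once a character outside the 16-bit range appears.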
-
I've read the docs but it says that isLetter() works only for UCS-4 encoded characters
That is only true for the static overload QChar::isLetter(uint), which you are not using. You are using the member function QChar::isLetter().
But you still haven't told us what you really want to do. From your last code, I gather that you want to remove all non-letter characters from the string.
Example:
"%$Hello/World![" -> "HelloWorld"
Is that correct?
-
So what should I do in that case ? I'm a bit lost here.
-
Yes exactly. That's what I want.
-
@jefazo92 Then here's a one-liner:
#include <QDebug>
#include <QRegularExpression>

int main(int argc, char *argv[])
{
    QString s = "%$Hello/World![";
    s.remove(QRegularExpression("[^\\p{L}]"));
    qDebug() << "s =" << s;
}
Output:
s = "HelloWorld"
-
@aha_1980 said in Can QChar functions work with QStrings ?:
QRegularExpression
Would that then remove any punctuation marks, numbers and blank spaces ? Or would I have to add something else ?
-
Also does that mean that my line above would then not work as it should ?
-
Your code should work as well, I'd just clean it up a bit:
for (int i = 0; i < aword.size(); i++) {
    if (!aword[i].isLetter())
        aword.remove(i--, 1);
}
However, why re-invent the wheel if there is a ready-made function?
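A note on the i-- in the loop above: once the element at index i is removed, the next character slides into slot i, so the index has to be stepped back before the loop's i++ runs, otherwise that character is skipped. Here is a rough sketch of the same idea in standard C++, with std::string and std::isalpha standing in for QString and QChar::isLetter(). That substitution is an assumption made for illustration only: in the default "C" locale std::isalpha recognises just ASCII letters, whereas isLetter() is Unicode-aware. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Index-based version, mirroring the loop above. Instead of i-- followed by
// i++, the index simply is not advanced after an erase (same effect).
inline std::string removeNonLettersByIndex(std::string word)
{
    for (std::size_t i = 0; i < word.size(); ) {
        if (!std::isalpha(static_cast<unsigned char>(word[i])))
            word.erase(i, 1);   // the next character moves into slot i
        else
            ++i;
    }
    return word;
}

// Erase-remove idiom: one pass that compacts the kept characters to the
// front, then trims the leftover tail.
inline std::string removeNonLetters(std::string word)
{
    word.erase(std::remove_if(word.begin(), word.end(),
                              [](unsigned char c) { return !std::isalpha(c); }),
               word.end());
    return word;
}
```

Both functions return the same result; the second avoids shifting the rest of the string on every removal, which can matter for long inputs.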
-
@aha_1980 said in Can QChar functions work with QStrings ?:
QRegularExpression
Hi @aha_1980,
Thanks a lot for your reply. If it's not too much to ask, how did you find QRegularExpression? I'm using Ctrl+F on the QString docs, but I get 53 matches and I don't know which of them is the one that suits this example. I'm trying to understand what the [^\p{L}] in QRegularExpression("[^\p{L}]") means. Also, how did you know that the QChar functions I used on the QString would work? Since I'm still a rookie, all these details are important to help me fend for myself in the future. Thanks.
-
You can find the documentation here: QRegularExpression
-
@Christian-Ehrlicher
To be fair to @jefazo92, that docs page does not begin to attempt to explain [^\p{L}], or even the \p{L}. In fact it only refers the reader to other sources, such as http://pcre.org/pcre.txt, for all syntax.
I like reg exes, and have used them for years (decades), but it's only right to say to the OP that they are a bit tricky and he'll have to go outside the Qt docs to find out about them. A resource I would recommend is an on-line constructor/tester like https://regex101.com/ or https://regexr.com/, though they are not much of a tutorial.
-
@jefazo92 said in Can QChar functions work with QStrings ?:
Thanks a lot for your reply. If it's not too much to ask, how did you find QRegularExpression? I'm using Ctrl+F on the QString docs, but I get 53 matches and I don't know which of them is the one that suits this example.
I'd recommend another way: you know that you want to remove something from your string, so you search for remove first. That gives you some overloads:
QString & remove(int position, int n)
QString & remove(QChar ch, Qt::CaseSensitivity cs = Qt::CaseSensitive)
QString & remove(QLatin1String str, Qt::CaseSensitivity cs = Qt::CaseSensitive)
QString & remove(const QString &str, Qt::CaseSensitivity cs = Qt::CaseSensitive)
QString & remove(const QRegExp &rx)
QString & remove(const QRegularExpression &re)
The first one simply removes n chars starting at position. The second one removes a specific char from the string. The third and fourth remove a substring. And the last two operate on regular expressions (with QRegExp being deprecated).
If you now know that regular expressions are used to describe text patterns, you already have the correct remove overload. Then I googled for
qregularexpression non-letter
which brought me to https://stackoverflow.com/questions/38001256/handling-accented-letters-in-qregularexpressions-in-qt5
The rest is experience.
I'm trying to understand what the [^\p{L}] in the QRegularExpression("[^\p{L}]") means.
This pattern matches any character that is not ([^...]) a Unicode letter (\p{L}, written as \\p{L} inside a C++ string literal). You already got links to regex pages from my mates.
Regards
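To show what the negated character class does in practice, here is a small sketch in standard C++. One assumption needs stating loudly: std::regex has no Unicode property classes such as \p{L}, so this uses [A-Za-z] as an ASCII-only stand-in. It is therefore an approximation of the Qt one-liner, not an equivalent (accented letters would be stripped as well). The function name stripNonAsciiLetters is hypothetical.

```cpp
#include <regex>
#include <string>

// ASCII-only approximation of "[^\p{L}]": std::regex has no Unicode
// property classes, so [A-Za-z] stands in for \p{L} here (an assumption
// made for this illustration; accented letters would be removed too).
inline std::string stripNonAsciiLetters(const std::string &input)
{
    // [^...] is a negated character class: it matches any single character
    // that is NOT listed between the brackets.
    static const std::regex nonLetter("[^A-Za-z]");
    return std::regex_replace(input, nonLetter, "");
}
```

Since the class matches everything that is not a letter, digits, punctuation marks and blank spaces are all removed in the same pass; nothing extra has to be added to the pattern for them.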
-
@aha_1980
It's coming across the \p{L} construct for Unicode letter that's hard/intimidating. One does not come across it in many examples, and it's a later addition to reg exes.
-
It bothers me that this ([^\p{L}]) regular expression would work in one engine and fail in another, at least according to the website that was linked:
https://regex101.com/
I tried it in the PHP one and it worked, but the Python and ECMAScript ones failed.
Somehow I had it in my head that regexes were standardized for the most part. That apparently is a bad assumption.
-
@fcarney You need a regex engine that supports Unicode (it's a shame that in 2019 not all engines do that!), according to: https://www.regular-expressions.info/unicode.html
Regards
-
@fcarney
A fair amount (though not all) of my reg ex work would have been in Perl. And that was before Perl had Unicode support, so at least in my old days Perl would indeed not have supported \p{L}. It's good that regex101 seems to have mentioned this. Of course the OP only cares about Qt now. But it's like I said: that construct is a bit of a "big one" for a beginner question!