Is there an easy way to escape all non-ASCII characters of a QString?
-
@l3u_ said:
I hope they are better now?
I'm afraid your encoding is ambiguous. Let's say I have a string `<SOH>27`, where `<SOH>` is the ASCII character 1. It's gonna be encoded as `=127` and then decoded as `<FF>7`, where `<FF>` is the ASCII character 12. You can invent a better encoding, e.g. you can add a separator instead of fixing the number size to 2, but keep in mind that you are still reinventing a very old wheel.

If Base64 is too big for you, maybe look into some existing lossless encodings instead. The percent encoding @Kent-Dorfman mentioned might be an option. Qt already supports it through QUrl::toPercentEncoding().
-
@Chris-Kawa I don't get it? `<SOH>27` with `<SOH>` being `1` is just `127` and will stay `127`?! It walks through the `QByteArray` byte per byte and checks if the byte represents an ASCII character in the defined range, and if not, it replaces it with `=` and the string representation of the hex value of that byte (which itself is ASCII again)? And if a `=` appears in the array, it means that the next two bytes represent the hex value of the byte in question (including the `=` itself)?

I mean, I didn't make this up, it's just Quoted-Printable – at least I hope so?! So I'm not re-inventing an old wheel, I'm just trying to implement it …
Well, `QUrl::toPercentEncoding` would possibly be an option, but it escapes spaces when it doesn't have to … one could replace `%20` with a space before though and re-replace it again before decoding it …
-
@l3u_ said in Is there an easy way to escape all non-ASCII characters of a QString?:
`<SOH>27` with `<SOH>` being `1` is just `127` and will stay `127`?
Umm, no. I haven't followed the ins & outs of this discussion, so I may be mistaken about your context. But `<SOH>` is the ASCII/binary character with value `1`, not the digit `1`. But the `27` are the digits `27` (right or not?), which is different. The 3 character sequence `<SOH>27` is not the same as the 3 characters `127`.
-
@l3u_
`<SOH>` is 1 as in binary 00000001, not as in text "1". It's a non-printable character. Your range is 9,32-60,62-126, so 1 is below it and gets encoded as `=1`. The following `27` is text "27". The characters are in range, so they don't get translated, so you get `=127`. When decoding you don't know where the encoded part ends, you just assume a two-digit number, so you grab 2 as part of the encoded character, when really it's just text, so you decode `=12` followed by text "7" instead of decoding `=1` followed by text "27".

Can this actually be typed in using a QLineEdit?!
Haven't tried, but if not typed then probably copy/pasted from somewhere.
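
To make the `=127` ambiguity concrete, here is a minimal standalone sketch in plain C++ (no Qt). The helper names `isLiteral`, `encode`, `decode` and the `pad` flag are made up for illustration and are not the code under discussion. It also assumes the decoder reads the two characters after `=` as hex digits.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Bytes that pass through unescaped: 9, 32-60 and 62-126, the range quoted
// in the post above ('=' is 61, so it is always escaped).
static bool isLiteral(unsigned char c)
{
    return c == 9 || (c >= 32 && c <= 60) || (c >= 62 && c <= 126);
}

// Hypothetical encoder: every other byte becomes '=' plus its hex value.
// pad == false writes values below 0x10 as a single digit (the ambiguous
// variant); pad == true always writes two digits, as Quoted-Printable does.
static std::string encode(const std::string &in, bool pad)
{
    std::string out;
    for (unsigned char c : in) {
        if (isLiteral(c)) {
            out += static_cast<char>(c);
        } else {
            char buf[4]; // '=' + two hex digits + terminating NUL
            std::snprintf(buf, sizeof buf, pad ? "=%02X" : "=%X",
                          static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Returns 0-15 for a hex digit, -1 for anything else.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decoder that always consumes exactly two hex digits after '='.
// A '=' not followed by two hex digits is copied through unchanged.
static std::string decode(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const int hi = (in[i] == '=' && i + 2 < in.size()) ? hexValue(in[i + 1]) : -1;
        const int lo = hi >= 0 ? hexValue(in[i + 2]) : -1;
        if (hi >= 0 && lo >= 0) {
            out += static_cast<char>(hi * 16 + lo);
            i += 2; // skip the two digits just consumed
        } else {
            out += in[i];
        }
    }
    return out;
}
```

With `<SOH>27` as input, `encode(input, false)` gives `=127`, and `decode` turns that into a single byte (0x12 when the digits are read as hex) followed by `7`, so the round trip is lost. With `pad` set to `true` the encoded form is `=0127` and decoding returns the original three bytes.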
-
@l3u_
As I say, I have not followed the discussion. But, no, the user will not be able to type the `<SOH>` character into a line edit. That would actually require Ctrl+A to be typed, and a line edit won't store that as a character, it will treat it as a control sequence (probably selecting the whole of the line edit contents if you press it).
-
@Chris-Kawa Okay. Here we go. I hoped using quoted-printable would be easy to implement … maybe using `QUrl::toPercentEncoding`, adding some characters that actually don't need to be escaped for this use-case (using the `exclude` bytearray), will have fewer pitfalls ;-)
-
Okay, this seems to do the trick:
    const auto test = QStringLiteral("abc, äöü!");
    const auto exclude = QStringLiteral(" !\"#$&'()*+,/:;=?@[]").toUtf8();
    const auto escaped = QUrl::toPercentEncoding(test, exclude);
    const auto unEscaped = QUrl::fromPercentEncoding(escaped);
    qDebug() << test;
    qDebug() << escaped;
    qDebug() << unEscaped;

Output:

    "abc, äöü!"
    "abc, %C3%A4%C3%B6%C3%BC!"
    "abc, äöü!"
This seems to be what I intended to achieve with the quoted-printable encoding, and just as long (as `%C3` is not longer than `=C3`). And as one can exclude characters that don't need escaping for my use-case, it's essentially the same, but without programming shortcomings from me ;-)

I think this is the correct way. Thanks for the input :-)
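
For anyone curious what percent encoding with an exclude list boils down to, here is a rough standalone sketch in plain C++. It is not Qt's implementation: `percentEncode`, `percentDecode` and `hexValue` are hypothetical helpers. The sketch assumes that only the RFC 3986 unreserved characters (letters, digits, `-`, `.`, `_`, `~`) plus the bytes listed in `exclude` are left alone, and that the input is already UTF-8.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: percent-encodes every byte that is neither an
// RFC 3986 "unreserved" character nor listed in `exclude`.
static std::string percentEncode(const std::string &utf8, const std::string &exclude)
{
    std::string out;
    for (unsigned char c : utf8) {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                             || (c >= '0' && c <= '9')
                             || c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved || exclude.find(static_cast<char>(c)) != std::string::npos) {
            out += static_cast<char>(c);
        } else {
            char buf[4]; // '%' + two hex digits + terminating NUL
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Returns 0-15 for a hex digit, -1 for anything else.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical helper: turns each "%XY" back into the byte 0xXY.
// A '%' not followed by two hex digits is copied through unchanged.
static std::string percentDecode(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const int hi = (in[i] == '%' && i + 2 < in.size()) ? hexValue(in[i + 1]) : -1;
        const int lo = hi >= 0 ? hexValue(in[i + 2]) : -1;
        if (hi >= 0 && lo >= 0) {
            out += static_cast<char>(hi * 16 + lo);
            i += 2; // skip the two digits just consumed
        } else {
            out += in[i];
        }
    }
    return out;
}
```

Because every escape is exactly two hex digits and `%` itself is always escaped (it is not in the exclude list), decoding is unambiguous. Passing the UTF-8 bytes of the test string and the same exclude list as above should give the same escaped form as in the output shown.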
Edit: There's also `QByteArray::toPercentEncoding`, which is actually called by `QUrl::toPercentEncoding`, with only the input `QString` being converted to a `QByteArray` via `QString::toUtf8`. No need to use `QUrl`.
-