UnicodeDecodeError with output from Windows OS command
-
@6thC, and anyone else
OK, in light of the @6thC's post it seems this may be a Python/PyQt issue as to how it relates to Qt. But I need you C++ peeps to confirm how you would be doing the coding so that I know how to take this further. So please read on.
- I am using QByteArray QProcess::readAllStandardOutput() to read the output from the spawned program.
- I wish to output those bytes to a QTextEdit, so using QTextEdit::setText(const QString &text).
- This means I need to convert a QByteArray to a QString.
- How would you do this from C++ ??
The problem is that from PyQt/Python, we don't have the types/methods of QByteArray or QString (!) Instead we seem to have to do conversions via the native Python types bytes & str respectively. This comes from the question https://forum.qt.io/topic/85064/python3-pyqt5-x-qbytearray-to-string I asked on the forum and the solution at https://forum.qt.io/topic/85064/python3-pyqt5-x-qbytearray-to-string/12 given by a visiting PyQt expert. This is where my PyQt code line output.data().decode('utf-8') comes from, which I'm now understanding is the cause of my problem.
As I have said above, in the past I have never had to do anything about "decoding" output from a program to display it. For example, in a native C++ Windows program I simply grab the output from the sub-process and send it to a native Windows text/edit control (whatever that is), all the while simply treating the output in both places as a char [] (or maybe unsigned char []), without ever "decoding converting", and I have never had a problem with any output on any platform.
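A minimal sketch of why that decode line can fail, using a plain Python bytes value in place of whatever output.data() returns. The byte string is a made-up stand-in for command output containing the single byte 0x9c (the value reported later in this thread for the £ character); it was not captured from a real run, and try_decode is a hypothetical helper name.

```python
# Hypothetical stand-in for output.data(): console output in which
# the pound sign arrived as the single byte 0x9c.
raw = b"abc\x9c.txt"


def try_decode(data, encoding):
    """Return the decoded text, or a short failure note if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "failed: {}".format(exc.reason)


# 0x9c is a UTF-8 continuation byte, so it cannot start a character
# and a strict UTF-8 decode raises UnicodeDecodeError.
result = try_decode(raw, "utf-8")
print(result)
```

The bytes themselves are fine; the failure only means that UTF-8 is the wrong guess for how they were encoded.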
-
@jsulm
No, I'm not. You tell me: what's the output returned from an OS command under Windows? See the thread of mine https://forum.qt.io/topic/85064/python3-pyqt5-x-qbytearray-to-string [BTW, this particular OS command --- not that it should matter --- is robocopy under a purely UK Windows. I really would not expect anything "funny" to be going on here, character-wise. And the solution required is a generic one, to work with any arbitrary command (assuming it basically outputs "text", I'm not interested if it were genuinely to return arbitrary binary bytes, that won't happen).]
The whole point to my way of thinking is, as I have said above, I have written code under, say, Windows many times in the past to grab & display output from a sub-process and have never had to know/guess/call anything to do with encoding, so I don't see why I should have to here.
As I wrote and am asking, I have always simply accepted the raw bytes from the command output and shoved them into an edit/text control for display to the user and have never had a problem.
I don't know what's going on with Qt's QByteArray and the need to turn it into QString, as I have to with the Qt calls and/or the PyQt/Python issue. See the final paragraph in my previous post. I don't want to know about encoding, I don't want to do any decoding, and I don't see why I should have to, so that I would never get such an error as I'm stuck on now?
-
@jsulm
Hmm, I really don't think so? That would mean 2 bytes per character, is that right?? If under Windows you just go echo hello > file, and then inspect the file, it's 1 byte per character? I did say, I have never claimed to understand this encoding and UTF-8/16 stuff....
-
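To put numbers on the 1-byte versus 2-byte question, here is a small sketch comparing how many bytes the same text occupies in a single-byte code page and in UTF-16. The choice of cp850 and utf-16-le is an assumption made for illustration; it makes no claim about which encoding any particular Windows command actually writes.

```python
text = "hello"

# One byte per character in a single-byte code page such as cp850.
single_byte = text.encode("cp850")

# Two bytes per character for these letters in UTF-16 (little-endian, no BOM).
two_byte = text.encode("utf-16-le")

print(len(single_byte), len(two_byte))
```

So a file or pipe holding one byte per character is not UTF-16, whatever Windows uses internally for its own APIs.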
@SGaist
OK, when you & @jsulm say something it's usually right. We must be talking about different things. What do you two mean/understand by:
As far as I know Windows uses 16 bit Unicode (UTF-16).
Windows uses utf-16 for what? I have been talking about the output from running a command, which is what I'm trying to read in, or the contents of a file if you like. And that certainly is not 16-bit representation for characters, so are you talking about something else?
[EDIT: OK, I knew you guys knew your stuff... I'm reading https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows to understand what this is all about...]
Look, please, guys. I asked earlier:
1. I am using QByteArray QProcess::readAllStandardOutput() to read the output from the spawned program.
2. I wish to output those bytes to a QTextEdit, so using QTextEdit::setText(const QString &text).
3. This means I need to convert a QByteArray to a QString.
4. How would you do this from C++ ??
Could you just tell me what tiny piece of code you would use to convert the QByteArray to a QString in the above circumstance?
-
In C++, you would just make:
myTextEdit->setText(myProcess->readAllStandardOutput());
-
@SGaist
Wow! That's what I've been waiting to hear for ages.
Then that's all I'm trying to do from Python/PyQt, isn't it?! But you can't, I thought it was the very first thing I tried, and it complained that myProcess->readAllStandardOutput() returns QByteArray while myTextEdit->setText() only accepts QString? How does void QTextEdit::setText(const QString &text) accept QByteArray QProcess::readAllStandardOutput() in your C++?
I said I never wanted to have to write any "decoding" code if I didn't need to. What you've written is just what I would always have loved. But I thought this is where the PyQt behaviour of not letting us use QByteArray or QString directly means we have to go through native Python bytes & str conversions, and then the decode() is required. That was the whole point of my related post at https://forum.qt.io/topic/85064/python3-pyqt5-x-qbytearray-to-string
I'm so confused by all this, it really shouldn't be this hard to know how to write code to copy the output of an arbitrary OS command into a text control to show to the user, but from Python/PyQt....
-
Because of the QString constructor taking a QByteArray as parameter.
Did you try with textEdit.setText(u'{}'.format(ba.data())) ?
-
@SGaist
OK, we're getting somewhere!
Because of the QString constructor taking a QByteArray as parameter.
Yeah, that makes sense. As I understand it, Python str does not accept a bytes implicitly, you have to specify a decode() with a named encoding, which is exactly what I've been struggling with in all these posts. :(
[This may all be to do with https://riverbankcomputing.com/pipermail/pyqt/2010-January/025564.html]
Did you try with textEdit.setText(u'{}'.format(ba.data())) ?
No, 'coz nobody has suggested anything like that! In the other thread a visiting PyQt expert said to use decode('utf-8'), and that's where I've been getting the exception with the £ character from. I shall certainly try your suggestion tomorrow.
Thank you so much for your time!
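One thing worth checking before trying the format() suggestion: under Python 3, formatting a bytes object into a str does not decode it, it inserts the repr of the bytes. A minimal sketch, with a literal bytes value standing in for ba.data() (the sample bytes are made up for illustration):

```python
# Hypothetical stand-in for ba.data().
raw = b"abc\x9c.txt"

# format() on bytes falls back to str(raw), which is the repr,
# so the result is the text b'abc\x9c.txt' including the b and the quotes.
formatted = u"{}".format(raw)
print(formatted)

# An explicit decode is still what turns the bytes into real text.
decoded = raw.decode("cp850")
print(decoded)
```

No exception is raised this way, but the text control would show the escaped byte rather than the character.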
-
@SGaist said in UnicodeDecodeError with output from Windows OS command:
In C++, you would just make:
myTextEdit->setText(myProcess->readAllStandardOutput());
For the record, a PyQt expert is telling me this will suffer from the same problem (though it may not report it explicitly), in that without being told an encoding it will use utf-8, and it won't know what to do with that £ character. Do you know what this would actually show when the output contains a £/byte value of 0x9c ?
-
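What a lenient UTF-8 decode shows for that byte can be illustrated in Python with errors="replace", which substitutes U+FFFD (the replacement character) for anything undecodable. Whether Qt's implicit QByteArray-to-QString conversion behaves the same way is an assumption here, not something verified in this thread.

```python
# Hypothetical stand-in for the process output containing byte 0x9c.
raw = b"abc\x9c.txt"

# Lenient UTF-8 decode: no exception, but the bad byte becomes U+FFFD.
lenient = raw.decode("utf-8", errors="replace")
print(lenient)
print("\ufffd" in lenient)
```

In other words, silence from the decoder does not mean the £ survived.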
[This post cross-posted to https://forum.qt.io/topic/85064/qbytearray-to-string/27 ]
For the record, I have done exhaustive investigation, and there is only one solution which "correctly" displays the £ character under Windows. I am exhausted so will keep this brief:
- To create a file name with a £ in it: Go into, say, Notepad and use its Save to name a file like abc£.txt. This is in the UK, using a UK keyboard and a standard UK-configured Windows.
- Note that at this point if you view the filename in either Explorer or, say, via dir you do see a £, not some other character. That's what my user will want to see in the output of the command he will run.
- Run an OS command like robocopy or even dir, which will include the filename in its output.
- Read the output with QProcess.readAllStandardOutput(). I'm saying the £ character will arrive as a single byte of value 0x9c.
- For the required Python/PyQt decoding bytes->str (QByteArray->QString) line, the only thing which works (does not raise an exception) AND represents the character as a £ is: ba.bytes().decode("cp850").
That is the "Code Page 850", used in UK/Western Europe (so I'm told). It is what you get as output if you open a Command Prompt and execute just chcp.
Any other decoding either raises UnicodeDecodeError (e.g. if utf-8) or decodes but represents it with another character (e.g. if windows_1252 or cp1252).
I still haven't found a way of getting that cp850 encoding name programmatically from anywhere --- if you ask Python for, say, the "system encoding" or "user's preferred encoding" you get the cp1252 --- so I've had to hard-code it. [EDIT: If you want it, it's ctypes.cdll.kernel32.GetConsoleOutputCP().]
So there you are. I don't have C++ as opposed to Python for Qt, but I have a suspicion that if anyone tries it using the straight C++ Qt way of text = QString(process.readAllStandardOutput()) they'll find they do not actually get to see the £ symbol....
-