UnicodeDecodeError with output from Windows OS command

Solved · General and Desktop
18 Posts · 5 Posters · 8.6k Views
    mrjj wrote on 29 Nov 2017, 21:59

    Hi
    Not even a windows in a virtual machine for testing?
    Anyway, try the /UNILOG option, as discussed here:
    https://stackoverflow.com/questions/29069401/how-can-i-get-unicode-characters-from-robocopy-process-standard-ouput-in-c-sharp

    What languages are those windoze machines running?
    I've been using robocopy for years and never noticed anything like that.
    (I assume it's spaces in filenames plus non-English letters.)

    JonB wrote on 29 Nov 2017, 23:09 (#3)

    @mrjj
    Thanks, and I will investigate what you suggest.

    However, there is something fishy going on here, and I don't feel this can be the right approach.

    The machine is standard UK, and I strongly doubt there will be any kind of non-English letter anywhere. At worst there might be a £ symbol somewhere.

    I have been writing "redirection" programs like this for years, under Windows, from C++, Perl etc. I've never had to specify an encoding/decoding for anything. I just grab stdout/stderr and plonk it into an output window and never had a problem.

    Any solution which actually requires passing some option to robocopy to affect its output seems quite wrong. The redirector (simply running an OS command and showing its output in a scrolling window) should never have to impose limitations on the commands it runs. (And even if, at worst, the command produces binary output, I'll accept any old characters in the window.)

    This whole business of Qt handing me a QByteArray that has to be converted, with some encoding specified, into a string just to display it in a window has never arisen for me elsewhere.

    Finally, even if this is necessary, I cannot have a situation where the output might cause the whole parent to get an exception. That has never happened to me before. At worst it should print some funny character and carry on with the output, not report "can't decode byte 0x9c in position 32: invalid start byte". If I passed the output to, say, grep, I wouldn't expect it to barf at the characters it encountered.

    What's going on here? Thanks.
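The "at worst print some funny character, but never throw" requirement maps onto the `errors` argument of Python's `bytes.decode`. A minimal sketch, assuming Python 3 and that the output has already been pulled out of the QByteArray as `bytes` (for instance with `.data()`, as elsewhere in this thread); the helper name `to_display_text` is hypothetical:

```python
def to_display_text(data: bytes, encoding: str = "utf-8") -> str:
    """Decode captured command output for display without ever raising.

    errors="replace" substitutes U+FFFD for any byte sequence that is not
    valid in `encoding`, instead of raising UnicodeDecodeError.
    """
    return data.decode(encoding, errors="replace")


# 0x9c is not a valid UTF-8 start byte, so strict decoding would raise here.
raw = b"abc\x9c.txt"
print(ascii(to_display_text(raw)))  # ascii() keeps the output printable anywhere
```

This only guarantees that the display loop keeps going; a byte that is invalid in the chosen encoding still shows up as the U+FFFD replacement character, so the right encoding still has to be found.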

      6thC wrote on 29 Nov 2017, 23:23 (#4)

      @JNBarchan said in UnicodeDecodeError with output from Windows OS command:

      invalid start byte

      Well, I don't think it's Qt's fault. Maybe Windows and Python, though? It looks like a Python call?
      I have not used Qt/Python. Whether it helps or not, this is interesting:

      Googling for "invalid start byte" brings back everything Python (not limited to Qt), and it seems to have happened a lot :)

      This may help your specific case:
      https://stackoverflow.com/questions/12468179/unicodedecodeerror-utf8-codec-cant-decode-byte-0x9c

      >>> '\x9c'.decode('cp1252')
      u'\u0153'
      >>> print '\x9c'.decode('cp1252')
      œ
      
      There is no more generic solution to "Guess the encoding roulette" – Puppy Feb 2 '15 at 10:23
      
      found it using a combination of web search, luck and intuition: cp1252 was used by default in the legacy components of Microsoft Windows in English and some other Western languages – bolov Nov 28 '15 at 21:58
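The snippet above is Python 2, where '\x9c' is already a byte string. A Python 3 sketch of the same experiment, with the byte written as b"\x9c", reproduces both the error from the original question and the cp1252 reading:

```python
raw = b"\x9c"

# Strict UTF-8 decoding (the default for bytes.decode) rejects 0x9c,
# because it is a continuation byte with no lead byte before it.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)  # prints: utf-8 failed: invalid start byte

# In Windows-1252, byte 0x9c maps to U+0153 (LATIN SMALL LIGATURE OE).
decoded = raw.decode("cp1252")
print(ascii(decoded))  # prints: '\u0153'
```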
      
        JonB wrote on 29 Nov 2017, 23:25 (#5)

        @6thC
        Wow, that link looks really interesting, lots going on there. I shall investigate, thanks.

        Yes, I have never done stuff from Python before, maybe that's where the shenanigans are going on...!

          JonB wrote on 30 Nov 2017, 09:38 (#6, edited)

          @6thC, and anyone else

          OK, in light of @6thC's post it seems this may be a Python/PyQt issue in how PyQt relates to Qt. But I need you C++ peeps to confirm how you would write the code, so that I know how to take this further. So please read on.

          1. I am using QByteArray QProcess::readAllStandardOutput() to read the output from the spawned program.
          2. I wish to output those bytes to a QTextEdit, so using QTextEdit::setText(const QString &text).
          3. This means I need to convert a QByteArray to a QString.
          4. How would you do this from C++ ??

          The problem is that from PyQt/Python, we don't have the types/methods of QByteArray or QString(!) Instead we seem to have to do conversions via the native Python types bytes & str respectively. This comes from the question https://forum.qt.io/topic/85064/python3-pyqt5-x-qbytearray-to-string I asked on the forum and the solution at https://forum.qt.io/topic/85064/python3-pyqt5-x-qbytearray-to-string/12 given by a visiting PyQt expert. This is where my PyQt code line output.data().decode('utf-8') comes from, which I'm now understanding is the cause of my problem.

          As I have said above, in the past I have never had to do anything about "decoding" output from a program to display it. For example, in a native C++ Windows program I simply grab the output from the sub-process and send it to a native Windows text/edit control (whatever that is), all the while simply treating the output in both places as a char [] (or maybe unsigned char []), without ever "decoding converting", and I have never had a problem with any output on any platform.
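In Python 3 the closest analogue of "just treat the output as `char[]`" is probably Latin-1 (ISO 8859-1): its codec maps each of the 256 byte values to the code point with the same number, so decoding can never fail. The plain Python 3 sketch below (no Qt involved) shows the catch: the byte survives intact, but a £ stored as 0x9c comes out as the invisible control character U+009C, because the bytes were never Latin-1 in the first place.

```python
raw = b"abc\x9c.txt"

# Latin-1 maps byte N to code point U+00NN for every N in 0..255,
# so this call cannot raise UnicodeDecodeError.
text = raw.decode("latin-1")

# Nothing is lost: encoding back gives the original bytes.
assert text.encode("latin-1") == raw

# ...but byte 0x9c becomes U+009C, a C1 control character, not a pound sign.
print(ascii(text))
```

So "no decoding" in Python 3 really means "decoding with a codec that happens never to fail"; the choice of codec still decides what the user sees.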


            jsulm (Lifetime Qt Champion) wrote on 30 Nov 2017, 09:58 (#7)

            @JNBarchan Are you sure your data is utf-8?


              JonB wrote on 30 Nov 2017, 10:16 (#8, edited)

              @jsulm
              No, I'm not. You tell me: what's the output returned from an OS command under Windows? See the thread of mine https://forum.qt.io/topic/85064/python3-pyqt5-x-qbytearray-to-string [BTW, this particular OS command --- not that it should matter --- is robocopy under a purely UK Windows. I really would not expect anything "funny" to be going on here, character-wise. And the solution required is a generic one, to work with any arbitrary command (assuming it basically outputs "text", I'm not interested if it were genuinely to return arbitrary binary bytes, that won't happen).]

              The whole point to my way of thinking is, as I have said above, I have written code under, say, Windows many times in the past to grab & display output from a sub-process and have never had to know/guess/call anything to do with encoding, so I don't see why I should have to here.

              As I wrote and am asking, I have always simply accepted the raw bytes from the command output and shoved them into an edit/text control for display to the user and have never had a problem.

              I don't know what's going on with Qt's QByteArray and the need to turn it into a QString, whether that comes from the Qt calls themselves or from the PyQt/Python layer. See the final paragraph in my previous post. I don't want to know about encodings or do any decoding, and I don't see why I should have to; if I didn't, I would never get an error like the one I'm stuck on now.


                jsulm (Lifetime Qt Champion) wrote on 30 Nov 2017, 10:35 (#9)

                @JNBarchan As far as I know Windows uses 16 bit Unicode (UTF-16).


                  JonB wrote on 30 Nov 2017, 10:44 (#10)

                  @jsulm
                  Hmm, I really don't think so? That would mean 2 bytes per character, is that right?? If under Windows you just go echo hello > file, and then inspect the file, it's 1 byte per character? I did say, I have never claimed to understand this encoding and UTF-8/16 stuff....
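One way to reconcile the two views: Windows' wide-character ("W") APIs work with UTF-16 strings internally, while text that console tools such as `echo` in cmd.exe write to a pipe or redirected file is, in the common case, a byte stream in a single-byte code page. A quick Python 3 comparison of the byte sizes, assuming little-endian UTF-16 (as used on Windows) and cp850 as the console code page (the code page identified later in this thread):

```python
word = "hello"

# Little-endian UTF-16 without a BOM: two bytes per character for this text.
utf16 = word.encode("utf-16-le")

# A single-byte code page such as cp850: one byte per character.
oem = word.encode("cp850")

print(len(utf16), len(oem))  # prints: 10 5

# The pound sign differs too: 0x9c in cp850, but 0xa3 0x00 in UTF-16-LE.
print("\u00a3".encode("cp850"), "\u00a3".encode("utf-16-le"))
```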

                    SGaist (Lifetime Qt Champion) wrote on 30 Nov 2017, 18:34 (#11)

                    Hi,

                    AFAIK @jsulm is right.


                      JonB wrote on 30 Nov 2017, 19:00 (#12, edited)

                      @SGaist
                      OK, when you & @jsulm say something it's usually right. We must be talking about different things. What do you two mean/understand by:

                      As far as I know Windows uses 16 bit Unicode (UTF-16).

                      Windows uses utf-16 for what? I have been talking about the output from running a command, which is what I'm trying to read in, or the contents of a file if you like. And that certainly is not 16-bit representation for characters, so are you talking about something else?

                      [EDIT: OK, I knew you guys knew your stuff... I'm reading https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows to understand what this is all about...]

                      Look, please, guys. I asked earlier:

                      1. I am using QByteArray QProcess::readAllStandardOutput() to read the output from the spawned program.
                      2. I wish to output those bytes to a QTextEdit, so using QTextEdit::setText(const QString &text).
                      3. This means I need to convert a QByteArray to a QString.
                      4. How would you do this from C++ ??

                      Could you just tell me what tiny piece of code you would use to convert the QByteArray to a QString in the above circumstance?

                        SGaist (Lifetime Qt Champion) wrote on 30 Nov 2017, 20:30 (#13)

                        In C++, you would just write: myTextEdit->setText(myProcess->readAllStandardOutput());


                          JonB wrote on 30 Nov 2017, 21:38 (#14, edited)

                          @SGaist
                          Wow! That's what I've been waiting to hear for ages.

                          Then that's all I'm trying to do from Python/PyQt, isn't it?! But you can't: I thought it was the very first thing I tried, and it complained that myProcess->readAllStandardOutput() returns a QByteArray while myTextEdit->setText() only accepts a QString. How does void QTextEdit::setText(const QString &text) accept QByteArray QProcess::readAllStandardOutput() in your C++?

                          I said I never wanted to have to write any "decoding" code if I didn't need to. What you've written is just what I would always have loved. But I thought this is where the PyQt behaviour of not letting us use QByteArray or QString directly means we have to go through native Python bytes & str conversions, and then the decode() is required. That was the whole point of my related post at https://forum.qt.io/topic/85064/python3-pyqt5-x-qbytearray-to-string

                          I'm so confused by all this, it really shouldn't be this hard to know how to write code to copy the output of an arbitrary OS command into a text control to show to the user, but from Python/PyQt....

                            SGaist (Lifetime Qt Champion) wrote on 30 Nov 2017, 21:47 (#15)

                            Because of the QString constructor taking a QByteArray as parameter.

                            Did you try with textEdit.setText(u'{}'.format(ba.data())) ?
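One thing worth checking before trying this under Python 3, assuming (as in PyQt5) that `ba.data()` returns a `bytes` object: formatting a `bytes` value into a string does not decode it. It inserts the bytes' `repr`, `b'...'` prefix and escapes included, so the text edit would show a literal like `b'abc\x9c.txt'`. A stdlib-only sketch, with a hard-coded stand-in for `ba.data()`:

```python
raw = b"abc\x9c.txt"  # stand-in for ba.data(); a made-up filename

# str.format() ends up calling str() on the bytes object, which returns its repr.
formatted = u"{}".format(raw)
print(formatted)  # prints: b'abc\x9c.txt'

# str(bytes, encoding) does decode; it behaves like raw.decode(encoding).
# cp850 here is only an example encoding.
decoded = str(raw, "cp850")
print(ascii(decoded))
```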


                              JonB wrote on 30 Nov 2017, 21:55 (#16, edited)

                              @SGaist
                              OK, we're getting somewhere!

                              Because of the QString constructor taking a QByteArray as parameter.

                              Yeah, that makes sense. As I understand it, a Python str does not accept a bytes object implicitly; you have to call decode() with a named encoding, which is exactly what I've been struggling with in all these posts. :(

                              [This may all be to do with https://riverbankcomputing.com/pipermail/pyqt/2010-January/025564.html]

                              Did you try with textEdit.setText(u'{}'.format(ba.data())) ?

                              No, 'coz nobody has suggested anything like that! In the other thread a visiting PyQt expert said to use decode('utf-8'), and that's where I've been getting the exception with the £ character from. I shall certainly try your suggestion tomorrow.

                              Thank you so much for your time!


                                JonB wrote on 30 Nov 2017, 22:32 (#17, edited)

                                @SGaist said in UnicodeDecodeError with output from Windows OS command:

                                In C++, you would just make: myTextEdit->setText(myProcess->readAllStandardOutput());

                                For the record, a PyQt expert is telling me this will suffer from the same problem (though it may not report it explicitly), in that without being told an encoding it will use utf-8, and it won't know what to do with that £ character. Do you know what this would actually show when the output contains a £/byte value of 0x9c ?

                                  JonB wrote on 1 Dec 2017, 15:31 (#18, edited later)

                                  [This post cross-posted to https://forum.qt.io/topic/85064/qbytearray-to-string/27 ]

                                  For the record, I have done exhaustive investigation, and there is only one solution which "correctly" displays the £ character under Windows. I am exhausted so will keep this brief:

                                  1. To create a file name with a £ in it: Go into, say, Notepad and use its Save to name a file like abc£.txt. This is in the UK, using a UK keyboard and a standard UK-configured Windows.

                                  2. Note that at this point if you view the filename in either Explorer or, say, via dir you do see a £, not some other character. That's what my user will want to see in the output of the command he will run.

                                  3. Run an OS command like robocopy or even dir, which will include the filename in its output.

                                  4. Read the output with QProcess.readAllStandardOutput(). I'm saying the £ character will arrive as a single byte of value 0x9c.

                                  5. For the required Python/PyQt decoding line, bytes->str (QByteArray->QString), the only thing which works (does not raise an exception) AND represents the character as a £ is: ba.data().decode("cp850").

                                  That is "Code Page 850", used in the UK/Western Europe (so I'm told). It is what you get if you open a Command Prompt and run chcp with no arguments.

                                  Any other decoding either raises UnicodeDecodeError (e.g. if utf-8) or decodes but represents it with another character (e.g. if windows_1252 or cp1252).

                                  I still haven't found a way of getting that cp850 encoding name programmatically from anywhere: if you ask Python for, say, the "system encoding" or the "user's preferred encoding", you get cp1252, so I've had to hard-code it. [EDIT: If you want it, it's ctypes.cdll.kernel32.GetConsoleOutputCP().]

                                  So there you are. I don't have C++ as opposed to Python for Qt, but I have a suspicion that if anyone tries it using the straight C++ Qt way of text = QString(process.readAllStandardOutput()) they'll find they do not actually get to see the £ symbol....
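The finding above could be folded into a small helper instead of a hard-coded name. This is a sketch under assumptions: it calls the `GetConsoleOutputCP` function mentioned in the edit, but through `ctypes.windll` (the usual way to reach kernel32 on Windows); it treats a return value of 0 as "unknown"; it falls back to a caller-supplied code page off Windows or when the lookup fails; and it uses `errors="replace"` so a wrong guess degrades to odd characters rather than an exception. The names `console_output_codec` and `decode_console_output` and the `cp850` default are illustrative only, and the child process's code page may not match the parent's, so treat the result as a best guess.

```python
import codecs
import sys


def console_output_codec(fallback: str = "cp850") -> str:
    """Best guess at the code page console programs use for their output."""
    if sys.platform == "win32":
        import ctypes

        # GetConsoleOutputCP() returns a code page number such as 850;
        # a result of 0 is treated here as "no console / unknown".
        cp = ctypes.windll.kernel32.GetConsoleOutputCP()
        if cp:
            name = f"cp{cp}"
            try:
                codecs.lookup(name)  # make sure Python has a codec for it
                return name
            except LookupError:
                pass
    return fallback


def decode_console_output(data: bytes, fallback: str = "cp850") -> str:
    """Decode captured command output for display; never raises."""
    return data.decode(console_output_codec(fallback), errors="replace")


print(ascii(decode_console_output(b"abc\x9c.txt")))
```

On a UK machine where chcp reports 850, this should arrive at the same cp850 decoding that was found by hand above.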
