Qt's MOC and UTF-8

14 Posts 6 Posters 1.1k Views 3 Watching
    MasterQ
    wrote on last edited by MasterQ
    #1

    Hello,

    I tried to use identifiers with UTF-8 encoded characters like German Umlauts (äöüß).

    I get the following message from MOC while processing an enum:

    AutoMoc subprocess error
    
    The moc process failed to compile
      "SRC:/src/BankBusiness/AccountView.h"
    
    ...
    
    Output
    src/BankBusiness/AccountView.h:51:1: error: Parse error at "Gl"
    
    ninja: build stopped: subcommand failed.
    

    The next character after 'Gl' is 'ä'.

    int äeiöü = 0; // <= accepted
    
    enum CSVField {
        GläubigerId, // <= moc stops here
    };
    

    So far this only happens for enum entries, not for regular identifiers.

    I am wondering whether Qt has issues with UTF-8 in general.

    Why is an umlaut rejected in an enum but accepted in other identifiers? My source code is encoded entirely in UTF-8.
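Since the parse error above is triggered by the enumerator name, one possible workaround is to keep the enumerator ASCII-only and carry the umlaut in a display string instead. The following is only a minimal sketch under that assumption: `CsvField`, `GlaeubigerId`, `Betrag` and `displayName` are hypothetical names that do not appear in the original code, and the sketch uses plain C++ without any Qt types.

```cpp
#include <cassert>
#include <string>

// ASCII-only enumerator names. "GlaeubigerId" is a hypothetical
// transliteration of the original "GläubigerId"; "Betrag" is an invented
// second field for illustration.
enum class CsvField {
    GlaeubigerId,
    Betrag,
};

// Returns the user-visible label as UTF-8 bytes. Non-ASCII text lives only
// in string literals, never in identifiers.
std::string displayName(CsvField field)
{
    switch (field) {
    case CsvField::GlaeubigerId:
        // "ä" is U+00E4, i.e. the bytes C3 A4 in UTF-8. The literal is split
        // so the hex escape cannot absorb the following characters.
        return "Gl\xC3\xA4" "ubiger-ID";
    case CsvField::Betrag:
        return "Betrag";
    }
    assert(false && "unhandled CsvField value");
    return {};
}
```

Calling `displayName(CsvField::GlaeubigerId)` yields the UTF-8 bytes for "Gläubiger-ID", while every identifier that a code generator has to tokenize stays within ASCII. Whether this is acceptable depends on whether the umlaut is needed in the identifier itself or only in what the user sees.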

        Christian Ehrlicher
        Lifetime Qt Champion
        wrote on last edited by
        #2

        Don't use anything but ASCII for variables or similar.

        Qt Online Installer direct download: https://download.qt.io/official_releases/online_installers/
        Visit the Qt Academy at https://academy.qt.io/catalog

          MasterQ
          wrote on last edited by
          #3

          @Christian-Ehrlicher said in Qt's MOC and UTF-8:

          Don't use anything but ASCII for variables or similar.

          That's advice, not an explanation. ;-)

          Would you label the issue I mentioned as a feature or a bug? Compilers should not have issues with non-ascii identifiers these days.

          I am only wondering why the behaviour is different for the two examples. Non-ASCII identifiers should either be accepted or be rejected, but not this mixture!

            SimonSchroeder
            wrote on last edited by
            #4

            @MasterQ said in Qt's MOC and UTF-8:

            Compilers should not have issues with non-ascii identifiers these days.

            Unicode in identifiers is still a fairly recent thing. And moc is not a real compiler. It only parses the bits of information it needs, so you cannot expect it to support every new feature of the standard. It has been mentioned several times that Qt is closely following the development of reflection in C++. C++26 will take its first steps towards reflection, but that will not be enough for Qt yet. Hopefully Qt will soon be able to ditch moc (maybe C++29 will bring all the features we need) and switch over to reflection. It does not make much sense to invest heavily in making moc more usable when a true solution is just around the corner. So, in the future you might be able to use umlauts everywhere.

            I'm also not fully sure which Unicode characters are actually allowed (https://en.cppreference.com/w/cpp/language/identifiers). The page mentions XID_Start and XID_Continue. There might be a difference between ä as a single precomposed code point (U+00E4) and a followed by the combining diaeresis (U+0308). I personally try to stick to English identifier names because you never know where your application might end up in the future. Your company might grow and become international, and German identifiers would then be hard for foreign developers to understand. And you'll never run into the problem you've mentioned...
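To make the point about the two spellings of "ä" concrete, here is a small sketch that compares them at the byte level. The names `kPrecomposed`, `kDecomposed`, `codePointCount` and `checkUmlautForms` are made up for this illustration, and the code point counter assumes valid UTF-8 input. It says nothing about how moc or any particular compiler treats either form; it only shows that the two forms are different byte sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "ä" as one precomposed code point, U+00E4 (UTF-8 bytes: C3 A4).
const std::string kPrecomposed = "\xC3\xA4";
// "ä" as "a" followed by the combining diaeresis U+0308 (UTF-8 bytes: 61 CC 88).
const std::string kDecomposed = "a\xCC\x88";

// Counts code points by skipping UTF-8 continuation bytes (bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8.
std::size_t codePointCount(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Both strings render as "ä", yet they differ in bytes and in code points.
void checkUmlautForms()
{
    assert(kPrecomposed != kDecomposed);
    assert(kPrecomposed.size() == 2 && kDecomposed.size() == 3);
    assert(codePointCount(kPrecomposed) == 1);
    assert(codePointCount(kDecomposed) == 2);
}
```

A tool that compares identifiers byte by byte would therefore see two different names unless the source is normalized first, which is one more reason the accepted character set for identifiers is not obvious.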

              MasterQ
              wrote on last edited by
              #5

              Thank you for the info

              MasterQ has marked this topic as solved.
                Pl45m4
                wrote on last edited by
                #6

                @SimonSchroeder said in Qt's MOC and UTF-8:

                Unicode in identifiers is still a fairly recent thing

                I cannot wait to debug foreign code where every variable and symbol is completely in Chinese/Japanese/Korean/Hebrew letters... :D

                It might be convenient for some, but with things like this you can quickly overengineer and make matters worse.


                If debugging is the process of removing software bugs, then programming must be the process of putting them in.

                ~E. W. Dijkstra

                  MasterQ
                  wrote on last edited by
                  #7

                  @Pl45m4 said in Qt's MOC and UTF-8:

                  It might be convenient for some, but with things like this you can quickly overengineer and make matters worse.

                  That depends on the point of view. Ask non-English-speaking Chinese, Japanese, Korean, or Hebrew readers.

                  Why exclude young Chinese guys from coding? ...

                  But I get your points, and my question was not "does it make sense?" but rather "is it possible to do so, if you wish?"

                  Cheers

                    Pl45m4
                    wrote on last edited by Pl45m4
                    #8

                    @MasterQ said in Qt's MOC and UTF-8:

                    That depends on the point of view. Ask non-English-speaking Chinese, Japanese, Korean, or Hebrew readers.
                    Why exclude young Chinese guys from coding?

                    That was not aimed at you :)

                    Over the years code conventions have developed, for good reasons.
                    As a German myself, I would never post code like (now I did here, LOL):

                    (I actually googled for words with Ä and ß... everything that came to my mind would not have made any sense)

                    void ÄußereKlasse::holeÖffentlichesMaß(int maß)
                    {
                        std::string ßÄäÖÄöÜ = "Hello";
                        std::string scheiße = "World"; // classic :D
                        // ...
                    }
                    

                    and then ask for help with anything other than plain syntax. When nothing is obviously wrong in terms of the C++ standard, it's a pain to figure out what is going on if you can't read sh*t...

                    It's like reverse engineering obfuscated code, except it's actually clear text, but you still need to figure out the hard way what this is all about...
                    If there's even the slightest chance that somebody other than yourself will ever read your code, or that you will ask for help over the Internet... you should stick to those conventions.

                    Just because you probably can (in the future) doesn't mean you should spam language-specific characters from now on :))


                      MasterQ
                      wrote on last edited by
                      #9

                      I agree, no doubt.

                      But I can remember some FORTRAN code, maybe 30 years ago, where all variables were like x1, x2, ... I could only understand it because I knew what the coder intended to calculate =8-0.

                      Even if 'x' is an ASCII character, the code was terribly unreadable. But that's another chapter of the lore.

                      TGIF

                      have a nice weekend

                        SimonSchroeder
                        wrote on last edited by
                        #10

                        @MasterQ said in Qt's MOC and UTF-8:

                        But I can remember some FORTRAN code, maybe 30 years ago, where all variables were like x1, x2

                        Well, short variable names back then had a couple of reasons. For one, memory was at a premium and shorter identifiers mean less memory. This is further compounded if you consider punch cards with only up to 80 columns (and you had to start your code at column 7). I still have to work with some old 80-column FORTRAN code. It is annoying when you need to split an equation over several lines, and shorter names help you fit everything into one line. Not to forget that identifier length was restricted (standard FORTRAN 77 allows only 6 characters). There are only so many meaningful identifiers you can build from so few characters. And the first letter (initially) determined whether your variable was integer or floating point. (This is why to this day the most common loop variables are i, j, k, l, m, n, as those were defined to be integers.)

                        @MasterQ said in Qt's MOC and UTF-8:

                        Why exclude young Chinese guys from coding? ...

                        Everyone coding in (proper) C++ has to code in English, because the keywords are English. So you can either stick to English or mix languages, but you cannot write entirely in a language other than English. You could try with macros, but that is certainly not a good solution. Further, it would still restrict you to languages written left to right. I would claim that any programmer needs to know English in order to stay up to date. So, let's just agree on English as the common language to make code portable between different nations.
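For illustration, the macro approach mentioned above could look roughly like the sketch below. All macro and function names (`wenn`, `sonst`, `solange`, `gib_zurueck`, `betrag`, `groessterGemeinsamerTeiler`, `pruefeBeispiele`) are invented for this example, and they are transliterated to ASCII on purpose so the snippet does not depend on non-ASCII identifier support.

```cpp
#include <cassert>

// Hypothetical German aliases for a few C++ keywords.
#define wenn if
#define sonst else
#define solange while
#define gib_zurueck return

// Absolute value ("Betrag") written with the aliased keywords.
int betrag(int wert)
{
    wenn (wert < 0)
        gib_zurueck -wert;
    sonst
        gib_zurueck wert;
}

// Greatest common divisor via the Euclidean algorithm.
int groessterGemeinsamerTeiler(int a, int b)
{
    solange (b != 0) {
        int rest = a % b;
        a = b;
        b = rest;
    }
    gib_zurueck a;
}

// Small self-check showing how the functions are called.
void pruefeBeispiele()
{
    assert(betrag(-5) == 5);
    assert(groessterGemeinsamerTeiler(12, 18) == 6);
}
```

Even in this tiny example the type name `int`, the header name and every standard library name stay English, and anyone reading the code now has to learn the macro vocabulary first, which supports the point that this is not a good solution.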

                          jsulm
                          Lifetime Qt Champion
                          wrote on last edited by
                          #11

                          @MasterQ said in Qt's MOC and UTF-8:

                          Why exclude young Chinese guys from coding?

                          I'm quite confident young Chinese guys speak English well enough.
                          How should this work in a project where people from different countries are involved? If everyone involved in such a project starts to use their native language in code, you can scrap the project. In our company such code would never pass code review. It is not about excluding anybody, it is about having a common language everybody understands.

                          https://forum.qt.io/topic/113070/qt-code-of-conduct

                            JonB
                            wrote on last edited by
                            #12

                            @jsulm So we should use Esperanto, which everyone understands, instead of English :)

                              jsulm
                              Lifetime Qt Champion
                              wrote on last edited by
                              #13

                              @JonB I'm sure more people speak Latin than Esperanto :-D

                                Pl45m4
                                wrote on last edited by
                                #14

                                @JonB said in Qt's MOC and UTF-8:

                                So we should use Esperanto, which everyone understands, instead of English :)

                                Mi ŝatas tion :D

                                @jsulm said in Qt's MOC and UTF-8:

                                I'm quite confident young Chinese guys speak English well enough.

                                That's why we have an International category for every major language, right ;-)

                                That may not be very helpful, but I think there are a lot of "programmers" in every region of the world who speak only their native language, while their English knowledge is limited to the few "keywords" of C++ (or whatever language they are using).

