QString encoding to std::string

    SimonSchroeder wrote (#7):

    Here is what we do, and it has been working without any problems for over a year now:

    We use the approach with setlocale(LC_ALL, ".UTF8") at the beginning of main(). You need to wrap it inside an #ifdef for Windows because it can crash on other systems (I think it was on macOS, but not on Linux). Now, you are in a world where on all systems you can open files using UTF-8 filenames.
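A rough sketch of that guarded call, not the poster's actual code. The helper name enableUtf8Locale is made up for illustration, and the sketch assumes a Universal CRT recent enough to accept the ".UTF8" locale name. Checking the return value shows whether the locale was really applied:

```cpp
#include <clocale>

// Hypothetical helper: switch the C runtime to UTF-8 on Windows only.
// Returns false if the runtime rejected the locale name.
bool enableUtf8Locale()
{
#ifdef _WIN32
    // setlocale returns a null pointer when the request cannot be honored.
    return std::setlocale(LC_ALL, ".UTF8") != nullptr;
#else
    // Other platforms are left untouched, as suggested in the post.
    return true;
#endif
}
```

Calling enableUtf8Locale() as the first statement of main() keeps the platform check in one place.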

    For a while we had some strange problems with converting QString to std::string. Here is the way we do it now:

    QString qstr("...something...");
    std::string str = qstr.toUtf8().data();
    // and the other way around
    qstr = QString::fromUtf8(str.c_str());
    

    This has worked reliably for us.

    There is, technically, a slight caveat concerning temporaries. To be on the safe side, you should store the result of toUtf8() in a named QByteArray variable and call data() on that variable. In very few cases we got our app to crash because of this. I think the C++ standard states something like this about temporaries.
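For reference, the C++ rule is that a temporary is destroyed at the end of the full expression that created it. The sketch below illustrates this without Qt: encodeUtf8 is a hypothetical stand-in for QString::toUtf8() (it returns a byte buffer by value), and both function names are invented for this example. Copying into a std::string inside the same expression happens before the temporary dies. What dangles is a raw pointer kept for a later statement. Qt also provides QString::toStdString() and QString::fromStdString(), which convert via UTF-8.

```cpp
#include <string>

// Hypothetical stand-in for QString::toUtf8(): returns a byte buffer by value.
std::string encodeUtf8(const std::string& text)
{
    return text;
}

std::string copyWithinExpression(const std::string& text)
{
    // The temporary lives until the end of this full expression, and
    // std::string copies the bytes before the temporary is destroyed.
    std::string str = encodeUtf8(text).data();
    return str;
}

std::string copyViaNamedBuffer(const std::string& text)
{
    // A named buffer is required whenever the raw pointer is used
    // in a later statement.
    const std::string bytes = encodeUtf8(text);
    const char* raw = bytes.data();  // valid while 'bytes' is alive
    return std::string(raw, bytes.size());
}

// Broken pattern, shown only as a comment: the temporary is gone after
// the first statement, so 'raw' dangles in the second one.
//   const char* raw = encodeUtf8(text).data();
//   std::string str = raw;
```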

      JonB wrote (#8):

      @SimonSchroeder said in QString encoding to std::string:

      We use the approach with setlocale(LC_ALL, ".UTF8") at the beginning of main(). You need to wrap it inside an #ifdef for Windows

      I suggested this above to the OP, but they rejected it on the basis of

      but unfortunately the suggestion from reddit won't be very useful because the system also has to run under Linux

      ! :)

        Robert Hairgrove wrote (#9):

        @SimonSchroeder The OP needs a std::string for a 3rd-party library. If the file access is done by the library, and not in his or her own code, it will probably not work on Windows if there are Unicode characters in the file name.

        The #ifdef switching should be enough to ensure that it will continue to work normally under Linux, if it was working before:

        #include <clocale>   // setlocale
        #include <QtGlobal>  // Q_OS_WIN

        #ifdef Q_OS_WIN
        setlocale(LC_ALL, ".UTF8");  // call this at the start of main()
        #endif
        
          Robert Hairgrove wrote (earlier in the thread):

          @Markus1990 I have had the same problem before. On Linux, the utf-8 encoding should work OK. But on Windows, utf-16 is the native encoding.

          This page might help:
          https://newbedev.com/how-to-open-an-std-fstream-ofstream-or-ifstream-with-a-unicode-filename

          Also, there is QFile::encodeName() and QFile::decodeName() which might help.

          Christian Ehrlicher (Lifetime Qt Champion) wrote (#10):

          @Robert-Hairgrove said in QString encoding to std::string:

          But on Windows, utf-16 is the native encoding.

          This is wrong.

          Qt Online Installer direct download: https://download.qt.io/official_releases/online_installers/
          Visit the Qt Academy at https://academy.qt.io/catalog

            Robert Hairgrove wrote (#11):

            @Christian-Ehrlicher said in QString encoding to std::string:

            @Robert-Hairgrove said in QString encoding to std::string:

            But on Windows, utf-16 is the native encoding.

            This is wrong.

            Well, not completely wrong, anyway. This is a quote directly from the "horse's mouth", as it were:

            "UTF-16 is basically the de facto standard encoding used by Windows Unicode-enabled APIs. UTF-16 is the “native” Unicode encoding in many other software systems, as well. For example, Qt, Java and the International Components for Unicode (ICU) library, just to name a few, use UTF-16 encoding to store Unicode strings."

            Source: https://learn.microsoft.com/en-us/archive/msdn-magazine/2016/september/c-unicode-encoding-conversions-with-stl-strings-and-win32-apis

            By the way, thanks for your extremely helpful reply!

              Christian Ehrlicher (Lifetime Qt Champion) wrote (#12):

              @Robert-Hairgrove It just tells you that all WinAPI calls ending with 'W' take a UTF-16 string, but that was already the case 20 years ago. All the non-native stuff that most C(++) programmers use takes a std::string or char*, and this is encoded in the current locale. Even the console output is printed in the current locale instead of UTF-8 (which has been the default for maybe one year; I don't even know whether it is the default when you just do an upgrade), and that is why there are so many questions about qDebug() and std::cout not printing the correct characters on a Windows console.

                SimonSchroeder wrote (#13):

                @Robert-Hairgrove said in QString encoding to std::string:

                The OP needs a std::string for a 3rd-party library. If the file access is done by the library, and not in his or her own code, it will probably not work on Windows if there are Unicode characters in the file name.

                So, then it depends on the library. If it uses fstream internally, this approach will work, as setlocale() will change it for everything. If the library uses Windows' file API directly I would hope that there is a wide string interface as well because otherwise you are out of luck (I guess...).

                @Christian-Ehrlicher Using the setlocale trick also switches the output via std::cout to UTF-8. You only have to make sure that your editor saves your source code files as UTF-8. And you have to tell the Microsoft compiler that the source is UTF-8 and that it should produce UTF-8 binaries (IIRC two separate switches).
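For reference, MSVC documents the two switches as /source-charset:utf-8 and /execution-charset:utf-8, with /utf-8 as a shorthand that sets both. A small sketch of how one might check the result at compile time follows. The function name literalIsUtf8 is invented for this example, and the check only looks at one character, so treat it as a smoke test rather than a guarantee.

```cpp
#include <cstring>

// "\u00E4" is U+00E4. In a UTF-8 execution character set it takes two
// bytes (0xC3 0xA4) plus the terminating NUL; in a single-byte code page
// it would take one byte plus the NUL, and this assertion would fail.
static_assert(sizeof("\u00E4") == 3,
              "narrow string literals are not UTF-8 encoded");

// Hypothetical run-time version of the same check.
bool literalIsUtf8()
{
    const char* s = "\u00E4";
    return std::strlen(s) == 2
        && static_cast<unsigned char>(s[0]) == 0xC3
        && static_cast<unsigned char>(s[1]) == 0xA4;
}
```

The universal character name keeps the check independent of how the editor saved the source file, so it isolates the execution character set from the source character set.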

                Little caveat: The setlocale trick does not fully solve command line arguments. Suddenly, we had problems with certain file names/paths. Here is the workaround that we are using to be truly fully inside UTF-8:

                #ifdef _WIN32
                #include <windows.h>
                #include <shellapi.h>   // CommandLineToArgvW
                #include <locale.h>     // setlocale
                #include <stdlib.h>     // malloc
                int main(int argc_discard, char** argv_discard)
                #else
                int main(int argc, char** argv)
                #endif
                {
                #ifdef _WIN32
                    setlocale(LC_CTYPE, ".utf8");
                
                    // Rebuild argv as UTF-8 from the wide-character command line.
                    int argc;
                    wchar_t** wargv = CommandLineToArgvW(GetCommandLineW(), &argc);
                    char** argv;
                    argv = (char**)malloc(argc * sizeof(char*));
                    for (int i = 0; i < argc; ++i)
                    {
                        const int size = WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1, NULL, 0, NULL, NULL);    // required size in bytes, including the terminating NUL
                        argv[i] = (char*)malloc(size + 1);
                        WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1, argv[i], size + 1, NULL, NULL);           // convert to UTF-8
                    }
                    LocalFree(wargv);
                #endif
                   ...
                }
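The same idea can be sketched in C++ with owning containers instead of malloc. This is only an illustration under assumptions: the function name utf8Arguments is invented, the Windows branch needs to be linked against Shell32, and on other platforms the sketch simply assumes the incoming argv is already UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>  // CommandLineToArgvW
#endif

// Hypothetical helper: returns the command-line arguments as UTF-8 strings.
std::vector<std::string> utf8Arguments(int argc, char** argv)
{
    std::vector<std::string> args;
#ifdef _WIN32
    (void)argc;  // the narrow arguments are ignored on Windows
    (void)argv;
    int wargc = 0;
    wchar_t** wargv = CommandLineToArgvW(GetCommandLineW(), &wargc);
    if (wargv == nullptr)
        return args;
    for (int i = 0; i < wargc; ++i) {
        // With a source length of -1 the returned size includes the NUL.
        const int size = WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1,
                                             nullptr, 0, nullptr, nullptr);
        std::string arg;
        if (size > 0) {
            arg.resize(static_cast<std::size_t>(size));
            WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1,
                                &arg[0], size, nullptr, nullptr);
            arg.resize(static_cast<std::size_t>(size) - 1);  // drop the NUL
        }
        args.push_back(arg);
    }
    LocalFree(wargv);
#else
    args.assign(argv, argv + argc);  // assumed to be UTF-8 already
#endif
    return args;
}
```

Returning a std::vector<std::string> avoids manual cleanup of the converted arguments.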
                
                  Robert Hairgrove wrote (#14):

                  @SimonSchroeder said in QString encoding to std::string:

                  Little caveat: The setlocale trick does not fully solve command line arguments.

                  You are aware of QCoreApplication::arguments(), I hope?

                    JonB wrote (#15):

                    @SimonSchroeder
                    I am an old-school C purist. There is a little wrinkle with your code which is probably irrelevant/you don't care about, but for the sake of completeness you might feel like adding :)

                    Since ANSI C/C99 it has been stipulated that argv[argc] is NULL. Code using argv can either look at argc or iterate until argv[i] == NULL to visit all arguments. I have searched and cannot find anything which says this is no longer the case(?). So if you are passing your newly created argv replacement to other code, you might verify that the original enters with argv[argc] == nullptr, and then malloc((argc + 1) * sizeof(char*)) and set argv[argc] = nullptr. :)
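The C++ standard makes the same guarantee for main: argv[argc] is a null pointer. A minimal sketch of a replacement argv that preserves it, assuming the arguments are already available as strings (the type name OwnedArgv is made up for this example):

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical owner of a rebuilt argument list. It keeps the strings
// alive and exposes a classic argv whose last element is a null pointer.
struct OwnedArgv {
    std::vector<std::string> storage;  // owns the bytes
    std::vector<char*> pointers;       // argv view, null-terminated

    explicit OwnedArgv(std::vector<std::string> args)
        : storage(std::move(args))
    {
        pointers.reserve(storage.size() + 1);
        for (std::string& s : storage)
            pointers.push_back(&s[0]);
        pointers.push_back(nullptr);   // argv[argc] == nullptr
    }

    // Copying would leave 'pointers' referring to the other object's strings.
    OwnedArgv(const OwnedArgv&) = delete;
    OwnedArgv& operator=(const OwnedArgv&) = delete;

    int argc() const { return static_cast<int>(storage.size()); }
    char** argv() { return pointers.data(); }
};
```

Code that walks the array until it hits a null pointer then behaves the same as code that counts up to argc.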

                      SimonSchroeder wrote (#16):

                      @Robert-Hairgrove said in QString encoding to std::string:

                      You are aware of QCoreApplication::arguments(), I hope?

                      This is something we do in our command line applications that (on purpose) do not use any Qt. (Because of static linking which makes our lives way easier with our clients.)

                      QCoreApplication::arguments() works fine ;-)
