QString encoding to std::string
-
Markus1990 wrote on 1 Mar 2023, 10:40 (last edited 1 Mar 2023, 10:49)
I have read a lot about encoding over the last few days, but nothing has solved my problem so far.
In Qt you can ask the user for a filename/path like this:
QString q_filename=QFileDialog::getOpenFileName(...);
If I now use Qt methods to open the file with q_filename, everything is fine. But as soon as I want to open the file with C++ methods (because of an external library), I have to convert the filename from QString to std::string:
std::string cpp_filename=q_filename.toStdString();
But now some files won't open because of special characters/encoding.
I can change the encoding, for example by using:
std::string cpp_filename=q_filename.toLatin1().toStdString();
I tested several encodings (toUtf8, toLocal8Bit, etc.); some of them work with my special characters and some don't.
My idea was simply to test them and pick one that works. However, as soon as I use std::filesystem::path to perform some operation on cpp_filename, everything falls apart and very strange characters appear.
So my question is:
How do I know which encoding conversion I need? I have read a lot about encoding, but everybody assumes that I know where I am coming from and where I am going. But how do I know this? It seems strange to me that in 2023, using Windows 10, MinGW and Qt 6.3, this is still a problem. I always thought everyone was enjoying UTF-8 by now.
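To make the "where do I come from and where do I go" question concrete: a QString holds UTF-16, and each conversion produces different bytes for the same name. Since Qt 5, QString::toStdString() goes through toUtf8(), so it yields UTF-8, while toLatin1() yields one byte per character and cannot represent characters such as '€' at all. A narrow (char*) API then interprets those bytes in whatever encoding it assumes, which on Windows is normally the ANSI code page, so only a conversion that matches that assumption works. The sketch below encodes a name by hand to show the byte differences. to_utf8 and to_latin1 are hypothetical helpers written for illustration, not Qt functions, and replacing unrepresentable characters with '?' is an assumption that mirrors what toLatin1() typically does.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: encodes Unicode code points as UTF-8
// (the bytes QString::toUtf8() / toStdString() produce). Assumes valid code points.
std::string to_utf8(std::u32string_view text) {
    std::string out;
    for (char32_t c : text) {
        if (c < 0x80) {                        // 1 byte: ASCII
            out += static_cast<char>(c);
        } else if (c < 0x800) {                // 2 bytes, e.g. U+00FC 'ü'
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {              // 3 bytes, e.g. U+20AC '€'
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {                               // 4 bytes: outside the BMP
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: encodes as Latin-1 (ISO 8859-1), one byte per character.
// Characters above U+00FF cannot be represented and become '?'.
std::string to_latin1(std::u32string_view text) {
    std::string out;
    for (char32_t c : text) {
        out += (c <= 0xFF) ? static_cast<char>(c) : '?';
    }
    return out;
}
```

For "Müller.txt", to_utf8 gives 11 bytes ('ü' becomes 0xC3 0xBC) and to_latin1 gives 10 ('ü' becomes 0xFC). A program that reads the UTF-8 version as code page 1252 sees "MÃ¼ller.txt", which is exactly the kind of strange character described above.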
-
JonB wrote on 1 Mar 2023, 11:40 (last edited 1 Mar 2023, 11:47):
@Markus1990
Hello and welcome. My (admittedly limited) understanding is that Windows does not use UTF-8, at least for file paths. Are you saying that toLatin1() works while others like toUtf8() do not?
In https://www.reddit.com/r/cpp_questions/comments/ov1xqw/paths_with_nonascii_chars_on_windows/ I find comments like:
std::string is an ANSI string. If you need to support non-ANSI characters, you need to use wstring or u8string (utf-8).
There is also a discussion there about this:
On newer versions of Windows 10 (v1903 and newer) it is possible to define a new setting called activeCodePage to UTF-8 in the manifest file of your application:
https://docs.microsoft.com/en-us/windows/win32/sbscs/application-manifests#activecodepage
If you are using mingw instead of MSVC, use GCC from ucrt repository. Call setlocale(LC_ALL, ".UTF8") in main() before anything else. After that most C library functions should assume that strings passed as char* are UTF-8. Use std::filesystem::path::u8string() to get UTF-8 encoded file name.
However, take this with a pinch of salt; someone else may know better than I do what your issue is and what to do about it.
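The reddit advice ends with std::filesystem::path::u8string(). As far as I know, std::filesystem can also be told explicitly that a string is UTF-8, which takes the locale out of the picture for the path object itself: in C++17 via std::filesystem::u8path(), in C++20 by constructing the path from char8_t data. Below is a minimal sketch of both directions, assuming the std::string holds UTF-8 (which is what QString::toStdString() and toUtf8() produce). path_from_utf8 and path_to_utf8 are illustrative names, not library functions. This only helps code that accepts a std::filesystem::path; a third-party library that takes a plain std::string and passes it to a narrow API such as fopen() will still interpret the bytes its own way.

```cpp
#include <filesystem>
#include <string>

// Builds a path from a std::string that is known to contain UTF-8,
// independent of the process locale or the Windows ANSI code page.
std::filesystem::path path_from_utf8(const std::string& utf8) {
#if defined(__cpp_lib_char8_t)
    // C++20: char8_t data is UTF-8 by definition.
    return std::filesystem::path(
        std::u8string(reinterpret_cast<const char8_t*>(utf8.data()), utf8.size()));
#else
    // C++17: u8path() interprets its argument as UTF-8 (deprecated in C++20).
    return std::filesystem::u8path(utf8);
#endif
}

// The reverse direction: returns the path as UTF-8 bytes in a std::string.
std::string path_to_utf8(const std::filesystem::path& p) {
    const auto u8 = p.u8string();  // std::string in C++17, std::u8string in C++20
    return std::string(reinterpret_cast<const char*>(u8.data()), u8.size());
}
```

Round-tripping "dir/Müller.txt" through these helpers should give back the same UTF-8 bytes for filename() and parent_path() on both Windows and Linux.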
-
Markus1990 wrote on 1 Mar 2023, 13:43:
@JonB Thanks, but unfortunately the suggestion from reddit won't be very useful because the system also has to run under Linux. For the moment I am just trying to avoid the problem: since the user won't see many of the names, I can use "safe" filenames.
-
wrote on 1 Mar 2023, 13:47:
@Markus1990 said in QString encoding to std::string:
the suggestion from reddit won't be very useful because the system also has to run under Linux
I thought Linux does use UTF-8, so the issue won't arise there and you are only looking for a solution for Windows. Anyway, I don't know, so over to you.
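Whether narrow strings are UTF-8 is a property of the running system rather than of the OS family: on Linux it is the codeset of the user's locale (usually, but not necessarily, UTF-8), and on Windows it is the process's ANSI code page. A minimal way to ask, sketched below, uses POSIX nl_langinfo(CODESET) and Win32 GetACP(); the function name narrow_encoding_name is made up for this example. On Windows, code page 65001 is UTF-8, which is what the activeCodePage manifest setting from the reddit thread switches a process to.

```cpp
#include <clocale>
#include <string>
#ifdef _WIN32
#include <windows.h>
#else
#include <langinfo.h>
#endif

// Reports which encoding narrow (char*) APIs assume in this process.
std::string narrow_encoding_name() {
#ifdef _WIN32
    // The process ANSI code page, e.g. "CP1252"; "CP65001" means UTF-8.
    return "CP" + std::to_string(GetACP());
#else
    // Adopt the user's locale (LANG / LC_ALL / LC_CTYPE) for LC_CTYPE.
    // Note: this changes the global C locale of the process.
    std::setlocale(LC_CTYPE, "");
    // Codeset of that locale, e.g. "UTF-8", or "ANSI_X3.4-1968" for the "C" locale.
    return nl_langinfo(CODESET);
#endif
}
```

On Windows, a library that takes a std::string and calls a narrow API will interpret the bytes in that code page; on Linux, file names are just bytes, and the codeset only tells you how programs are expected to interpret them.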
-
Christian Ehrlicher (Lifetime Qt Champion) wrote on 1 Mar 2023, 16:29 (last edited 1 Mar 2023, 16:30):
If you want to use a non-ASCII filename under Windows with a non-Qt API, you have to use the encoding currently set for your account. For this you can use QString::toLocal8Bit(). But then you won't be able to open a file whose name contains characters that are not representable in your local encoding. Therefore, convert the QString to a std::wstring and use the appropriate system calls that take a std::wstring instead.
Or even better: stay within Qt.
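A sketch of the wide-character route, assuming the file name arrives as UTF-16 (for example from QString::toStdU16String()). Instead of std::wstring it uses std::u16string, because std::filesystem defines char16_t strings as UTF-16 on every platform, whereas wchar_t is typically UTF-32 on Linux. On Windows the path's c_str() is then a UTF-16 const wchar_t*, which is handed to the wide CRT function _wfopen; elsewhere it is the native narrow (normally UTF-8) string. open_utf16 is a hypothetical helper name.

```cpp
#include <cstdio>
#include <cstring>
#include <filesystem>
#include <string>

// Opens a file whose name is given as UTF-16 without ever converting the
// name through the Windows ANSI code page. 'mode' is plain ASCII, e.g. "rb".
std::FILE* open_utf16(const std::u16string& name, const char* mode) {
    const std::filesystem::path p(name);  // char16_t source: UTF-16 by definition
#ifdef _WIN32
    // path::c_str() is const wchar_t* (UTF-16) here; _wfopen uses the wide Win32 APIs.
    const std::wstring wmode(mode, mode + std::strlen(mode));
    return _wfopen(p.c_str(), wmode.c_str());
#else
    // path::c_str() is the native narrow string, UTF-8 on typical Linux systems.
    return std::fopen(p.c_str(), mode);
#endif
}
```

The same idea applies to std::ifstream / std::ofstream, which in C++17 accept a std::filesystem::path directly.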
-
Robert Hairgrove wrote on 1 Mar 2023, 21:26:
@Markus1990 I have had the same problem before. On Linux, the UTF-8 encoding should work OK. But on Windows, UTF-16 is the native encoding. This page might help:
https://newbedev.com/how-to-open-an-std-fstream-ofstream-or-ifstream-with-a-unicode-filename
Also, there are QFile::encodeName() and QFile::decodeName(), which might help.
-
SimonSchroeder wrote on 2 Mar 2023, 08:46:
Here is what we do, and it has been working without any problems for over a year now:
We use the approach with setlocale(LC_ALL, ".UTF8") at the beginning of main(). You need to wrap it inside an #ifdef for Windows because it can crash on other systems (I think it was on macOS, but not on Linux). Now you are in a world where you can open files using UTF-8 filenames on all systems.
For a while we had some strange problems with converting QString to std::string. Here is the way we do it now:
```cpp
QString qstr("...something...");
std::string str = qstr.toUtf8().data();
// and the other way around
qstr = QString::fromUtf8(str.c_str());
```
This has worked reliably for us.
There is, technically, a slight caveat concerning temporaries, though. To be on the safe side, save the result of toUtf8() in a named QByteArray variable and call data() on that variable. In a very few cases our app crashed because of this. (The C++ standard says that a temporary is destroyed at the end of the full-expression that creates it.)
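On the temporaries caveat: a temporary lives until the end of the full-expression that creates it (roughly, the next ';'). So std::string str = qstr.toUtf8().data(); is itself safe, because std::string copies the bytes before the temporary QByteArray is destroyed; what crashes is keeping the raw pointer, e.g. const char* p = qstr.toUtf8().data(); and then using p on a later line. The sketch below shows the three cases with a hypothetical make_utf8_bytes() that returns a std::string by value as a stand-in for toUtf8(). Also, QString::toStdString() performs the same UTF-8 conversion in one call (the only difference being that data() stops at an embedded NUL character).

```cpp
#include <cstring>
#include <string>

// Hypothetical stand-in for QString::toUtf8(): returns a temporary buffer by value.
std::string make_utf8_bytes() { return "M\xC3\xBCller.txt"; }

// Safe: the temporary survives until the end of this full-expression,
// and std::string copies the bytes before it is destroyed.
std::string copy_from_temporary() {
    std::string s = make_utf8_bytes().c_str();
    return s;
}

// Safe: the named variable keeps the buffer alive while the pointer is used.
std::size_t length_via_named_buffer() {
    const std::string bytes = make_utf8_bytes();
    const char* p = bytes.c_str();
    return std::strlen(p);
}

// NOT safe, shown only as a comment: the buffer is destroyed at the ';',
// so 'p' dangles and using it is undefined behaviour.
//     const char* p = make_utf8_bytes().c_str();
//     std::strlen(p);
```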
-
JonB wrote on 2 Mar 2023, 09:11:
@SimonSchroeder said in QString encoding to std::string:
We use the approach with setlocale(LC_ALL, ".UTF8") at the beginning of main(). You need to wrap it inside an #ifdef for Windows
I suggested this above to the OP, but they rejected it on the basis of
but unfortunately the suggestion from reddit won't be very useful because the system also has to run under Linux
! :)
-
Robert Hairgrove wrote on 2 Mar 2023, 11:10:
@SimonSchroeder The OP needs a std::string for a third-party library. If the file access is done by the library and not in his or her own code, it will probably not work on Windows if there are Unicode characters in the file name. The #ifdef switching should be enough to ensure that it continues to work normally under Linux, if it was working before:
```cpp
#ifdef Q_OS_WIN                 // Qt's macro for Windows builds
    setlocale(LC_ALL, ".UTF8"); // declared in <clocale>
#endif
```
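A slightly more defensive version of that switch, as a sketch: it uses the compiler-defined _WIN32 macro, so it also works in code that does not include any Qt header, and it checks the result, because setlocale() returns a null pointer when the requested locale is not supported. As far as I know the ".UTF8" locale needs the Universal C Runtime (Windows 10 version 1803 or later), so with an older msvcrt-based MinGW toolchain the call may simply fail, which would fit the reddit advice to use a UCRT-based GCC. enable_utf8_crt_locale is a made-up name.

```cpp
#include <clocale>

// Switches the C runtime's narrow-string encoding to UTF-8 on Windows.
// Returns false if the runtime rejects the ".UTF8" locale.
// On other platforms it leaves the locale untouched and reports success.
bool enable_utf8_crt_locale() {
#ifdef _WIN32
    return std::setlocale(LC_ALL, ".UTF8") != nullptr;
#else
    return true;
#endif
}
```

Call it first thing in main() and fall back (or at least log a warning) when it returns false.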
-
@Robert-Hairgrove said in QString encoding to std::string:
But on Windows, utf-16 is the native encoding.
This is wrong.
-
Robert Hairgrove wrote on 2 Mar 2023, 19:42:
@Christian-Ehrlicher said in QString encoding to std::string:
@Robert-Hairgrove said in QString encoding to std::string:
But on Windows, utf-16 is the native encoding.
This is wrong.
Well, not completely wrong, anyway. This is a quote directly from the "horse's mouth", as it were:
"UTF-16 is basically the de facto standard encoding used by Windows Unicode-enabled APIs. UTF-16 is the “native” Unicode encoding in many other software systems, as well. For example, Qt, Java and the International Components for Unicode (ICU) library, just to name a few, use UTF-16 encoding to store Unicode strings."
By the way, thanks for your extremely helpful reply!
-
@Robert-Hairgrove It just tells you that all WinAPI calls ending with 'W' take a UTF-16 string, but that was already the case 20 years ago. All the non-native stuff that most C/C++ programmers use takes a std::string or char*, and this is encoded in the current locale. Even the console output is printed in the current locale instead of UTF-8 (UTF-8 has been the default for maybe a year; I don't even know whether it is the default when you just do an upgrade), which is why there are all those questions about qDebug() and std::cout not printing the correct characters on a Windows console.
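For the console part specifically, the Win32 function SetConsoleOutputCP() lets a program tell the console to interpret the bytes it writes as UTF-8 (code page CP_UTF8), independently of the ANSI code page used for narrow file names. A minimal sketch follows; the wrapper name use_utf8_console is made up, and it is a no-op on other platforms. It does not change how narrow file-name APIs behave, and the strings you write must already be UTF-8.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Asks the Windows console to interpret bytes written to stdout as UTF-8.
// Returns false if the call fails; on other platforms it does nothing.
bool use_utf8_console() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}
```

After calling it, writing the UTF-8 bytes of "Müller" with std::cout should show the umlaut correctly in a console window, provided the console font has the glyph.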
-
SimonSchroeder wrote on 3 Mar 2023, 08:16:
@Robert-Hairgrove said in QString encoding to std::string:
The OP needs a std::string for a 3rd-party library. If the file access is done by the library, and not in his or her own code, it will probably not work on Windows if there are Unicode characters in the file name.
So then it depends on the library. If it uses fstream internally, this approach will work, as setlocale() changes it for everything. If the library uses the Windows file API directly, I would hope that it offers a wide-string interface as well, because otherwise you are out of luck (I guess...).
@Christian-Ehrlicher Using the setlocale trick also switches the output via std::cout to UTF-8. You only have to make sure that your editor saves your source files as UTF-8, and you have to tell the Microsoft compiler that the source is UTF-8 and that it should put UTF-8 strings into the binary (two separate switches, /source-charset:utf-8 and /execution-charset:utf-8, or /utf-8 for both).
A little caveat: the setlocale trick does not fully solve command-line arguments. Suddenly we had problems with certain file names/paths. Here is the workaround we use to be truly fully inside UTF-8:
```cpp
#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>  // CommandLineToArgvW (link with shell32)
#include <clocale>
#include <cstdlib>
#endif

#ifdef _WIN32
int main(int argc_discard, char** argv_discard)
#else
int main(int argc, char** argv)
#endif
{
#ifdef _WIN32
    setlocale(LC_CTYPE, ".utf8");
    int argc;
    wchar_t** wargv = CommandLineToArgvW(GetCommandLineW(), &argc);
    // C++ needs explicit casts from malloc's void*
    char** argv = static_cast<char**>(malloc(argc * sizeof(char*)));
    for (int i = 0; i < argc; ++i) {
        // get length of string (including the terminating null)
        const int size = WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1, NULL, 0, NULL, NULL);
        argv[i] = static_cast<char*>(malloc(size + 1));
        // convert to UTF-8
        WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1, argv[i], size + 1, NULL, NULL);
    }
    LocalFree(wargv);
#endif
    // ...
}
```
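For the real argv, both the C and the C++ standard guarantee that argv[argc] is a null pointer, so a replacement array needs room for argc + 1 entries. If you would rather avoid manual malloc() in C++, the UTF-8 strings produced by the WideCharToMultiByte() loop could be kept in a std::vector<std::string> and exposed through a null-terminated std::vector<char*>, as in this sketch. make_argv is a hypothetical helper, and the returned pointers stay valid only while the string vector is alive and unmodified.

```cpp
#include <string>
#include <vector>

// Builds a C-style argument vector over 'args' that ends with a null pointer,
// so callers may either use argc or iterate until argv[i] == nullptr.
// The pointers refer into 'args', which must outlive the result and must
// not be modified in a way that reallocates its strings.
std::vector<char*> make_argv(std::vector<std::string>& args) {
    std::vector<char*> argv;
    argv.reserve(args.size() + 1);   // argc entries + terminator
    for (std::string& s : args) {
        argv.push_back(s.data());    // non-const data() needs C++17
    }
    argv.push_back(nullptr);         // argv[argc] == nullptr
    return argv;
}
```

Usage would be along the lines of int argc = static_cast<int>(args.size()); char** argv = v.data(); where v is the vector returned by make_argv(args).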
-
Robert Hairgrove wrote on 3 Mar 2023, 08:27:
@SimonSchroeder said in QString encoding to std::string:
Little caveat: The setlocale trick does not fully solve command line arguments.
You are aware of QCoreApplication::arguments(), I hope?
-
wrote on 3 Mar 2023, 08:32:
@SimonSchroeder
I am an old-school C purist. There is a little wrinkle in your code which is probably irrelevant, or which you don't care about, but for the sake of completeness you might feel like adding it :)
Since ANSI C/C99 it has been stipulated that argv[argc] should be NULL. Code using argv could either look at argc or iterate until argv[i] == NULL to visit all arguments. I have searched and cannot find anything which says this is no longer the case(?). So if you are passing your newly created argv replacement to other code, you might verify that it enters with argv[argc] == nullptr, and then like to malloc((argc + 1) * sizeof(char*)) and set argv[argc] = nullptr. :)
-
SimonSchroeder wrote on 3 Mar 2023, 08:33:
@Robert-Hairgrove said in QString encoding to std::string:
You are aware of QCoreApplication::arguments(), I hope?
This is something we do in our command line applications that (on purpose) do not use any Qt. (Because of static linking which makes our lives way easier with our clients.)
QCoreApplication::arguments() works fine ;-)