Pain and insanity: Unicode filenames and Qt
I'm going bonkers here.
I am using PySide (and have tried PyQt as well), but I can't get QFile to open a file whose name contains non-ASCII characters (e.g. "/home/user/þæö.txt").
QFile.open always returns False for these kinds of files, but works fine when the name contains only ASCII characters.
Python's open() function also works fine for the unicode filenames.
I've tried mangling the filename in every way I know: decoding it from UTF-8, encoding it into a bytestring, and so on, but to no avail.
Having run this under strace, I realized that Qt simply strips all the non-ASCII characters from the filename before calling the system's open() function, so it's really just trying to open a non-existent file.
I made this script to demonstrate the problem http://pastebin.com/rKCkeJNQ
If you guys don't mind, I would appreciate it if you could run it and see whether you get the same result, to check whether this is a problem with my system.
All it does is create a couple of empty files in your system's temp directory and then try to open them with Qt, printing the result to console.
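In case the pastebin link stops working, the idea is roughly the following. This is only a sketch of the approach, not the exact script: the function names (make_test_files, qt_can_open, run_demo) are made up for illustration, and it assumes PySide is importable (qt_can_open returns None when it is not).

```python
import os
import tempfile


def make_test_files(directory, names=("ascii.txt", "þæö.txt")):
    """Create one empty file per name and return the full paths."""
    paths = []
    for name in names:
        path = os.path.join(directory, name)
        # Python's built-in open() copes with the non-ASCII name.
        with open(path, "w"):
            pass
        paths.append(path)
    return paths


def qt_can_open(path):
    """Return QFile.open()'s result, or None if PySide is not installed."""
    try:
        from PySide.QtCore import QFile, QIODevice
    except ImportError:
        return None
    qfile = QFile(path)
    opened = qfile.open(QIODevice.ReadOnly)
    if opened:
        qfile.close()
    return opened


def run_demo(directory=None):
    """Create the test files and pair each path with the Qt result."""
    if directory is None:
        directory = tempfile.mkdtemp()
    return [(path, qt_can_open(path)) for path in make_test_files(directory)]
```

On my system the ASCII name opens and the non-ASCII one does not, so the pair for "þæö.txt" is the interesting one to compare.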
Thanks for reading.
I was crawling through the Qt source code, and it looks like the filename is always passed through this function in corelib/io/qfile.cpp on Unix:
static QString locale_decode(const QByteArray &f)
// Mac always gives us UTF-8 and decomposed, we want that composed...
So it's always giving me the fromLocal8Bit value of the string, which results in the non-ASCII characters being stripped from the filename.
What to do with this information, I'm not sure yet.
fromLocal8Bit is what you usually want: it should transcode your local encoding to Unicode. This requires your system to be set up correctly, though: if your locale's encoding is set incorrectly, the transformation will indeed fail.
Lots of fixes in that area went into Qt 4.8. Are you using that (or newer)?
It might also be good to check the encoding of the file name on your filesystem. If you use Windows, it is likely not UTF-8. If you use Linux, it's about a 50% chance of being UTF-8 versus some other encoding.
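A quick way to check the encoding side of this from Python is sketched below. The helper names (report_encodings, lossy_decode) are hypothetical, and lossy_decode only illustrates how a conversion under the wrong encoding can drop characters. It is not Qt's actual fromLocal8Bit implementation.

```python
import locale
import sys


def report_encodings():
    """Return the encodings Python sees for the current process."""
    return {
        # Encoding implied by the current locale settings.
        "preferred": locale.getpreferredencoding(False),
        # Encoding Python uses to convert file names to and from bytes.
        "filesystem": sys.getfilesystemencoding(),
    }


def lossy_decode(raw_name, encoding):
    """Decode bytes, silently dropping anything the encoding cannot handle.

    Illustration only: this mimics the symptom (characters vanishing),
    not the algorithm Qt uses.
    """
    return raw_name.decode(encoding, errors="ignore")


raw = "/home/user/þæö.txt".encode("utf-8")
print(report_encodings())
print(lossy_decode(raw, "utf-8"))  # name survives intact
print(lossy_decode(raw, "ascii"))  # the three non-ASCII characters are dropped
```

If the reported encodings are something like ASCII (for example under the "C" locale) while the names on disk are UTF-8, that mismatch would be consistent with the stripped filename seen in strace.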