[Solved] How to properly write/read UTF-8 strings to/from XML files?
I am working on a multi-platform application that handles an XML file containing data created by users in different languages. My requirement is to support any kind of language/keyboard character, so I need to find the best way to write and read my XML file without losing content, even if my application is used on different operating systems (Windows, Linux, Mac).
Is this actually possible?
I have read a few articles here and there, but I couldn't find anything really useful.
For now, I am writing strings like this:
QString value = "Text";
root.setAttribute("variable", value);
And reading them like this:
QString variable = root.attribute("variable", "");
Obviously, this is not working the way I need. What would be the best way to do this?
You'll probably be interested in QTextCodec and the related Codec Example.
Hope it helps
Thanks! That's exactly what I was looking for.
Now, I have another question: can I use one specific codec to support several languages? Is there a "universal" codec or something like that (e.g. UTF-8)?
What are the best practices in that sense?
P.S.: I need the encoding step to be transparent to the users.
IIRC there's no absolutely universal codec that covers everything, but if your application only reads and writes files it has generated itself, then you can choose the codec you will use. AFAIK, UTF-8 should have you covered for practically all cases, since it can encode every Unicode character.
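To make the point about UTF-8 coverage a bit more concrete, here is a minimal sketch in plain standard C++ (no Qt involved) of how a single Unicode code point maps to one to four UTF-8 bytes. The helper name encodeUtf8 is made up for this illustration; in a Qt application you would normally let QString::toUtf8() do this work rather than write it by hand.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not a Qt API): encode one Unicode code point as UTF-8.
std::string encodeUtf8(char32_t cp)
{
    // Surrogates and values above U+10FFFF are not valid Unicode scalar values.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");

    std::string out;
    if (cp < 0x80) {
        // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Every valid code point, from Latin letters to CJK characters and emoji, falls into one of these four branches, which is why a single UTF-8 file can hold text in any mix of languages.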
Just the last question:
Now I know how to handle the locale encoding/decoding part but my XML has this structure:
<document>
  <user name="Ali Baba" age="38"/>
  <profession title="carpenter" experience="3"/>
  ...
  ...
</document>
The attributes "name" and "title" hold the values that I need to encode/decode (I don't need to encode the whole XML file), so I was looking for examples of how to store QByteArray variables as XML attributes, but with no luck.
Is it possible to save/load QByteArray values as attributes in an XML file?
Don't try that; encode the complete file. It's really a bad idea to put text in several different encodings into one file. How would you know when and what to decode?
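As a rough illustration of "encode the complete file", the sketch below builds the whole document as one UTF-8 byte string, declares that encoding once in the XML declaration, and writes the bytes unchanged. It is plain standard C++ rather than Qt, and it assumes the input strings already hold UTF-8 bytes (in Qt they could come from QString::toUtf8()). The function names escapeXmlAttribute, buildDocument, saveBytes and loadBytes are invented for this example, and the document is reduced to the two attributes mentioned above.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: escape the characters that are special inside an
// XML attribute value. Non-ASCII UTF-8 bytes are passed through untouched.
std::string escapeXmlAttribute(const std::string& utf8)
{
    std::string out;
    for (char c : utf8) {
        switch (c) {
        case '&':  out += "&amp;";  break;
        case '<':  out += "&lt;";   break;
        case '>':  out += "&gt;";   break;
        case '"':  out += "&quot;"; break;
        case '\'': out += "&apos;"; break;
        default:   out += c;        break;
        }
    }
    return out;
}

// Build the complete document; the encoding is declared once, for the
// whole file, instead of being decided per attribute.
std::string buildDocument(const std::string& name, const std::string& title)
{
    std::string doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    doc += "<document>\n";
    doc += "  <user name=\"" + escapeXmlAttribute(name) + "\"/>\n";
    doc += "  <profession title=\"" + escapeXmlAttribute(title) + "\"/>\n";
    doc += "</document>\n";
    return doc;
}

// Write the bytes exactly as they are (binary mode avoids newline
// translation, so the file is identical on Windows, Linux and Mac).
void saveBytes(const std::string& path, const std::string& bytes)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        throw std::runtime_error("cannot open file for writing: " + path);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

// Read the file back as raw bytes.
std::string loadBytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open file for reading: " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Because the reader only ever has to deal with one encoding, any XML parser that honours the declaration can decode every attribute the same way, which is the point of the advice above.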
After trying several approaches, I finally found a very simple way to solve my problem. All I had to do was add this line at the beginning of my main.cpp file:
After that change, all the strings used in my project were stored using the codec "UTF-8". I created several files and I could edit them from Windows, Linux and Mac without losing special characters. Problem solved.
Thank you so much for your advice!