How to use the "windows-1252" charset?
-
wrote on 3 Jul 2011, 12:42 last edited by
I'm trying to read a .srt file that contains some diacritics. They are all ignored, as if they didn't exist.
If I try to create a QString with some of these characters and then print it using qDebug or a QTextEdit, the characters are printed, along with a garbage character for each of them.
In Java, the solution would be this one:
@
File file = new File(subName);
FileInputStream fis = new FileInputStream(file);
InputStreamReader isr = new InputStreamReader(fis, "windows-1252");
@
Any suggestions?
-
wrote on 3 Jul 2011, 12:55 last edited by
You may need to decode your binary data with a suitable decoder:
@
#include <QFile>
#include <QTextCodec>
QFile f("path");
if (!f.open(QIODevice::ReadOnly)) {
    // handle the error: the file could not be opened
}
QByteArray data = f.readAll();
// codecForName() returns a null pointer if the codec name is unknown
QTextCodec *codec = QTextCodec::codecForName("Windows-1252");
QString result = codec->toUnicode(data);
// result now contains the text as a Unicode string
@
-
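To illustrate what the decoding step above does conceptually, here is a minimal sketch in plain standard C++ (no Qt). It is not how QTextCodec is implemented; `decodeWindows1252` and `kCp1252High` are hypothetical names made up for this example. It relies on the fact that Windows-1252 matches ISO-8859-1 everywhere except bytes 0x80 to 0x9F. Mapping the five unassigned bytes to U+FFFD is an assumption of this sketch.

```cpp
#include <string>

// Code points for bytes 0x80..0x9F, the only range where Windows-1252
// differs from ISO-8859-1. The five unassigned bytes are mapped to
// U+FFFD here (an assumption of this sketch).
static const char32_t kCp1252High[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
};

// Hypothetical helper: decode Windows-1252 bytes into Unicode code points.
std::u32string decodeWindows1252(const std::string &bytes)
{
    std::u32string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        if (b >= 0x80 && b <= 0x9F)
            out.push_back(kCp1252High[b - 0x80]);    // special range
        else
            out.push_back(static_cast<char32_t>(b)); // same as Latin-1
    }
    return out;
}
```

For instance, the single byte 0xFE decodes to U+00FE ('þ'), which is the value indexOf would then have to look for in the decoded string.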
wrote on 3 Jul 2011, 14:21 last edited by
Nice one. Now I have the content of the file.
The problem is that I can't find the diacritics in order to replace them with normal characters.
I've tried searching using indexOf. I think it fails because I can't store them properly, as I said in the first post.
How can I store this: {'þ','ã','Ã','ª','º','â','Î','î'}?
-
wrote on 3 Jul 2011, 14:33 last edited by
Hmm...
There is no problem with using indexOf to find characters in the string. You should also be able to store any Unicode string in a file. If you want to store your text with Windows-1252 encoding, you may need an encoder:
@
QTextEncoder* encoder = QTextCodec::codecForName("Windows-1252")->makeEncoder();
QByteArray outputData = encoder->fromUnicode(result);
@
-
wrote on 3 Jul 2011, 14:45 last edited by
I didn't make myself clear.
After I read the entire .srt file, I want to correct it. I want to replace the diacritics (from this set: {'þ','ã','Ã','ª','º','â','Î','î'}) with normal characters.
In order to correct the data, I need to look for the diacritics.
To do that, I need them stored, preferably in an array.
Look how I did things in Java a while ago:
@
char badArray [] = {'þ','ã','Ã','ª','º','â','Î','î'};
char goodArray [] = {'t','a','A','S','s','a','I','i'};
for (int i = 0; i < badArray.length; i++) {
    if (newLine.indexOf(badArray[i]) > -1) {
        newLine = newLine.replace(badArray[i], goodArray[i]);
    }
}
@
I can't store that badArray properly. I get some garbage along with it.
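For comparison, a C++ counterpart of the Java loop above could look roughly like the sketch below. It uses plain standard C++ (`std::u16string`) rather than Qt, and `kBad`, `kGood` and `fixDiacritics` are hypothetical names for this example. The table entries are written as Unicode escapes, on the assumption that the garbage comes from the way the compiler reads the literal characters in the source file.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Same tables as the Java snippet, written as Unicode escapes so the
// values do not depend on how the source file is saved.
static const char16_t kBad[]  = { u'\u00FE', u'\u00E3', u'\u00C3', u'\u00AA',
                                  u'\u00BA', u'\u00E2', u'\u00CE', u'\u00EE' };
static const char16_t kGood[] = { u't', u'a', u'A', u'S',
                                  u's', u'a', u'I', u'i' };

// Hypothetical helper: replace every character found in kBad with the
// character at the same index in kGood.
std::u16string fixDiacritics(std::u16string line)
{
    for (std::size_t i = 0; i < sizeof(kBad) / sizeof(kBad[0]); ++i)
        std::replace(line.begin(), line.end(), kBad[i], kGood[i]);
    return line;
}
```

With Qt the same idea should carry over by building the table from numeric code points, for example `QChar(0x00FE)`, and calling `QString::replace(QChar, QChar)` for each pair.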
-
wrote on 3 Jul 2011, 17:51 last edited by
I'm not sure, but I think characters like 'Ã' are wide characters (with 2-byte codes). If that is correct, you will have to use wide characters (wchar_t) or QChar instead of the ordinary C++ char data type.
I don't know exactly what you want to do, but I think you should consider using QString and QChar instead of a char array, because they are Unicode-aware.
-
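Whether a character like 'Ã' takes two bytes depends on the encoding: in Windows-1252 it is the single byte 0xC3, while in UTF-8 it takes two bytes. The sketch below, in plain standard C++, shows the UTF-8 side. `utf8Encode` is a hypothetical helper written for this example, and it assumes a valid code point (no checks for surrogates or out-of-range values). If the source file is saved as UTF-8, which is only an assumption here, this would explain one extra garbage character per diacritic when the literal is stored in a plain char.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// No validation of surrogates or out-of-range values (sketch only).
std::string utf8Encode(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 1 byte
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 2 bytes
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 3 bytes
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 4 bytes
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling `utf8Encode(0xC3)` yields the two bytes 0xC3 0x83, whereas an ASCII letter such as 'A' stays a single byte.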
wrote on 3 Jul 2011, 18:07 last edited by
Yes, I've tried to store them in a QString or QChar, but they aren't stored properly.
Just try to store and print one of them, and see the garbage.
-
wrote on 3 Jul 2011, 18:25 last edited by
How did you store them in the QChar array? I think you forgot to use wide characters. Did you get a wide char constant warning?
Your code should look like this:
@
QChar badArray [] = {L'þ',L'ã',L'Ã',L'ª',L'º',L'â',L'Î',L'î'};
@
-
wrote on 3 Jul 2011, 19:40 last edited by
I didn't even know that I must use wide chars.
Thank you for all your help :)