How to use the "windows-1252" charset?
-
I'm trying to read a .srt file which contains some diacritics. They are all ignored, as if they didn't exist.
If I try to create a QString with some of these characters and then print them using qDebug or a QTextEdit, they are printed, along with a garbage character for each of my characters.
In Java, the solution would be this one:
@
File file = new File(subName);
FileInputStream fis = new FileInputStream(file);
InputStreamReader isr = new InputStreamReader(fis, "windows-1252");
@
Any suggestions?
-
You may need to decode your binary data with a suitable decoder:
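@
QFile f("path");
// An access mode is required: QFile::Text on its own makes open() fail
if (f.open(QFile::ReadOnly | QFile::Text)) {
    QByteArray data = f.readAll();
    // The codec object is owned by Qt, so there is nothing to delete
    QTextCodec* codec = QTextCodec::codecForName("Windows-1252");
    QString result = codec->toUnicode(data);
    // result now contains a standard Unicode string
}
@
-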
Nice one. Now I have the content of the file.
The problem is that I can't find the diacritics to replace them with normal characters.
I've tried searching using indexOf. I think it fails because I can't store them properly, as I said in the first post.
How can I store this: {'þ','ã','Ã','ª','º','â','Î','î'}?
-
Hmm...
There is no problem using indexOf to find characters in the string. You should also be able to store any Unicode string in a file. If you want to store your text with Windows-1252 encoding, you may need an encoder:
@
QTextEncoder* encoder = QTextCodec::codecForName("Windows-1252")->makeEncoder();
QByteArray outputData = encoder->fromUnicode(result);
delete encoder; // the caller owns the object returned by makeEncoder()
@
-
I didn't make myself clear.
After I read the entire .srt file, I want to correct it. I want to replace the diacritics (from this set: {'þ','ã','Ã','ª','º','â','Î','î'}) with normal characters.
In order to correct the data, I need to look for the diacritics.
To do that, I need them stored in an array (preferably).
Look how I did things in Java a while ago:
@
char badArray[] = {'þ','ã','Ã','ª','º','â','Î','î'};
char goodArray[] = {'t','a','A','S','s','a','I','i'};
for (int i = 0; i < badArray.length; i++) {
    if (newLine.indexOf(badArray[i]) > -1) { newLine = newLine.replace(badArray[i], goodArray[i]); }
}
@
I can't store that badArray properly. I get some garbage along with it.
-
I'm not sure, but I think characters like 'Ã' are wide characters (with 2-byte codes). If that's correct, you will have to use wide characters (wchar_t) or QChar instead of the ordinary C++ char type.
I don't know exactly what you want to do, but I think you should consider using QString and QChar instead of a char array, because they are Unicode-aware.
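To make the idea concrete, here is a minimal sketch of the whole round: decode the Windows-1252 bytes into 16-bit code units, then swap the unwanted characters by their Unicode code points instead of storing them as char literals. It deliberately uses only standard C++ so it can be compiled on its own. The function names decodeWindows1252 and replaceDiacritics are made up for this example and are not Qt API. In Qt the same swap should be possible with QString::replace(QChar, QChar), using QChar(0x00FE) and so on for the "bad" characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode Windows-1252 bytes into UTF-16 code units.
// Bytes outside 0x80-0x9F map to the code point with the same value;
// the 0x80-0x9F block needs a table. 0 marks the five unassigned bytes.
std::u16string decodeWindows1252(const std::string& bytes) {
    static const char16_t high[32] = {
        0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
        0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178};
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        char16_t c = b;
        if (b >= 0x80 && b <= 0x9F) {
            c = high[b - 0x80];
            if (c == 0) c = 0xFFFD; // unassigned byte: use the replacement character
        }
        out.push_back(c);
    }
    return out;
}

// Hypothetical helper: the badArray/goodArray pairs from the thread,
// with the "bad" characters written as code points so that the encoding
// of the source file no longer matters.
std::u16string replaceDiacritics(std::u16string line) {
    static const char16_t bad[]  = {0x00FE, 0x00E3, 0x00C3, 0x00AA,
                                    0x00BA, 0x00E2, 0x00CE, 0x00EE};
    static const char16_t good[] = {u't', u'a', u'A', u'S',
                                    u's', u'a', u'I', u'i'};
    const std::size_t count = sizeof(bad) / sizeof(bad[0]);
    for (char16_t& c : line) {
        for (std::size_t i = 0; i < count; ++i) {
            if (c == bad[i]) { c = good[i]; break; }
        }
    }
    return line;
}
```

For example, the bytes 0xFE 'a' 'r' 0xE3 (the text "þarã" in Windows-1252) passed through replaceDiacritics(decodeWindows1252(...)) come out as "tara". The garbage in the char array most likely comes from the source file being saved as UTF-8, where each of these characters takes two bytes and so does not fit in a single char.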