How to use the "windows-1252" charset?



  • I'm trying to read a .srt file which contains some diacritics. They are all ignored, as if they didn't exist.
    If I try to create a QString with some of these characters and then print them using qDebug or a QTextEdit, they are printed, along with a garbage character for each of my characters.

    In Java, the solution would be this one:
    @
    File file = new File(subName);
    FileInputStream fis = new FileInputStream(file);
    InputStreamReader isr = new InputStreamReader(fis, "windows-1252");
    @

    Any suggestions?



  • You may need to decode your binary data with a suitable decoder:
    @
    #include <QFile>
    #include <QTextCodec>

    QString result;
    QFile f("path");
    QTextCodec* codec = QTextCodec::codecForName("Windows-1252"); // null if unavailable
    // open() fails unless an access mode such as ReadOnly is given
    if (codec && f.open(QFile::ReadOnly | QFile::Text)) {
        QByteArray data = f.readAll();
        result = codec->toUnicode(data); // result contains a standard Unicode string
    }
    @
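To make the decoding step above less of a black box, here is a minimal sketch in plain standard C++ (no Qt) of what a Windows-1252 decoder does. The name `decodeWindows1252` and the table `kCp1252High` are hypothetical and only for illustration; this is not how QTextCodec is implemented. It assumes that unassigned bytes should become U+FFFD, which is a choice made for this sketch.

```cpp
#include <string>

// Code points for Windows-1252 bytes 0x80..0x9F.
// 0 marks the five bytes the code page leaves unassigned.
static const char32_t kCp1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

// Hypothetical helper: decode Windows-1252 bytes into Unicode code points.
std::u32string decodeWindows1252(const std::string& bytes) {
    std::u32string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        if (b >= 0x80 && b <= 0x9F) {
            const char32_t cp = kCp1252High[b - 0x80];
            // Assumption of this sketch: unassigned bytes become U+FFFD.
            out.push_back(cp != 0 ? cp : static_cast<char32_t>(0xFFFD));
        } else {
            // ASCII and 0xA0..0xFF map to the code point with the same value.
            out.push_back(static_cast<char32_t>(b));
        }
    }
    return out;
}
```

Every byte becomes exactly one code point, so a file byte such as 0xFE always decodes to U+00FE ('þ'), whatever encoding the C++ source file itself uses.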



  • Nice one. Now I have the content of the file.

    The problem is that I can't find the diacritics to replace them with plain characters.
    I've tried searching with indexOf.

    I think that's because I can't store them properly, as I said in the first post.
    How can I store this: {'þ','ã','Ã','ª','º','â','Î','î'}?



  • Hmm...

    indexOf should have no problem finding characters in the string. You should also be able to store any Unicode string in a file. If you want to store your text with Windows-1252 encoding, you may need an encoder:
    @
    QTextCodec* codec = QTextCodec::codecForName("Windows-1252");
    QByteArray outputData = codec->fromUnicode(result);
    @



  • I didn't make myself clear.

    After I read the entire .srt file, I want to correct it. I want to replace the diacritics (from this set: {'þ','ã','Ã','ª','º','â','Î','î'}) with plain characters.

    In order to correct the data, I need to look for the diacritics.
    To do that, I need them stored in an array (preferably).

    Here is how I did it in Java a while ago:

    @
    char badArray [] = {'þ','ã','Ã','ª','º','â','Î','î'};
    char goodArray [] = {'t','a','A','S','s','a','I','i'};

    for (int i = 0; i < badArray.length; i++) {
        if (newLine.indexOf(badArray[i]) > -1) {
            newLine = newLine.replace(badArray[i], goodArray[i]);
        }
    }
    @
    

    I can't store that badArray properly. I get some garbage along with it.



  • I'm not sure, but I think characters like 'Ã' are wide characters (with 2-byte codes). If so, you will have to use wide characters (wchar_t) or QChar instead of the ordinary C++ char type.
    I don't know exactly what you want to do, but you should consider using QString and QChar instead of a char array, because they are Unicode-aware.
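One plausible reason a plain `char` cannot hold these characters, assuming the source file is saved as UTF-8 (the thread does not say which encoding the editor uses): each of them takes two bytes in UTF-8. The sketch below uses a hypothetical helper, `utf8Encode`, written in standard C++ only to show the byte counts.

```cpp
#include <string>

// Hypothetical helper: UTF-8 encode a single code point (cp <= 0x10FFFF).
std::string utf8Encode(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // Plain ASCII: one byte.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // U+0080..U+07FF: two bytes. All the characters in badArray fall here.
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // U+0800..U+FFFF: three bytes.
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // U+10000..U+10FFFF: four bytes.
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, `utf8Encode(0x00C3)` ('Ã') yields the two bytes 0xC3 0x83, while `utf8Encode('t')` yields one. If those two bytes are then interpreted one byte per character, each diacritic shows up as two characters, which would match the "garbage character for each of my characters" described in the first post.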



  • Yes, I've tried to store them in a QString or QChar, but they aren't stored properly.
    Just try to store and print one of them, and you'll see the garbage.



  • How did you store them in a QChar array? I think you forgot to use wide character literals. Did you get a warning about a wide character constant?
    Your code should look like this:
    @
    QChar badArray [] = {L'þ',L'ã',L'Ã',L'ª',L'º',L'â',L'Î',L'î'};
    @
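A variant that does not depend on how the source file is encoded is to write the characters as numeric Unicode code points. The sketch below mirrors the Java loop from earlier in the thread in standard C++ using `std::u16string`; the function name `stripDiacritics` is hypothetical. With Qt, the same idea should carry over by building each entry as `QChar(0x00FE)` and so on, and calling `QString::replace(QChar, QChar)`, though that variant is untested here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replace each "bad" character with its plain counterpart.
std::u16string stripDiacritics(std::u16string text) {
    // Code points for: þ, ã, Ã, ª, º, â, Î, î (same order as badArray in the thread).
    static const char16_t bad[] = {0x00FE, 0x00E3, 0x00C3, 0x00AA,
                                   0x00BA, 0x00E2, 0x00CE, 0x00EE};
    // Replacements, same order as goodArray in the thread.
    static const char16_t good[] = {u't', u'a', u'A', u'S',
                                    u's', u'a', u'I', u'i'};
    const std::size_t count = sizeof(bad) / sizeof(bad[0]);

    for (char16_t& ch : text) {
        for (std::size_t i = 0; i < count; ++i) {
            if (ch == bad[i]) {
                ch = good[i];
                break;  // each character matches at most one table entry
            }
        }
    }
    return text;
}
```

Called on the decoded subtitle text, `stripDiacritics(u"\u00FEar\u00E3")` gives `u"tara"`. A single pass over the text replaces the eight separate `indexOf`/`replace` passes of the Java version.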



  • I didn't even know that I must use wide chars.

    Thank you for all your help :)

