Encoding and locale problem



  • Hello everyone.

    I've run into a problem reading .docx and .txt files. Sounds foolish? I think so too, but it's a real problem in my case, and I haven't found anything specific on Google.

    The main difficulty is that I can't read .docx files at all. Is there a solution?

    The secondary problem is encoding. When I read .txt files in English, everything works perfectly. But Russian text shows up in the wrong encoding, even when I set the default encoding to CP-1251. Any ideas?

    Many thanks in advance.

    Here is the code:
    @compare::compare()
    {
    exampleOldVersion = new QDir("C:/Users/House15/Desktop/exampleOldVersion");
    exampleNewVersions = new QDir("C:/Users/House15/Desktop/exampleNewVersions");
    exampleOfResults = new QDir("C:/Users/House15/Desktop/exampleOfResult");

    ListOfOldFiles = exampleOldVersion->entryList(QDir::Files,QDir::NoSort);
    ListOfNewFiles=exampleNewVersions->entryList(QDir::Files,QDir::NoSort);
    
    
    stringInOldFile = new char[80];
    stringInNewFile= new char[80];
    
    oldVersion=new QFile(exampleOldVersion->absolutePath()+"/"+ListOfOldFiles.first());
    
    QFile *newVersionOfOldFile;
    for(int i=0; i<ListOfNewFiles.count();i++)
    {
        newVersionOfOldFile = new QFile(exampleNewVersions->absolutePath()+"/"+ListOfNewFiles.at(i));
    
        if(newVersionOfOldFile)
            newVersions.append(newVersionOfOldFile);
        else
        {
            ListOfNewFiles.removeAt(i);
            i--;
        }
        newVersionOfOldFile=NULL;
    }
    
    
    oldVersion->open(QIODevice::ReadWrite);
    
    
    QTextStream in(&(*oldVersion));
    QString testString=in.device()->readAll();
    in.device()->seek(0);
    in>>stringInOldFile;
        qDebug()<<testString+" \n\n "+stringInOldFile;
    oldVersion->close();
    

    }@



  • A .docx file is a compressed archive containing several XML files. You may want to extract it first and then read it file by file.

    In my experience, qDebug() only supports ANSI characters. Try displaying the text in a text edit widget (the QTextEdit class) instead.

    You can also add
    @CODECFORSRC = UTF-8@
    to your project file.

    Note: char also supports only ANSI characters. Consider replacing the char buffers with QString.
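
On the encoding half of the question: a .txt file saved as CP-1251 is just a sequence of bytes, and something has to map each byte to a Unicode character before the text is displayed. In Qt 4 that job normally belongs to the codec set on the QTextStream (QTextStream::setCodec). The sketch below is plain C++ rather than Qt, and the helper names appendUtf8 and cp1251ToUtf8 are made up for illustration. It only covers ASCII, the basic Cyrillic letters and Ё/ё, so treat it as a picture of what the codec does, not as a replacement for it.

```cpp
#include <cassert>
#include <string>

// Appends the UTF-8 encoding of a code point below U+0800 to `out`.
void appendUtf8(std::string& out, unsigned cp) {
    assert(cp < 0x800);  // this sketch only produces 1- and 2-byte sequences
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decodes Windows-1251 bytes to UTF-8. Only ASCII, bytes 0xC0..0xFF
// (U+0410..U+044F) and 0xA8/0xB8 (U+0401/U+0451) are mapped here;
// every other byte becomes '?' because the table is deliberately incomplete.
std::string cp1251ToUtf8(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80)        appendUtf8(out, b);
        else if (b >= 0xC0)  appendUtf8(out, 0x0410 + (b - 0xC0));
        else if (b == 0xA8)  appendUtf8(out, 0x0401);
        else if (b == 0xB8)  appendUtf8(out, 0x0451);
        else                 out += '?';
    }
    return out;
}
```

One thing that may matter in the posted code: in.device()->readAll() returns the raw bytes of the file, so the stream's codec is probably not applied to testString at all. Reading through the stream itself, after setting its codec, would be the first thing to try.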



  • [quote author="Jake007" date="1330792854"]A .docx file is a compressed archive containing several XML files. You may want to extract it first and then read it file by file.

    In my experience, qDebug() only supports ANSI characters. Try displaying the text in a text edit widget (the QTextEdit class) instead.

    You can also add
    @CODECFORSRC = UTF-8@
    to your project file.

    Note: char also supports only ANSI characters. Consider replacing the char buffers with QString.[/quote]

    You mean that for .docx documents I need the QXml classes? Is extracting text from XML a standard procedure, or do I need to write a parser for this purpose?



  • A .docx file is a compressed archive (like RAR, ZIP, TAR, etc.). It's probably ZIP, but you'll have to google a little to confirm.
    Once you extract it, you'll get a bunch of XML files, which you can then parse as usual.
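
For what it's worth, the Office Open XML format does package a .docx as a ZIP archive. A quick way to check a particular file without any extra library is to look at its first four bytes: a non-empty ZIP archive normally begins with the local file header signature "PK\x03\x04". The sketch below uses only the C++ standard library, and the function name startsWithZipSignature is made up for illustration. It only sniffs the signature; it does not unpack anything.

```cpp
#include <array>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Returns true if the stream starts with the ZIP local-file-header
// signature "PK\x03\x04". Reads at most four bytes from the stream.
bool startsWithZipSignature(std::istream& in) {
    static const std::array<unsigned char, 4> kSig = {0x50, 0x4B, 0x03, 0x04};
    std::array<char, 4> head{};
    in.read(head.data(), head.size());
    if (in.gcount() != static_cast<std::streamsize>(head.size())) return false;
    for (std::size_t i = 0; i < kSig.size(); ++i) {
        if (static_cast<unsigned char>(head[i]) != kSig[i]) return false;
    }
    return true;
}

// Convenience overload for data that is already in memory.
bool startsWithZipSignature(const std::string& bytes) {
    std::istringstream in(bytes);
    return startsWithZipSignature(in);
}
```

For a file on disk, open it as std::ifstream with std::ios::binary and pass the stream to the first overload. If the check succeeds, any ZIP library should be able to list the XML parts inside.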



  • Are there any classes or libraries for zipping and unzipping files? I came across QuaZIP, but I'm not sure it's the right solution for a statically compiled project. Can you suggest anything?



  • Another problem: I've noticed strange behavior with the character codes. When I use Russian text, the ASCII code of every character is always 63 (only '\r' shows up as 13). I use QTextCodec("CP1251"); could that be the cause of this problem? With English text everything displays normally.



  • I've never tried to compress or decompress anything using Qt, so unfortunately I can't suggest anything.
    When you try it with English text, are you using the same codec?

    And are Russian characters even in the ASCII table?
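
To the question about the ASCII table: ASCII only defines codes 0 to 127, and Cyrillic letters are not among them. Code 63 is '?', which is what many lossy conversions substitute for a character the target encoding cannot represent. One guess, and it is only a guess because the code that prints the values isn't shown, is that the text goes through an ASCII or Latin-1 conversion (for example QString::toAscii() or toLatin1()) before the codes are read. The sketch below is plain C++ with a made-up function name, toAsciiLossy, and just reproduces that symptom.

```cpp
#include <string>

// Converts code points to 7-bit ASCII; anything outside 0..127 becomes '?'.
std::string toAsciiLossy(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        out += (cp < 128) ? static_cast<char>(cp) : '?';
    }
    return out;
}
```

Feeding it two Cyrillic letters followed by '\r' gives the codes 63, 63, 13, which matches what was described, while English text passes through unchanged. If that is what is happening, reading the Unicode value of each character instead of an 8-bit conversion should show the real codes.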

