
Encoding and locale problem



  • Hello everyone.

    I have a problem reading .docx and .txt files. It may sound silly, but it is a real problem in my case, and I have not found anything specific on Google.

    The main difficulty is that I can't read .docx files at all. Is there a solution?

    The secondary problem is encoding. When I read .txt files in English, everything works perfectly. But Russian text is displayed in the wrong encoding, even when I set the default encoding to CP-1251. Any ideas?

    Many thanks in advance.

    Here is the code:
    @compare::compare()
    {
        exampleOldVersion = new QDir("C:/Users/House15/Desktop/exampleOldVersion");
        exampleNewVersions = new QDir("C:/Users/House15/Desktop/exampleNewVersions");
        exampleOfResults = new QDir("C:/Users/House15/Desktop/exampleOfResult");

        ListOfOldFiles = exampleOldVersion->entryList(QDir::Files, QDir::NoSort);
        ListOfNewFiles = exampleNewVersions->entryList(QDir::Files, QDir::NoSort);

        stringInOldFile = new char[80];
        stringInNewFile = new char[80];

        oldVersion = new QFile(exampleOldVersion->absolutePath() + "/" + ListOfOldFiles.first());

        // Create a QFile for every file in the new-versions directory.
        QFile *newVersionOfOldFile;
        for (int i = 0; i < ListOfNewFiles.count(); i++)
        {
            newVersionOfOldFile = new QFile(exampleNewVersions->absolutePath() + "/" + ListOfNewFiles.at(i));

            if (newVersionOfOldFile)
                newVersions.append(newVersionOfOldFile);
            else
            {
                ListOfNewFiles.removeAt(i);
                i--;
            }
            newVersionOfOldFile = NULL;
        }

        oldVersion->open(QIODevice::ReadWrite);

        // Read the whole file, then rewind and read the first word again.
        QTextStream in(oldVersion);
        QString testString = in.device()->readAll();
        in.device()->seek(0);
        in >> stringInOldFile;
        qDebug() << testString + " \n\n " + stringInOldFile;
        oldVersion->close();
    }@



  • A .docx file is a compressed archive containing several XML files. You may want to extract it first and then read the files one by one.
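
    The claim that a .docx is a compressed container can be checked cheaply before involving any unzip library. The sketch below uses only standard C++ and assumes the container is a ZIP archive whose first entry starts with the local file header signature `PK\x03\x04`. The function names and the `path` parameter are hypothetical and not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Returns true if `bytes` starts with the ZIP local file header signature
// "PK\x03\x04". This only identifies the container type; it does not
// validate or extract the archive.
bool hasZipSignature(const std::string& bytes) {
    static const char kSignature[4] = {'P', 'K', '\x03', '\x04'};
    return bytes.size() >= 4 && bytes.compare(0, 4, kSignature, 4) == 0;
}

// Reads up to four bytes from the file at `path` (hypothetical parameter)
// and checks them. Returns false if the file cannot be opened.
bool fileLooksLikeZip(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return false;
    char head[4] = {0, 0, 0, 0};
    in.read(head, 4);
    return hasZipSignature(std::string(head, static_cast<std::size_t>(in.gcount())));
}

// Usage sketch: a ZIP header is accepted, ordinary text is rejected.
void zipSignatureExample() {
    assert(hasZipSignature(std::string("PK\x03\x04", 4)));
    assert(!hasZipSignature("Hello everyone."));
}
```

    If the check passes, the file still has to be unpacked with a real ZIP library before the XML inside can be read; this sketch does not do that.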

    In my experience, qDebug() prints only ANSI characters correctly. Try displaying the text in a text edit widget (QTextEdit class) instead.

    You can also add
    @CODECFORSRC = UTF-8@
    in your project file.

    Note: a char holds a single byte, so it can also represent only ANSI characters. Consider replacing the char arrays with QString.
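
    To illustrate why raw char bytes are not text until they are decoded, here is a minimal standard C++ sketch that turns CP1251 bytes into UTF-8. It is a simplification: only ASCII and the Russian alphabet are mapped, and every other byte becomes U+FFFD. All function names are hypothetical. In Qt 4 this decoding is normally left to QTextCodec (for example via QTextStream::setCodec) rather than written by hand.

```cpp
#include <cassert>
#include <string>

// Maps one CP1251 byte to a Unicode code point. Only ASCII and the Russian
// alphabet are covered; every other byte becomes U+FFFD (a simplification).
char32_t cp1251ToCodePoint(unsigned char byte) {
    if (byte < 0x80) return byte;                     // ASCII is unchanged
    if (byte >= 0xC0) return 0x0410 + (byte - 0xC0);  // 0xC0..0xFF -> U+0410..U+044F
    if (byte == 0xA8) return 0x0401;                  // capital letter Io
    if (byte == 0xB8) return 0x0451;                  // small letter io
    return 0xFFFD;                                    // not handled in this sketch
}

// Appends the UTF-8 encoding of a code point below U+10000 to `out`.
void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decodes a CP1251 byte string and re-encodes it as UTF-8.
std::string cp1251ToUtf8(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) appendUtf8(out, cp1251ToCodePoint(b));
    return out;
}

// Usage sketch: the CP1251 bytes 0xC4 0xE0 decode to U+0414 U+0430.
void cp1251Example() {
    assert(cp1251ToUtf8("\xC4\xE0") == "\xD0\x94\xD0\xB0");
}
```

    The same two bytes mean something else under a different codec, which is why the codec has to be chosen at the point where the bytes are turned into a string.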



  • [quote author="Jake007" date="1330792854"]docx is a compressed file of several XML files. You may want to extract it first and then read file by file.[/quote]

    Do you mean that I need the QXml classes for .docx documents? Is extracting text from XML a standard procedure, or do I need to write a parser for this purpose?



  • A .docx file is a compressed archive (like rar, zip, tar, etc.). It uses the ZIP format, but it is worth googling the details.
    Once you extract it, you get a set of XML files. You can then parse the extracted XML files as usual.
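
    As a rough idea of what parsing the extracted XML involves: the body text of a .docx normally lives in word/document.xml, inside `<w:t>` elements grouped into `<w:p>` paragraphs. The sketch below is standard C++ and assumes the XML has already been unpacked and loaded into a string. It is a plain substring scan, not an XML parser, and the function name is hypothetical. A real XML reader would be the more robust choice.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Collects the character data of every <w:t> element in `xml`, adding a
// newline at the end of each paragraph (</w:p>). Entities such as &amp;
// are left undecoded.
std::string extractDocxText(const std::string& xml) {
    std::string text;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        if (xml.compare(pos, 6, "</w:p>") == 0) {
            text += '\n';
            pos += 6;
            continue;
        }
        // Match "<w:t>" or "<w:t ...>", but not "<w:tbl>", "<w:tab/>", etc.
        const bool isTextTag = xml.compare(pos, 4, "<w:t") == 0 &&
                               pos + 4 < xml.size() &&
                               (xml[pos + 4] == '>' || xml[pos + 4] == ' ');
        if (!isTextTag) {
            ++pos;
            continue;
        }
        const std::size_t open = xml.find('>', pos);
        if (open == std::string::npos) break;
        if (xml[open - 1] == '/') {  // empty element such as <w:t .../>
            pos = open + 1;
            continue;
        }
        const std::size_t close = xml.find("</w:t>", open);
        if (close == std::string::npos) break;
        text.append(xml, open + 1, close - open - 1);
        pos = close + 6;
    }
    return text;
}

// Usage sketch: two runs in one paragraph are joined into one line.
void extractExample() {
    const std::string xml =
        "<w:p><w:r><w:t>Hello</w:t></w:r>"
        "<w:r><w:tab/><w:t xml:space=\"preserve\"> world</w:t></w:r></w:p>";
    assert(extractDocxText(xml) == "Hello world\n");
}
```

    The text in word/document.xml is normally UTF-8, so the CP1251 question does not apply to this part of the problem.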



  • Are there any classes or libraries for zipping and unzipping files? I came across QuaZIP, but I am not sure it is the right solution for a statically compiled project. Can you suggest anything?



  • Another problem. I've noticed strange behavior with the ASCII codes. When I use Russian text, the code of every character is always 63 (only '\r' shows up as 13). I use QTextCodec("CP1251"). Could that be the cause of the problem? When I use English text, everything is displayed normally.



  • I have never tried to compress or decompress anything using Qt, so unfortunately I can't suggest anything.
    When you try with English text, are you using the same codec?

    And are Russian characters even in the ASCII table?
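
    On the last question: ASCII covers only codes 0 to 127, so Cyrillic letters are not in it, and 63 is the ASCII code of '?'. A likely explanation for the symptom (an assumption, since the conversion code is not shown in the thread) is a lossy conversion to a single-byte encoding that replaces every unrepresentable character with '?'. The standard C++ sketch below reproduces that behavior; the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Converts code points to 7-bit ASCII bytes, substituting '?' (code 63) for
// anything outside the ASCII range, as lossy converters commonly do.
std::string toAsciiLossy(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        out += cp < 0x80 ? static_cast<char>(cp) : '?';
    }
    return out;
}

// Usage sketch: two Cyrillic letters followed by a carriage return.
void asciiExample() {
    const std::string bytes = toAsciiLossy(U"\u0414\u0430\r");
    assert(bytes[0] == 63);  // U+0414 is not ASCII, replaced by '?'
    assert(bytes[1] == 63);  // U+0430 is not ASCII, replaced by '?'
    assert(bytes[2] == 13);  // '\r' is ASCII and survives
}
```

    English text passes through such a conversion unchanged, which matches the observation that only the Russian text breaks.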

