Read doc and docx files
-
With this code, a txt file is read normally, but a docx file returns "PK\u0003\u0004\u0014", while the text in the file is "Test Text for docx".

    #include "mainwindow.h"
    #include <QApplication>
    #include <QDebug>   // for qDebug
    #include <QFile>    // for working with files

    int main(int argc, char *argv[])
    {
        QApplication a(argc, argv);
        MainWindow w;

        QFile TrainingFile("C:\\ForResume\\2.docx");
        //QFile TrainingFile("C:\\ForResume\\1.txt");
        QString Text;
        if (TrainingFile.open(QIODevice::ReadOnly)) {
            // Reads the raw bytes of the file and converts them to a string
            Text = TrainingFile.readAll();
        }
        qDebug() << TrainingFile;
        qDebug() << Text;

        w.show();
        return a.exec();
    }
-
@Mikeeeeee said in Read doc and docx files:
With this code, a txt file is read normally, but a docx file returns "PK\u0003\u0004\u0014"
PK is the signature of a zip file.
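As a rough illustration of that check, here is a minimal sketch in plain standard C++ (no Qt needed) that tests whether data begins with the ZIP local file header signature, i.e. the bytes 'P', 'K', 0x03, 0x04. The function names startsWithZipSignature and fileLooksLikeZip are made up for this example; they are not part of Qt or of the code in this thread.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: true if the bytes begin with "PK\x03\x04",
// the signature of a ZIP local file header.
bool startsWithZipSignature(const std::string &bytes)
{
    static const char signature[] = {'P', 'K', '\x03', '\x04'};
    return bytes.size() >= sizeof(signature)
        && bytes.compare(0, sizeof(signature), signature, sizeof(signature)) == 0;
}

// Hypothetical helper: reads the first four bytes of a file in binary mode
// and checks them. Returns false if the file cannot be opened or is too short.
bool fileLooksLikeZip(const std::string &path)
{
    std::ifstream in(path, std::ios::binary);
    char head[4] = {};
    in.read(head, sizeof(head));
    return in.gcount() == 4 && startsWithZipSignature(std::string(head, 4));
}
```

Called on the 2.docx from the question, fileLooksLikeZip would be expected to return true, and on an ordinary text file false (assuming the text does not happen to start with those four bytes).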
Seems pretty normal, since docx files are zip files ;)
-
It's not that trivial.
The documentation that specifies the MS Office formats (docx included) is ISO/IEC 29500-# (# being 1, 2, 3 and 4), which is ~7000 pages of standard.
A library that handles that format is a monumental project. Your best bet is to use QAxObject to interact with MS Word and use its engine to read/write the file.
-
Summoning the help of the expert @hskoglund
-
@Mikeeeeee said in Read doc and docx files:
@VRonin said in Read doc and docx files:
QAxObject
How can I use QAxObject?
Did you start by looking at post https://forum.qt.io/topic/74254/how-to-read-and-write-docx-files-in-qt/26 in @jsulm's link?
-
@Mikeeeeee
It's an example in response to your "How can I use QAxObject?", illustrating how you can use that to talk to VBA in Word. Which is what you asked for. It does not show how to retrieve the document contents. That is for you to look up, to discover whatever the necessary VBA is to do that. I am not going to do that for you. I cannot say whether someone else is prepared to do the work and write the code for you.
-
More information here
This code works:

    // Needs #include <QAxObject> and QT += axcontainer in the .pro file
    QAxObject wordApplication("Word.Application");
    QAxObject *documents = wordApplication.querySubObject("Documents");
    QAxObject *document = documents->querySubObject("Open(const QString&, bool)", "C:\\ForResume\\2.docx", true);
    QAxObject *words = document->querySubObject("Words");
    QString textResult;
    int countWord = words->dynamicCall("Count()").toInt();
    // The loop starts at 1 because the Words collection is 1-based
    for (int a = 1; a <= countWord; a++) {
        textResult.append(words->querySubObject("Item(int)", a)->dynamicCall("Text()").toString());
    }
    qDebug() << textResult;
-
@Mikeeeeee
That's fine. Do you actually want to get the text word by word, as you have shown? Just saying: it must be horrendously inefficient compared to getting the whole text, but that's OK if that's what you intend.
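For comparison, a minimal sketch of what fetching the whole text in one call might look like. It is not code from this thread: readWholeDocumentText is a made-up helper name, and the sketch assumes Windows, an installed MS Word, and QT += axcontainer in the .pro file. It relies on the Content property of a Word document (a Range) and that Range's Text property, and it reuses the same Open call as the word-by-word code.

```cpp
#include <QAxObject>
#include <QDebug>
#include <QString>

// Hypothetical helper (not from the thread): returns the main text of a
// Word document in one string via Word's COM automation interface.
// Assumption: Windows with MS Word installed, QT += axcontainer.
QString readWholeDocumentText(const QString &path)
{
    QAxObject wordApplication("Word.Application");
    if (wordApplication.isNull())
        return QString();   // Word could not be started

    QString text;
    QAxObject *documents = wordApplication.querySubObject("Documents");
    QAxObject *document = documents
        ? documents->querySubObject("Open(const QString&, bool)", path, true)
        : nullptr;
    if (document) {
        // Document.Content is a Range over the main document body;
        // reading its Text property avoids one COM call per word.
        if (QAxObject *content = document->querySubObject("Content"))
            text = content->property("Text").toString();
        document->dynamicCall("Close()");
    }
    // Shut down the Word instance started above so it does not linger.
    wordApplication.dynamicCall("Quit()");
    return text;
}
```

Usage would be along the lines of qDebug() << readWholeDocumentText("C:\\ForResume\\2.docx");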