Read doc and docx files
-
wrote on 24 Oct 2018, 11:45 last edited by aha_1980
Hi!
How can I read doc and docx files?
-
wrote on 24 Oct 2018, 13:29 last edited by
In this code the txt file is read normally, but the docx file returns "PK\u0003\u0004\u0014", whereas the text in the file is "Test Text for docx".
#include "mainwindow.h"
#include <QApplication>
#include <QDebug>   // for qDebug
#include <QFile>    // for working with files

int main(int argc, char *argv[])
{
    QApplication a(argc, argv);
    MainWindow w;
    QFile TrainingFile("C:\\ForResume\\2.docx");
    //QFile TrainingFile("C:\\ForResume\\1.txt");
    QString Text;
    if (TrainingFile.open(QIODevice::ReadOnly))
    {
        Text = TrainingFile.readAll();
    }
    qDebug() << TrainingFile.fileName();
    qDebug() << Text;
    w.show();
    return a.exec();
}
-
wrote on 24 Oct 2018, 14:50 last edited by
@Mikeeeeee said in Read doc and docx files:
In this code the txt file is read normally, but the docx file returns "PK\u0003\u0004\u0014"
PK is the signature of a zip file.
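To illustrate the signature check, here is a minimal sketch using only the C++ standard library. It tests whether a file starts with the four bytes `PK\x03\x04`, the ZIP local file header signature. The function name `looksLikeZip` is hypothetical, and a match only suggests a ZIP container; it does not prove the file is a valid docx.

```cpp
#include <array>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: returns true if the file begins with the ZIP
// local file header signature "PK\x03\x04".
bool looksLikeZip(const std::string &path)
{
    std::ifstream in(path, std::ios::binary);
    std::array<char, 4> header{};
    // read() fails if the file is missing or shorter than four bytes.
    if (!in.read(header.data(), static_cast<std::streamsize>(header.size())))
        return false;
    return header[0] == 'P' && header[1] == 'K'
        && header[2] == '\x03' && header[3] == '\x04';
}
```

Calling `looksLikeZip` on the docx from the question should return true, while a plain txt file should return false.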
Seems pretty normal since docx files are zip files ;)
-
wrote on 24 Oct 2018, 15:04 last edited by VRonin
It's not that trivial.
The standard that specifies the MS Office formats, including docx, is ISO/IEC 29500-# (# being 1, 2, 3 and 4), which is ~7000 pages of standard.
A library that handles that format is a monumental project. Your best bet is to use QAxObject to interact with MS Word and use its engine to read/write the file.
-
wrote on 24 Oct 2018, 16:03 last edited by
-
wrote on 24 Oct 2018, 16:14 last edited by
Summoning the help of the expert @hskoglund
-
wrote on 24 Oct 2018, 16:34 last edited by
@Mikeeeeee said in Read doc and docx files:
@VRonin said in Read doc and docx files:
QAxObject
How can I use QAxObject?
Did you start by looking at post https://forum.qt.io/topic/74254/how-to-read-and-write-docx-files-in-qt/26 in @jsulm's link?
-
wrote on 25 Oct 2018, 09:00 last edited by
I do not understand how to get the text from the file in this example.
-
wrote on 25 Oct 2018, 09:23 last edited by
@Mikeeeeee
It's an example in response to your "How can I use QAxObject?", illustrating how you can use that to talk to VBA in Word, which is what you asked for. It does not show how to retrieve the document contents. That is for you to look up: you need to discover whatever VBA is necessary to do that. I am not going to do that for you. I cannot say whether someone else is prepared to do the work and write the code for you.
-
wrote on 25 Oct 2018, 12:27 last edited by
More information here
This code works:
QAxObject wordApplication("Word.Application");
QAxObject *documents = wordApplication.querySubObject("Documents");
QAxObject *document = documents->querySubObject("Open(const QString&, bool)", "C:\\ForResume\\2.docx", true);
QAxObject *words = document->querySubObject("Words");
QString textResult;
int countWord = words->dynamicCall("Count()").toInt();
// Word collections are indexed from 1
for (int a = 1; a <= countWord; a++) {
    textResult.append(words->querySubObject("Item(int)", a)->dynamicCall("Text()").toString());
}
qDebug() << textResult;
-
wrote on 25 Oct 2018, 12:42 last edited by
@Mikeeeeee
That's fine. Do you actually want to get the text word by word as you have shown? Just saying: it must be horrendously inefficient compared to getting the whole text, but that's OK if that's what you intend.
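For comparison, here is a minimal sketch of getting the whole text in one call. It assumes Windows, an installed copy of Word, and a project built with `QT += axcontainer widgets`. It reads the `Text` property of the document's `Content` range instead of iterating over `Words`. The file path and the `Open` call are taken from the code earlier in the thread; the `Close(bool)` and `Quit()` calls are an assumption about how to shut Word down cleanly and have not been tested here.

```cpp
#include <QApplication>
#include <QAxObject>
#include <QDebug>
#include <QString>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);

    // Start Word through COM; this fails if Word is not installed.
    QAxObject wordApplication("Word.Application");
    if (wordApplication.isNull()) {
        qWarning() << "Could not start Word.Application";
        return 1;
    }

    // Same Open call as in the thread, with null checks added.
    QAxObject *documents = wordApplication.querySubObject("Documents");
    QAxObject *document = documents
        ? documents->querySubObject("Open(const QString&, bool)",
                                    QString("C:\\ForResume\\2.docx"), true)
        : nullptr;
    if (!document) {
        qWarning() << "Could not open the document";
        wordApplication.dynamicCall("Quit()");
        return 1;
    }

    // Document.Content is a Range covering the main text of the document;
    // Range.Text returns all of its text in a single COM call.
    QAxObject *content = document->querySubObject("Content");
    const QString text = content ? content->property("Text").toString()
                                 : QString();
    qDebug() << text;

    // Assumption: close without saving, then quit Word so that no
    // background WINWORD process is left running.
    document->dynamicCall("Close(bool)", false);
    wordApplication.dynamicCall("Quit()");
    return 0;
}
```

This makes one COM round trip for the text rather than one per word, which is where the per-word loop is likely to lose time on larger documents.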