How to extract text from PDF?
-
Hi everyone,
How can I extract text from a PDF file? 😁
-
Welcome to the forum.
Assuming you want to do this using Qt, there's no out-of-the-box way to achieve it.
How you go about it depends on what you are doing this for, how you want to handle non-text content, how you want to handle layout, what platform you are on, ...
You might get away with something like Ghostscript:
gs -sDEVICE=txtwrite -o output.txt input.pdf
(or a Windows equivalent)
-
Hi, on Ubuntu there's pdftotext, which is part of the poppler-utils package.
There's also a QPdfDocument class, which has a getAllText() function. However, it looks like you have to compile/build QPdfDocument yourself, i.e. it's not included in the Qt installer.
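For the command-line route, something along these lines might work as a starting point. It's only a sketch: it tries pdftotext first and falls back to Ghostscript's txtwrite device if pdftotext isn't installed. The function name extract_pdf_text is made up for illustration, and the -layout and -q options are my own additions (they ask pdftotext to preserve the page layout and keep Ghostscript from printing its banner), so drop them if they don't suit your case.

```shell
# Hypothetical helper: extract text from a PDF with whichever tool is available.
# Usage: extract_pdf_text INPUT.pdf OUTPUT.txt
extract_pdf_text() {
    pdf_in=$1
    txt_out=$2

    # Fail early if the input file is missing.
    if [ ! -f "$pdf_in" ]; then
        echo "extract_pdf_text: no such file: $pdf_in" >&2
        return 1
    fi

    if command -v pdftotext >/dev/null 2>&1; then
        # pdftotext comes with poppler-utils; -layout keeps the physical layout.
        pdftotext -layout "$pdf_in" "$txt_out"
    elif command -v gs >/dev/null 2>&1; then
        # Ghostscript fallback, same device as the command earlier in the thread.
        gs -q -sDEVICE=txtwrite -o "$txt_out" "$pdf_in"
    else
        echo "extract_pdf_text: neither pdftotext nor gs was found" >&2
        return 127
    fi
}
```

You would call it as extract_pdf_text input.pdf output.txt. From a Qt application you could presumably run the same commands through QProcess, but how well the result matches the original layout depends heavily on the PDF.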