
How to extract text from PDF?

    Jerry van de Bunt
    wrote on 31 Oct 2024, 20:07
    #1

    Hi everyone,

    How can I extract text from a PDF file? 😁

      ChrisW67
      wrote on 31 Oct 2024, 21:53 last edited by ChrisW67
      #2

      Welcome to the forum.

      Assuming you want to do this with Qt, there's no out-of-the-box way to achieve it.

      How you go about it depends on what you are doing this for, how you want to handle non-text content, how you want to handle layout, what platform you are on, ...

      You might get away with something like Ghostscript:
      gs -sDEVICE=txtwrite -o output.txt input.pdf
      (on Windows, use the console executable gswin64c or gswin32c instead of gs)
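      If the conversion has to happen from inside an application, one option is to run the same Ghostscript command as an external process and then read the file it writes. The sketch below uses only standard C++ (std::system and std::ifstream) rather than Qt; the helper names are hypothetical, it assumes the Ghostscript executable is on the PATH, and its double-quote path quoting assumes the paths contain no double quotes. In a Qt application, QProcess would be the more natural way to launch gs.

```cpp
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: build the txtwrite command from the post.
// gsExe is "gs" on Linux/macOS and typically "gswin64c" on Windows.
// Paths are wrapped in double quotes; this assumes they contain none.
std::string ghostscriptTxtwriteCommand(const std::string& gsExe,
                                       const std::string& pdfPath,
                                       const std::string& txtPath) {
    return gsExe + " -sDEVICE=txtwrite -o \"" + txtPath + "\" \"" + pdfPath + "\"";
}

// Hypothetical helper: run Ghostscript, then read the text file it wrote.
// Returns an empty string if the command fails or the file cannot be read.
std::string extractTextWithGhostscript(const std::string& pdfPath,
                                       const std::string& txtPath,
                                       const std::string& gsExe = "gs") {
    const std::string cmd = ghostscriptTxtwriteCommand(gsExe, pdfPath, txtPath);
    if (std::system(cmd.c_str()) != 0) {
        return {};
    }
    std::ifstream in(txtPath, std::ios::binary);
    if (!in) {
        return {};
    }
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}
```

      For example, extractTextWithGhostscript("input.pdf", "output.txt") runs the command from the post and returns the contents of output.txt.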

        SGaist
        Lifetime Qt Champion
        wrote on 31 Oct 2024, 22:05
        #3

        Hi and welcome to devnet,

        Another option is to convert your PDF to images and run OCR on them with something like Tesseract.
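        The two steps of the OCR route can be sketched as shell commands built from C++. This is only an illustration under assumptions: it rasterises the pages with Ghostscript's png16m device at 300 dpi, then runs the tesseract command-line tool with "stdout" as the output base so the recognised text is printed instead of written to a file. The helper names and the pageCount parameter (which has to be obtained some other way) are hypothetical, and the commands assume a POSIX shell.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Step 1: rasterise every page of the PDF to a 300 dpi PNG.
// Ghostscript expands %03d to the page number, starting at 1.
std::string rasterizeCommand(const std::string& pdfPath) {
    return "gs -sDEVICE=png16m -r300 -o page-%03d.png \"" + pdfPath + "\"";
}

// Name of the image Ghostscript writes for a 1-based page number.
std::string pageImageName(int page) {
    char name[32];
    std::snprintf(name, sizeof name, "page-%03d.png", page);
    return name;
}

// Step 2: OCR one page image; "stdout" makes Tesseract print the text
// instead of writing <outputbase>.txt.
std::string ocrCommand(int page) {
    return "tesseract " + pageImageName(page) + " stdout";
}

// Whole pipeline for a document with a known (hypothetical) page count.
std::vector<std::string> ocrPipeline(const std::string& pdfPath, int pageCount) {
    std::vector<std::string> cmds{rasterizeCommand(pdfPath)};
    for (int p = 1; p <= pageCount; ++p) {
        cmds.push_back(ocrCommand(p));
    }
    return cmds;
}
```

        Each string can then be run with std::system, or with QProcess in a Qt application, concatenating the per-page output.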


          hskoglund
          wrote on 1 Nov 2024, 02:46 last edited by hskoglund 11 Jan 2024, 03:22
          #4

          Hi, on Ubuntu there's pdftotext (part of the poppler-utils package).
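          pdftotext writes to standard output when the output file name is "-", so its result can be captured directly without a temporary file. A minimal sketch, assuming a POSIX system (popen and pclose are POSIX, not standard C++) with poppler-utils installed; the helper names are hypothetical, and the single-quote path quoting assumes the path contains no single quotes. The -layout option asks pdftotext to keep the original physical layout of the text as far as possible.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: build the pdftotext command. The trailing "-"
// sends the extracted text to stdout instead of a .txt file.
std::string pdftotextCommand(const std::string& pdfPath, bool keepLayout) {
    std::string cmd = "pdftotext ";
    if (keepLayout) {
        cmd += "-layout ";
    }
    return cmd + "'" + pdfPath + "' -";
}

// Hypothetical helper: run pdftotext through popen and collect its stdout.
// Returns an empty string if the process cannot be started or fails.
std::string pdftotext(const std::string& pdfPath, bool keepLayout = true) {
    FILE* pipe = popen(pdftotextCommand(pdfPath, keepLayout).c_str(), "r");
    if (!pipe) {
        return {};
    }
    std::string text;
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, pipe)) > 0) {
        text.append(buf, n);
    }
    if (pclose(pipe) != 0) {
        return {};  // non-zero exit status: tool missing or extraction failed
    }
    return text;
}
```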

          There's also a QPdfDocument class (in the Qt PDF module) whose getAllText() function returns a selection containing all the text on a given page. However, it looks like you have to build it yourself, i.e. it's not included in the Qt installer.
