How to extract text from PDF?

    Jerry van de Bunt wrote (#1):

    Hi everyone,

    How can I extract text from a PDF file? 😁

      ChrisW67 wrote (#2):

      Welcome to the forum.

      Assuming you want to do this using Qt, there's no out-of-the-box way to achieve this.

      How you go about it depends on what you are doing this for, how you want to handle non-text content, how you want to handle layout, what platform you are on, ...

      You might get away with something like Ghostscript:
      gs -sDEVICE=txtwrite -o output.txt input.pdf
      (or a Windows equivalent)
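
If you want to drive that Ghostscript command from a C++ program, a minimal sketch could look like the following. It assumes a POSIX shell and that `gs` is on the PATH; the helper names (`shellQuote`, `ghostscriptCommand`, `extractWithGhostscript`) are made up for illustration. On Windows the console executable is usually called `gswin64c` and the quoting rules differ, so treat this as a starting point only.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: wrap a string in single quotes for a POSIX shell,
// escaping any embedded single quote as '\''.
std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Build the same command line as in the post above, with quoted paths.
std::string ghostscriptCommand(const std::string& pdfPath,
                               const std::string& txtPath) {
    return "gs -sDEVICE=txtwrite -o " + shellQuote(txtPath) + " " +
           shellQuote(pdfPath);
}

// Run Ghostscript through the shell; true means the shell reported status 0.
bool extractWithGhostscript(const std::string& pdfPath,
                            const std::string& txtPath) {
    return std::system(ghostscriptCommand(pdfPath, txtPath).c_str()) == 0;
}
```

Calling `extractWithGhostscript("input.pdf", "output.txt")` should leave the extracted text in `output.txt`. In a Qt application you would probably prefer QProcess over `std::system`, since it takes the arguments as a list and avoids shell quoting altogether.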

        SGaist (Lifetime Qt Champion) wrote (#3):

        Hi and welcome to devnet,

        Another option is to convert your PDF to images and use something like Tesseract to do OCR on them.
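
A rough sketch of that two-step pipeline, assuming the `pdftoppm` tool from poppler-utils for the image conversion and the `tesseract` command-line program for the OCR (both are assumptions, other converters work too). The function names and the 300 DPI default are illustrative choices, not anything prescribed by either tool.

```cpp
#include <string>

// Hypothetical helper: single-quote a string for a POSIX shell.
std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Step 1: render every page to a PNG. pdftoppm writes one file per page,
// named from the given prefix plus the page number.
std::string rasterizeCommand(const std::string& pdfPath,
                             const std::string& prefix,
                             int dpi = 300) {
    return "pdftoppm -r " + std::to_string(dpi) + " -png " +
           shellQuote(pdfPath) + " " + shellQuote(prefix);
}

// Step 2: OCR a single page image. tesseract appends ".txt" to outputBase.
std::string ocrCommand(const std::string& imagePath,
                       const std::string& outputBase,
                       const std::string& language = "eng") {
    return "tesseract " + shellQuote(imagePath) + " " +
           shellQuote(outputBase) + " -l " + shellQuote(language);
}
```

You would run the first command once per PDF and the second once per generated image, for example with `std::system` or QProcess. OCR is mostly worth it for scanned PDFs that have no text layer; for PDFs that already contain text, direct extraction is faster and more accurate.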


          hskoglund wrote (#4):

          Hi, on Ubuntu there's pdftotext, which is part of the poppler-utils package.
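
To get the pdftotext output straight into a string instead of a file, something like this sketch might do. It assumes a POSIX system (`popen`/`pclose`) with pdftotext on the PATH; passing `-` as the output file asks pdftotext to write to standard output, and `-layout` tries to preserve the physical layout. The function names are invented for the example.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: single-quote a string for a POSIX shell.
std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Command line that makes pdftotext print the text to standard output.
std::string pdftotextCommand(const std::string& pdfPath) {
    return "pdftotext -layout " + shellQuote(pdfPath) + " -";
}

// Run a shell command and return everything it wrote to standard output.
// Throws if the process cannot be started or exits with a non-zero status.
std::string runAndCapture(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) throw std::runtime_error("popen failed: " + command);
    std::string output;
    char buffer[4096];
    std::size_t n = 0;
    while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0)
        output.append(buffer, n);
    if (pclose(pipe) != 0)
        throw std::runtime_error("command failed: " + command);
    return output;
}

// Convenience wrapper: extract the text of a whole PDF.
std::string pdfToText(const std::string& pdfPath) {
    return runAndCapture(pdftotextCommand(pdfPath));
}
```

`pdfToText("input.pdf")` then returns the text of all pages, with pages separated by form feed characters as pdftotext emits them.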

          There's also a QPdfDocument class, which has a getAllText() function. However, it looks like you have to compile/build QPdfDocument yourself, i.e. it's not included in the Qt installer.
