Generating PDF from HTML, fast!

    JonB
    wrote on 5 Sept 2018, 13:07 last edited by JonB 5 Sept 2018, 13:10
    #1

    Qt 5.7, Linux & Windows. I have an in-memory HTML string that I need to convert to PDF and save to a file.

    Up until now I have been using QWebEngineView to achieve this. That involves waiting for it to finish loading & rendering the HTML, and then calling QWebEnginePage::printToPdf() to export to PDF.

    This is fine where I am allowing the user to preview the HTML and generate the PDF interactively.

    However, the software also has to export hundreds of HTML documents to PDF unattended. Although in that situation I cut out actually displaying the QWebEngineView and just use it non-interactively to generate the PDF, it's way too slow. (The HTML load/render actually takes longer than the print to PDF.) Trust me!

    So I have two questions:

    1. What do you use, or what should I use, for HTML->PDF within Qt? I see there is QPdfWriter (http://doc.qt.io/qt-5/qpdfwriter.html), and there is QTextDocument::print() (http://doc.qt.io/qt-5/qtextdocument.html#print) with the output sent to a PDF printer writing to file. Which one should I use? Have I missed another one?

    2. My users are "an*lly retentive" about the exact format of their output. I already had problems when I moved them from QWebKit, with its PDF generation engine, over to QWebEngine, with its different one. I'm a bit unsure about this: where are the PDF generation engines? I believe QWebEngine is using a Chromium one (or at least its own). If I use QPdfWriter and/or QTextDocument::print(QPagedPaintDevice *printer), will I be using the same PDF engine as QWebEngine or a different one?

      VRonin
      wrote on 6 Sept 2018, 08:16 last edited by
      #2

      @JonB said in Generating PDF from HTML, fast!:

      if I use QPdfWriter and/or QTextDocument::print(QPagedPaintDevice *printer) will I be using the same PDF engine or a different one from QWebEngine?

      Yes, they are different, and QTextDocument supports only a subset of HTML.

      It really depends on the document. It's pretty easy to try: set the HTML on a QTextDocument and then print it to PDF. If the result is acceptable to the client, you might be done.
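
As a minimal sketch of that experiment, assuming the PyQt5 bindings (the thread itself only names the C++ classes): the hypothetical helper `html_to_pdf_via_qtextdocument` below loads the HTML into a QTextDocument and prints it to a QPdfWriter. Treat it as a quick way to eyeball the output, not as a drop-in replacement for the QWebEngine path.

```python
import sys


def html_to_pdf_via_qtextdocument(html, pdf_path):
    """Lay out `html` with QTextDocument and write the result to `pdf_path`."""
    # Assumption: PyQt5 is installed. It is imported lazily so that this
    # module can still be imported on machines without Qt.
    from PyQt5.QtGui import QPdfWriter, QTextDocument

    document = QTextDocument()
    document.setHtml(html)         # only Qt's rich-text HTML subset is honoured
    writer = QPdfWriter(pdf_path)  # QPagedPaintDevice that produces a PDF file
    document.print_(writer)        # PyQt5 name for QTextDocument::print()


def demo():
    """Convert one small document; call demo() to try the helper."""
    from PyQt5.QtGui import QGuiApplication

    app = QGuiApplication(sys.argv)  # must exist first; keep a reference alive
    html_to_pdf_via_qtextdocument("<h1>Letter</h1><p>Body text.</p>", "letter.pdf")
```

A QGuiApplication (or QApplication) has to exist before the helper is called, which is what `demo()` takes care of. Because only a subset of HTML and CSS is supported, the result is worth comparing side by side with the QWebEngine output before relying on it.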

      "La mort n'est rien, mais vivre vaincu et sans gloire, c'est mourir tous les jours"
      ~Napoleon Bonaparte

      On a crusade to banish setIndexWidget() from the holy land of Qt

        JonB
        wrote on 6 Sept 2018, 08:22 last edited by JonB 6 Sept 2018, 08:26
        #3

        @VRonin
        Thank you for answering.

        Yes, they are different and QTextDocument supports only a subset of html.

        That's a major blow. I have no real idea what the HTML might contain. All I know so far is that whatever QWebEngineView makes of it is acceptable to the user; if that could differ when I go via QTextDocument, that may be a no-no :(

        What about the other part of the question? Where is the "HTML-to-PDF-converter-driver"? Out of QWebEnginePage::printToPdf(), QTextDocument::print(QPagedPaintDevice *printer) (where printer is PDF) and QPdfWriter, do they share the same code to produce the PDF or are they each quite separate with their own code for that? Then I would understand what choices I have. If you had a completely arbitrary piece of HTML, which one would you use to get to PDF? Thanks!

        [P.S. I'm removing my naughty post elsewhere which encouraged a bad example to get you here...!]

          VRonin
          wrote on 6 Sept 2018, 08:33 last edited by
          #4

          @JonB said in Generating PDF from HTML, fast!:

          QWebEnginePage::printToPdf(), QTextDocument::print(QPagedPaintDevice *printer) (where printer is PDF) and QPdfWriter

          The latter two share the same code. QWebEnginePage uses the Chromium engine to do the work.

            JonB
            wrote on 6 Sept 2018, 08:38 last edited by
            #5

            @VRonin
            Thank you, that is great information, and about what I suspected.

            For my application that probably means I shall have to stick with QWebEnginePage::printToPdf() for "unattended" conversion, as the user can also go into an "interactive" session which does show the page there and convert to PDF, and I suspect users will demand 100% compatibility with that one's output. So now I have to investigate whether I can get that to be much faster when it does not actually need to display the HTML to the user but just convert it to PDF.... :(
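
One thing to try for the unattended case is to create a single QWebEnginePage up front and feed it the documents one after another, so that the output still comes from the same Chromium-based printToPdf() path as the interactive preview. The sketch below assumes PyQt5 with the QtWebEngine module, and the names `plan_jobs` and `run_batch` are made up for illustration. Whether this is actually faster than the current approach would have to be measured.

```python
import os
import sys
from collections import deque


def plan_jobs(letters, out_dir):
    """Turn {name: html} into a list of (html, pdf_path) jobs."""
    return [(html, os.path.join(out_dir, name + ".pdf"))
            for name, html in letters.items()]


def run_batch(jobs):
    """Convert every (html, pdf_path) job with one reused QWebEnginePage."""
    # Assumption: PyQt5 with QtWebEngine is installed (imported lazily).
    from PyQt5.QtWidgets import QApplication
    from PyQt5.QtWebEngineWidgets import QWebEnginePage

    queue = deque(jobs)
    if not queue:
        return
    app = QApplication.instance() or QApplication(sys.argv)
    page = QWebEnginePage()          # created once, no view or dialog attached
    current = {"path": None}

    def start_next():
        if not queue:
            app.quit()               # all documents done: leave the event loop
            return
        html, current["path"] = queue.popleft()
        page.setHtml(html)           # asynchronous; loadFinished fires later

    def on_load_finished(ok):
        if ok:
            page.printToPdf(on_pdf_ready)  # callback overload, returns the PDF bytes
        else:
            print("load failed for", current["path"], file=sys.stderr)
            start_next()

    def on_pdf_ready(data):
        with open(current["path"], "wb") as pdf_file:
            pdf_file.write(bytes(data))
        start_next()

    page.loadFinished.connect(on_load_finished)
    start_next()
    app.exec_()
```

A standalone batch run would then be `run_batch(plan_jobs(letters, "out"))`, where `letters` maps a file name to its HTML string. The sketch starts and quits its own event loop, so it is not meant to be called from inside an application whose event loop is already running.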

              JKSH
              Moderators
              wrote on 7 Sept 2018, 02:07 last edited by JKSH 7 Sept 2018, 14:04
              #6

              Instead of creating and deleting QDialogs and QWebEnginePages over and over again to render the HTML offscreen, I wonder if it would be faster (and more memory-efficient!) to use a dedicated HTML-to-PDF converter:

              • https://wkhtmltopdf.org/ (Console app -- use this to quickly test if the library produces the output you want)
              • https://wkhtmltopdf.org/libwkhtmltox/ (C library)
              • https://github.com/mreiferson/py-wkhtmltox (Python bindings)
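
For a quick test of the console app from a script, something like the following could work. It is only a sketch: it assumes `wkhtmltopdf` is installed and on the PATH, the helper names are hypothetical, and since wkhtmltopdf renders with Qt WebKit its output should not be expected to match what QWebEngine produces.

```python
import subprocess


def wkhtmltopdf_command(pdf_path, executable="wkhtmltopdf"):
    """Build the argv that reads HTML from stdin ("-") and writes `pdf_path`."""
    return [executable, "--quiet", "-", pdf_path]


def html_to_pdf(html, pdf_path):
    """Pipe an in-memory HTML string through the wkhtmltopdf console app."""
    subprocess.run(
        wkhtmltopdf_command(pdf_path),
        input=html.encode("utf-8"),  # the HTML is sent on standard input
        check=True,                  # raise CalledProcessError on failure
    )
```

Calling `html_to_pdf("<h1>Letter</h1>", "letter.pdf")` starts one process per document, so for hundreds of letters the process start-up cost is part of what needs measuring.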

              Qt Doc Search for browsers: forum.qt.io/topic/35616/web-browser-extension-for-improved-doc-searches

                JonB
                wrote on 7 Sept 2018, 08:14 last edited by
                #7

                @JKSH
                I am aware of this. The issue is: that dialog is also used "interactively" to allow the user to "preview" the letter, optionally edit it, and produce the PDF. Then it can be used to "batch" process hundreds of letters, non-interactively. It is vital to the users that the batch-processed outputs be identical to the interactive ones, down to the pixel. So that's why I have to use the same engine/mechanism for non-interactive as interactive, which precludes using something else.

                  JKSH
                  Moderators
                  wrote on 7 Sept 2018, 14:07 last edited by
                  #8

                  @JonB said in Generating PDF from HTML, fast!:

                  down to the pixel.

                  Stringent requirements indeed!

                  All the best
