Generating PDF from HTML, fast!
-
Qt 5.7, Linux & Windows. I have an in-memory HTML string. I need to convert it to PDF and save it to a file.
Up until now I have been using QWebEngineView to achieve this. That involves waiting for it to finish loading & rendering the HTML, and then calling QWebEnginePage::printToPdf() to export to PDF.
This is fine where I am allowing the user to preview the HTML and generate the PDF interactively.
However, the software also has to export hundreds of HTML documents to PDF unattended. Although in that situation I cut out actually displaying the QWebEngineView and just use it non-interactively to generate the PDF, it's way too slow. (The HTML load/render actually takes longer than the print to PDF.) Trust me!
So I have two questions:
- What do you/should I use for HTML->PDF within Qt? I see there is QPdfWriter (http://doc.qt.io/qt-5/qpdfwriter.html), or there is QTextDocument::print() (http://doc.qt.io/qt-5/qtextdocument.html#print), sending to a PDF printer that writes to file. Which one to use? Have I missed another one?
- My users are "an*lly retentive" about the exact format of their output. I already had problems when I moved them from QWebKit with its PDF generation engine over to QWebEngine with its different one. I'm a bit unsure about this: where are the PDF generation engines? I believe QWebEngine is using a Chromium one (or at least its own). If I use QPdfWriter and/or QTextDocument::print(QPagedPaintDevice *printer), will I be using the same PDF engine as QWebEngine or a different one?
-
-
@JonB said in Generating PDF from HTML, fast!:
if I use QPdfWriter and/or QTextDocument::print(QPagedPaintDevice *printer) will I be using the same PDF engine or a different one from QWebEngine?
Yes, they are different, and QTextDocument supports only a subset of HTML.
It really depends on the document. It's pretty easy to try QTextDocument: set the HTML and then print to PDF. If the result is acceptable for the client you might be done.
-
@VRonin
Thank you for answering.
Yes, they are different and QTextDocument supports only a subset of html.
That's a major blow. I have no real idea what the HTML might contain; all I know so far is that whatever QWebEngineView makes of it is acceptable to the user. If that could differ when I go via QTextDocument, that may be a no-no :(
What about the other part of the question? Where is the "HTML-to-PDF-converter-driver"? Out of QWebEnginePage::printToPdf(), QTextDocument::print(QPagedPaintDevice *printer) (where printer is PDF) and QPdfWriter, do they share the same code to produce the PDF or are they each quite separate with their own code for that? Then I would understand what choices I have. If you had a completely arbitrary piece of HTML, which one would you use to get to PDF? Thanks!
[P.S. I'm removing my naughty post elsewhere which encouraged a bad example to get you here...!]
-
@JonB said in Generating PDF from HTML, fast!:
QWebEnginePage::printToPdf(), QTextDocument::print(QPagedPaintDevice *printer) (where printer is PDF) and QPdfWriter
The latter two share the same code. QWebEnginePage uses the Chromium engine to do the work.
-
@VRonin
Thank you, that is great information, and about what I suspected.
For my application that probably means I shall have to stick with QWebEnginePage::printToPdf() for "unattended" conversion, as the user can also go into an "interactive" session which does show the page and convert it to PDF, and I suspect users will demand 100% compatibility with that one's output. So now I have to investigate whether I can get that to be much faster when it does not actually need to display the HTML to the user but just convert it to PDF.... :(
-
Instead of creating and deleting QDialogs and QWebEnginePages over and over again to render the HTML offscreen, I wonder if it's faster (and more memory efficient!) to use a dedicated HTML-to-PDF converter:
- https://wkhtmltopdf.org/ (Console app -- use this to quickly test if the library produces the output you want)
- https://wkhtmltopdf.org/libwkhtmltox/ (C library)
- https://github.com/mreiferson/py-wkhtmltox (Python bindings)
-
@JKSH
I am aware of this. The issue is: that dialog is also used "interactively" to allow the user to "preview" the letter, optionally edit it, and produce the PDF. Then it can be used to "batch" process hundreds of letters, non-interactively. It is vital to the users that the batch-processed outputs be identical to the interactive ones, down to the pixel. So that's why I have to use the same engine/mechanism for non-interactive as interactive, which precludes using something else.
-
@JonB said in Generating PDF from HTML, fast!:
down to the pixel.
Stringent requirements indeed!
All the best