Converting HTML files to PDF

Solution 1

The Flying Saucer XHTML renderer project has support for outputting XHTML to PDF. Have a look at an example here.
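
Flying Saucer renders XHTML, so its input has to be well-formed XML. As one of the comments below points out, a stray ampersand is enough to break rendering. A minimal sketch of a pre-flight check, using only the JDK's built-in XML parser, could look like this. The class name `XhtmlCheck` and the method `isWellFormed` are made up for illustration and are not part of Flying Saucer.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Flying Saucer.
class XhtmlCheck {

    /** Returns true if the markup parses as well-formed XML. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Thrown for malformed input, e.g. an unescaped '&' or an unclosed tag.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Escaped ampersand: well-formed.
        System.out.println(isWellFormed("<html><body><p>Fish &amp; chips</p></body></html>"));
        // Bare ampersand: not well-formed.
        System.out.println(isWellFormed("<html><body><p>Fish & chips</p></body></html>"));
    }
}
```

Input with a DOCTYPE that points at an external DTD may make the parser try to fetch it, so this sketch is best tried on markup without one.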

Solution 2

Did you try WKHTMLTOPDF?

It's a simple shell utility built on WebKit, the open source rendering engine. Both are free.

We've set up a small tutorial here.
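
Because wkhtmltopdf is a command-line tool, one way to call it from Java is as a child process. The following is a rough sketch, assuming the binary is installed and reachable on the PATH as `wkhtmltopdf`. The class `WkhtmltopdfRunner` and its methods are hypothetical names chosen for this example; only the basic `wkhtmltopdf <input> <output>` invocation is used.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical wrapper; assumes a `wkhtmltopdf` binary is on the PATH.
class WkhtmltopdfRunner {

    /** Builds the command line: wkhtmltopdf <input> <output>. */
    static List<String> buildCommand(String input, String outputPdf) {
        return List.of("wkhtmltopdf", input, outputPdf);
    }

    /** Runs the conversion and returns the exit code of the process. */
    static int convert(String input, String outputPdf)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outputPdf))
                .inheritIO() // forward the tool's output to this process's console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: WkhtmltopdfRunner <input.html or URL> <output.pdf>");
            return;
        }
        int exitCode = convert(args[0], args[1]);
        System.out.println("wkhtmltopdf exited with code " + exitCode);
    }
}
```

Each call starts a separate operating system process, so on a busy server it may be worth queueing conversions rather than launching many at once.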

EDIT (2017):

If I were building something today, I wouldn't go that route anymore. I would use http://pdfkit.org/ instead, probably stripping out all of its Node.js dependencies so that it runs in the browser.

Solution 3

Check out iText; it is a pure Java PDF toolkit with support for reading data from HTML. I used it recently in a project where I needed to pull content from our CMS and export it as PDF files, and it was all rather straightforward. Support for CSS and style tags is pretty limited, but it does render tables without any problems (though I never managed to set column widths).

Creating a PDF from HTML goes something like this:

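// Here `out` is an OutputStream for the PDF and `html` is a String of markup.
Document doc = new Document(PageSize.A4);
PdfWriter.getInstance(doc, out);      // bind the document to the output stream
doc.open();
HTMLWorker hw = new HTMLWorker(doc);  // simple HTML parser that writes into doc
hw.parse(new StringReader(html));
doc.close();                          // finish writing the PDF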

Solution 4

If you have the funding, nothing beats Prince XML, as this video shows.

Solution 5

Is there maybe a way to grab the rendered page from the Internet Explorer rendering engine and send it to a PDF-Printer tool automatically?

This is how ActivePDF works, which is good: it means you know what you'll get, and it actually has reasonable styling support.

It is also one of the few packages I found (when looking a few years back) that actually supports the various page-break CSS commands.


Unfortunately, the ActivePDF software is very frustrating. Since it has to launch the IE browser in the background for conversions, it can be quite slow, and it is not particularly stable either.

There is a new version currently in beta which is supposed to be much better, but I've not actually had a chance to try it out, so I don't know how much of an improvement it is.

Author by

Michael C

Updated on December 01, 2020

Comments

  • panschk
    panschk about 15 years
    Thanks for the helpful answer. I don't think ActivePDF is really suitable because of the price, but it's good to know something like that exists.
  • MGOwen
    MGOwen over 14 years
    For a straight html-page-to-pdf conversion, this is better than anything else I've seen, free or commercial.
  • Julie
    Julie over 13 years
    If you're looking for a cheaper alternative for Prince, try DocRaptor.com. It uses Prince as the engine.
  • Eran Medan
    Eran Medan about 13 years
    It's AGPL, which seems even worse than GPL: you need to be open source even if you just serve the PDF and iText runs server side.
  • Eran Medan
    Eran Medan about 13 years
    Does it work on a non Mac OS?
  • Mic
    Mic about 13 years
    @Eran, we use it on Linux. I think there's a Windows version too.
  • mP.
    mP. about 13 years
    Doesn't sound like a very scalable solution if one needs to convert pages to PDF on the fly in parallel. If a few requests come through that result in a conversion using FF, your server will have lost a few gigabytes of memory just to serve a few converted pages. This would open your server to a DoS.
  • Nowaker
    Nowaker about 13 years
    @Eran, Just use the last non-AGPL version (com.lowagie:itext:2.1.7 in Maven).
  • Viccari
    Viccari about 12 years
    @Mic Yes, there is a Windows version too.
  • David Hofmann
    David Hofmann over 11 years
    The real problem with Flying Saucer is that it uses iText to render PDF, which is an AGPL v3 licensed lib.
  • Gary - Stand with Ukraine
    Gary - Stand with Ukraine about 11 years
    The version of iText used by Flying Saucer is 2.0.8, which was available under the LGPL. Only versions 5 and above are under the more restrictive license. stackoverflow.com/questions/2692000/…
  • user1914292
    user1914292 about 11 years
    And if you want something cheaper, but with more options, try htm2pdf.co.uk - it uses WebKit and gives real WYSIWYG.
  • Lucas Meijer
    Lucas Meijer almost 11 years
    Tested on Windows XP (version 0.9.9) and it works very well. Also, it does not require admin rights on the machine to install.
  • SteveT
    SteveT almost 11 years
    I'd say the real problem with Flying Saucer is that it requires a well-formed and valid XML document. It's easy to unwittingly break the PDF rendering by including something like an ampersand in your HTML, or some javascript code that makes your rendered HTML not strict XHTML. Though this can be mitigated with automated tests or some process that involves XML validation.
  • nafg
    nafg over 10 years
    Better but similar: github.com/ariya/phantomjs/wiki/Screen-Capture (according to we-love-php.blogspot.com/2012/12/… the pdf has real text, not rasterized)
  • Pino
    Pino over 10 years
    HTMLWorker is deprecated in newer versions of iText in favor of XMLWorker; however, CSS support is poor in both cases (see demo.itextsupport.com/xmlworker/itextdoc/…) and was not adequate for my needs. Flying Saucer, by contrast, was perfect.
  • David Hofmann
    David Hofmann over 9 years
    Why can't we use a real browser for that instead of a fork of the (now unmaintained) rendering engine? See stackoverflow.com/q/25574082/39998
  • Mic
    Mic over 9 years
    @DavidHofmann, probably because this question dates back to 2009. When I last checked a few months ago, there was still no comparable solution in JS.
  • IcedDante
    IcedDante over 9 years
    How would this work in a threaded Enterprise environment that would be generating several hundred pdf files a minute?
  • Mic
    Mic over 9 years
    @IcedDante, what makes you think there would be a problem?
  • IcedDante
    IcedDante over 9 years
    I guess what I am wondering is whether this shell utility creates its own memory space for each invocation, or whether it operates like a utility in headless mode where each thread would be using a shared resource.
  • Mic
    Mic over 9 years
    @IcedDante, we have a PDF load similar to yours, but we queue the conversions in a background job to preserve server performance, and run them one by one. However, if I remember correctly, we ran some tests in the beginning and there were no collisions on concurrent calls.
  • Jossef Harush Kadouri
    Jossef Harush Kadouri over 8 years
    I love you for this reference. Great utility.
  • Gray
    Gray about 8 years
    How is this a Java solution? This is a Windows print driver.
  • PhiLho
    PhiLho about 8 years
    The OP explicitly mentioned Windows. And I suppose there are similar drivers for other systems. The OP only mentioned Java as a possible solution...
  • Vova Rozhkov
    Vova Rozhkov over 7 years
    You may use LGPL version which could be found at github.com/albfernandez/itext2
  • user1474090
    user1474090 over 7 years
    GrabzIt's HTML to PDF API (grabz.it/html-to-pdf-image-api.aspx) works in the same way: it renders the HTML in a browser and then creates the PDF, which ensures much more accurate PDF conversions.
  • Cardinal System
    Cardinal System over 5 years
    It's JavaScript, not Java....
  • Mic
    Mic over 5 years
    @CardinalSystem it's neither JS nor Java, just a command line tool over the library of WKHTMLTOPDF written in c
  • kommradHomer
    kommradHomer over 4 years
    For many simple cases, I still recommend using a wkhtmltopdf binary.
  • Kenny Cason
    Kenny Cason over 3 years
    Can confirm wkhtmltopdf is a great tool, and easy to use. I've been using it for years and still use it frequently.
  • Emmanuel Bourg
    Emmanuel Bourg almost 3 years
    HTMLWorker supports very simple HTML documents, with basic elements and no CSS. It is too limited to be useful. But the more recent iText html2pdf works really great kb.itextpdf.com/home/it7kb/ebooks/…
  • Daniel
    Daniel almost 3 years
    From Java, you can use github.com/wooio/htmltopdf-java which is a wrapper around wkhtmltopdf
  • ayan ahmedov
    ayan ahmedov about 2 years
    @Daniel, may I ask if you have any experience using it in a web server environment? I think it won't play nicely with a web server spawning a new process for each client request.
  • Mic
    Mic about 2 years
    @ayanahmedov, yes, we have been doing that for about 13 years now, on an Ubuntu server with nginx.