HTML5 to PDF serverside


Solution 1

I have used PhantomJS to generate PNG images from web pages. It can also produce PDF, and the quality is usually good. The feature is called screen capture and is described here. The supported formats are PNG, JPEG, GIF and PDF.

When a page is converted to PDF, its text is retained as text.

After testing a few other libraries and programs, I found PhantomJS to be the best solution. PhantomJS uses WebKit, a real layout and rendering engine.

A few examples are at https://github.com/ariya/phantomjs/wiki/Examples. The Rendering/rasterization section mentions the following script, which helps with this task:

rasterize.js rasterizes a web page to image or PDF

The PhantomJS Quick Start guide says:

Producing PDF output is possible, e.g. from a Wikipedia article:

phantomjs rasterize.js 'http://en.wikipedia.org/w/index.php?title=Jakarta&printable=yes' jakarta.pdf

or when creating printer-ready cheat sheet:

phantomjs rasterize.js http://www.nihilogic.dk/labs/webgl_cheat_sheet/WebGL_Cheat_Sheet.htm webgl.pdf
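For a sense of what such a script involves, here is a minimal sketch. It is not the actual rasterize.js. It is a shell snippet that saves a small PhantomJS script under the hypothetical name render.js. The viewport size, the A4 paper settings and the 200 ms delay are illustrative assumptions. The script relies on the PhantomJS webpage and system modules (page.viewportSize, page.paperSize, page.open, page.render, system.args).

```shell
# Write a minimal PhantomJS rendering script to render.js (hypothetical name).
cat > render.js <<'EOF'
// Usage: phantomjs render.js URL OUTPUT
// OUTPUT should end in .png, .jpg, .gif or .pdf; the extension selects the format.
var page = require('webpage').create();
var args = require('system').args;

if (args.length !== 3) {
    console.log('Usage: render.js URL OUTPUT');
    phantom.exit(1);
} else {
    // Viewport used for layout; these values are assumptions for illustration.
    page.viewportSize = { width: 1024, height: 768 };

    // Paper settings only matter when the output is a PDF.
    if (/\.pdf$/i.test(args[2])) {
        page.paperSize = { format: 'A4', orientation: 'portrait', margin: '1cm' };
    }

    page.open(args[1], function (status) {
        if (status !== 'success') {
            console.log('Unable to load ' + args[1]);
            phantom.exit(1);
        } else {
            // Short delay so late-loading resources can finish before rendering.
            window.setTimeout(function () {
                page.render(args[2]);
                phantom.exit();
            }, 200);
        }
    });
}
EOF
```

It would then be run the same way as the bundled script, for example: phantomjs render.js 'http://example.com/' out.pdf (example.com is a placeholder).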

I tested PDF generation on a few pages; if a page follows standards, the results are good. Text is selectable and prints at high quality, but on some pages the layout in the PDF is not quite the same as in the PNG. Below are two screenshots generated with these commands:

$ phantomjs rasterize.js 'http://windows.microsoft.com/en-US/windows/home' microsoft.png

$ phantomjs rasterize.js 'http://windows.microsoft.com/en-US/windows/home' microsoft.pdf 

Example of png and pdf generation using Phantomjs
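The two commands above differ only in the output extension, so they can be wrapped in a small shell function. This is a sketch: render_formats and the PHANTOMJS override variable are names made up here, and rasterize.js is assumed to be in the current directory.

```shell
# Render one URL to both PNG and PDF with rasterize.js.
# $1 = URL, $2 = output base name (without extension).
# PHANTOMJS may be set to another binary; it defaults to "phantomjs".
render_formats() {
    for ext in png pdf; do
        "${PHANTOMJS:-phantomjs}" rasterize.js "$1" "$2.$ext" || return 1
    done
}
```

Calling render_formats 'http://windows.microsoft.com/en-US/windows/home' microsoft would produce microsoft.png and microsoft.pdf. Setting PHANTOMJS=echo turns the call into a dry run that only prints the arguments.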

I also tested http://lab.simurai.com/buttons/. The PDF and PNG were nearly identical. Below is a sample of the PDF, which I rasterized to 5641px wide and then cropped. As in the previous PDF example, the text is selectable, and as you can see, it is sharp (no antialias!).

CSS3Buttons

INSTALLING

I first tried to install the Qt library and PhantomJS on CentOS 5 by compiling from source, but had no luck. Then I tried Ubuntu 11.10, where the process was painless:

I downloaded http://phantomjs.googlecode.com/files/phantomjs-1.7.0-linux-x86_64.tar.bz2 and extracted it using

tar -xjvf phantomjs-1.7.0-linux-x86_64.tar.bz2

Then I copied the phantomjs executable to the system bin directory:

$ cp phantomjs-1.7.0-linux-x86_64/bin/phantomjs /usr/local/bin/phantomjs

and phantomjs was ready to run.
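A quick way to confirm that the copied binary is picked up is to look it up on PATH and ask for its version. The function name check_phantomjs is made up for this sketch; phantomjs --version is the only PhantomJS option used.

```shell
# Report whether a phantomjs binary is on PATH and, if so, which version.
check_phantomjs() {
    if command -v phantomjs >/dev/null 2>&1; then
        echo "phantomjs found: $(phantomjs --version)"
    else
        echo "phantomjs not found on PATH" >&2
        return 1
    fi
}
```

Run check_phantomjs after the copy step. A non-zero return status means the executable is not on PATH.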

If the generated PDF is not good enough, you could try updating WebKit, but I expect the result to be sufficient. PhantomJS has an excellent update cycle, so bugs should be fixed in a reasonable time.

The PhantomJS FAQ also has good information about what is possible.

Solution 2

Depending on the complexity of your HTML you could use XmlWorker, which is a project by the iText developers and uses iText.

Author by

Olivier

Updated on June 23, 2022

Comments

  • Olivier
    Olivier almost 2 years

    I'm looking for a solution for generating a PDF from an HTML5/CSS3 document, serverside.

    I know there are plenty of solutions for creating a PDF (like FOP, iText...), but I need to make sure it will look 100% the same as the HTML page. So, I don't want to create a PDF element by element like FOP or iText.

    Actually, something should exist because that's what you do when you print as PDF from your browser. Ideally, the solution should embed a web browser engine (WebKit or Gecko). I tried wkHtmlToPdf... but the result is not good at all (the HTML5 canvas is not even printed...)

    If someone has an idea of any solution, free or not, any language... I would appreciate it A LOT! Thanks!!

    • Timo Kähkönen
      Timo Kähkönen about 11 years
      Why is this still closed?? Every question causes more "solicit debate, arguments, polling, or extended discussion" than this.
    • HAL 9000
      HAL 9000 over 10 years
      wkhtml2pdf now does render canvas... see wkhtmltopdf.org. Thumbs up for wkhtml2pdf ... it's plain awesome to have just one executable instead of dealing with 7000+ java classes of fop
  • Olivier
    Olivier over 11 years
    Thanks for your answer. I'm using Java on the server side, but I'm open to using something else for this PDF generation. Unfortunately, a screenshot is not an option, because the generated PDF should be a real PDF for a professional printer (for instance, text should be text, not some pixels).
  • rjmunro
    rjmunro about 10 years
    Converting an image to a PDF is a really bad idea - you will lose all the text, so it won't zoom nicely, and won't be copy/pasteable or searchable. It will also make the PDF file larger than it needs to be. If you use wkHtmlToPdf or PhantomJS or a normal browser's print option, the text will go into the PDF as text, and any vector graphics will also go in as vectors, avoiding these problems.