HTML5 to PDF serverside
Solution 1
I have used PhantomJS to generate PNG images from web pages, and it can produce PDFs as well; the quality is usually good. The feature is called screen capture and is described here. The supported formats are PNG, JPEG, GIF and PDF.
When a page is converted to PDF, its text is kept as text.
After testing a few other libraries and programs, I found PhantomJS to be the best solution. PhantomJS uses WebKit, a real layout and rendering engine.
A few examples are at https://github.com/ariya/phantomjs/wiki/Examples. The section Rendering/rasterization mentions the following script, which helps in the process:
rasterize.js rasterizes a web page to image or PDF
The PhantomJS Quick Start guide says:
Producing PDF output is possible, e.g. from a Wikipedia article:
phantomjs rasterize.js 'http://en.wikipedia.org/w/index.php?title=Jakarta&printable=yes' jakarta.pdf
or when creating printer-ready cheat sheet:
phantomjs rasterize.js http://www.nihilogic.dk/labs/webgl_cheat_sheet/WebGL_Cheat_Sheet.htm webgl.pdf
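The commands above rely on the rasterize.js example script. As a rough illustration of what such a script does, here is a minimal sketch of a PhantomJS script that loads a URL and renders it to a file. The file name render_pdf.js, the helper paperSizeFor, the 1cm margin and the 200 ms delay are all assumptions made for this example; the real rasterize.js handles more cases.

```javascript
// render_pdf.js (hypothetical file name). Intended usage:
//   phantomjs render_pdf.js <url> <output.pdf> [paper format, e.g. A4]
// Sketch only; the rasterize.js shipped with PhantomJS does more.

// Pure helper: build the object assigned to page.paperSize.
// The 1cm margin is an arbitrary choice for this example.
function paperSizeFor(format) {
    return { format: format || 'A4', orientation: 'portrait', margin: '1cm' };
}

// The `phantom` global exists only inside PhantomJS, so the rendering
// part is skipped when the file is loaded by another JavaScript runtime.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    if (system.args.length < 3) {
        console.log('Usage: phantomjs render_pdf.js <url> <output.pdf> [format]');
        phantom.exit(1);
    } else {
        var page = require('webpage').create();
        var url = system.args[1];
        var output = system.args[2];
        page.paperSize = paperSizeFor(system.args[3]);
        page.open(url, function (status) {
            if (status !== 'success') {
                console.log('Unable to load ' + url);
                phantom.exit(1);
            } else {
                // Give late-loading resources a moment before rendering.
                window.setTimeout(function () {
                    // The output format follows the file extension (.pdf, .png, ...).
                    page.render(output);
                    phantom.exit();
                }, 200);
            }
        });
    }
}
```

Setting page.paperSize is what makes the PDF paginate to a paper format; without it the script still renders, but the page dimensions are derived from the viewport instead.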
I tested PDF generation on a few pages, and if a page follows standards, the results are good. Text is selectable and prints at high quality, but on some pages the layout in the PDF is not quite the same as in the PNG. Below are two screenshots, generated with these commands:
$ phantomjs rasterize.js 'http://windows.microsoft.com/en-US/windows/home' microsoft.png
$ phantomjs rasterize.js 'http://windows.microsoft.com/en-US/windows/home' microsoft.pdf
I also tested http://lab.simurai.com/buttons/. The PDF and PNG were nearly identical; below is a sample of the PDF, which I rasterized to 5641px wide and then cropped to a region. As in the previous PDF example, the text is selectable in the PDF and, as you can see, sharp (no antialiasing!).
INSTALLING
I first tried to install the Qt library and PhantomJS on CentOS 5 by compiling from source, but had no luck. Then I tried Ubuntu 11.10, and the process was painless:
I downloaded http://phantomjs.googlecode.com/files/phantomjs-1.7.0-linux-x86_64.tar.bz2 and extracted it using
tar -xjvf phantomjs-1.7.0-linux-x86_64.tar.bz2
And then copied the phantomjs executable to the system bin directory:
$ cp phantomjs-1.7.0-linux-x86_64/bin/phantomjs /usr/local/bin/phantomjs
and phantomjs was ready to run.
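Once the binary is on the PATH, a server process can call it like any other command. The following is a minimal sketch, assuming a Node.js server and a copy of rasterize.js in the working directory; the function names rasterizeArgs and renderPdf, the A4 paper argument and the 60 second timeout are illustrative choices, not part of PhantomJS. The same approach should carry over to any language that can spawn a process.

```javascript
var execFile = require('child_process').execFile;

// Build the argument list for: phantomjs <script> <url> <output> [paper]
function rasterizeArgs(scriptPath, url, outputPath, paper) {
    var args = [scriptPath, url, outputPath];
    if (paper) {
        args.push(paper);
    }
    return args;
}

// Run phantomjs and report completion through a Node-style callback.
function renderPdf(url, outputPath, callback) {
    var args = rasterizeArgs('rasterize.js', url, outputPath, 'A4');
    // execFile passes arguments without a shell, so the URL needs no quoting.
    execFile('phantomjs', args, { timeout: 60000 }, function (err) {
        callback(err, outputPath);
    });
}
```

A call such as renderPdf('http://example.com/', 'out.pdf', function (err, file) { ... }) would then hand back either an error or the path of the generated file.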
If the generated PDF is not good enough, you can try updating WebKit, but I expect the result to be sufficient. PhantomJS has an excellent update cycle, so bugs should be fixed in reasonable time.
The PhantomJS FAQ also has good information about what is possible.
Solution 2
Depending on the complexity of your HTML you could use XmlWorker, which is a project by the iText developers and uses iText.
Olivier
Updated on June 23, 2022
Comments
-
Olivier almost 2 years
I'm looking for a solution for generating a PDF from an HTML5/CSS3 document, serverside.
I know there are plenty of solutions for creating a PDF (like FOP, iText...), but I need to make sure it will look 100% the same as the HTML page. So I don't want to create the PDF element by element, as with FOP or iText.
Something like this should exist, because that's what happens when you print to PDF from your browser. Ideally, the solution would embed a web browser engine (WebKit or Gecko). I tried wkHtmlToPdf... but the result is not good at all (the HTML5 canvas is not even printed...)
If someone has an idea of any solution, free or not, in any language, I would appreciate it A LOT! Thanks!!
-
Timo Kähkönen about 11 years
Why is this still closed?? Every question causes more "solicit debate, arguments, polling, or extended discussion" than this.
-
HAL 9000 over 10 years
wkhtml2pdf now does render canvas... see wkhtmltopdf.org. Thumbs up for wkhtml2pdf ... it's plain awesome to have just one executable instead of dealing with 7000+ java classes of fop
-
Olivier over 11 years
Thanks for your answer. I'm using Java on the server side, but I'm open to using something else for this PDF generation. Unfortunately, a screenshot is not an option, because the generated PDF should be a real PDF for a professional printer (for instance, text should be text, not pixels).
-
rjmunro about 10 years
Converting an image to a PDF is a really bad idea: you will lose all the text, so it won't zoom nicely, and won't be copy/pasteable or searchable. It will also make the PDF file larger than it needs to be. If you use wkHtmlToPdf or PhantomJS or a normal browser's print option, the text will go into the PDF as text, and any vector graphics will also go in as vectors, avoiding these problems.