How to use the browser's (Chrome/Firefox) HTML/CSS/JS rendering engine to produce a PDF?
Solution 1
Firefox has an API method for that: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/tabs/saveAsPDF
browser.tabs.saveAsPDF({})
    .then((status) => {
        console.log('PDF file status: ' + status);
    });
However, it seems to be available only to browser extensions; it cannot be invoked from a regular web page.
I'm still looking for a public API for that...
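For what it's worth, here is a minimal sketch of how the call above might be wrapped inside an extension's background script. The function name `saveActiveTabAsPdf` and the `tabsApi` parameter are my own (hypothetical) names; passing the API object in is only there so the logic can be tried outside Firefox. The blank header/footer strings are meant to leave out the URL and date lines, assuming the `PageSettings` fields behave as documented on MDN.

```javascript
// Sketch for a Firefox WebExtension background script (not a web page).
// Blank header/footer strings are intended to suppress the URL/date lines.
const PAGE_SETTINGS = {
  headerLeft: '', headerCenter: '', headerRight: '',
  footerLeft: '', footerCenter: '', footerRight: '',
};

// `tabsApi` is expected to be `browser.tabs`. It is a parameter (an
// assumption of this sketch) so the function can be exercised with a fake.
async function saveActiveTabAsPdf(tabsApi, settings = PAGE_SETTINGS) {
  const status = await tabsApi.saveAsPDF(settings);
  // Documented statuses: saved, replaced, canceled, not_saved, not_replaced.
  return { status, ok: status === 'saved' || status === 'replaced' };
}

// Inside the extension: saveActiveTabAsPdf(browser.tabs).then(console.log);
```

Note that this still runs only in an extension context, so it does not solve the server-side case by itself.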
Solution 2
I'm not an expert, but PhantomJS seems to be the right tool for the job. Note that it is not Chrome/Chromium underneath: it is a headless browser built on a fork of WebKit.
var page = require('webpage').create();
page.open('http://github.com/', function() {
    // Measure the full rendered document inside the page context.
    var s = page.evaluate(function() {
        var body = document.body,
            html = document.documentElement;
        var height = Math.max(body.scrollHeight, body.offsetHeight,
            html.clientHeight, html.scrollHeight, html.offsetHeight);
        var width = Math.max(body.scrollWidth, body.offsetWidth,
            html.clientWidth, html.scrollWidth, html.offsetWidth);
        return {width: width, height: height};
    });
    console.log(JSON.stringify(s));
    // Make the paper as tall as the document so it fits on a single page.
    page.paperSize = {
        width: "1980px",
        height: s.height + "px",
        margin: {
            top: '50px',
            left: '20px'
        }
    };
    page.render('github.pdf');
    phantom.exit();
});
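If you would rather drive a real browser, another route worth trying is Chrome/Chromium's headless mode, which has a `--print-to-pdf` switch. Below is a rough Node.js sketch of a `render-to-pdf` style helper built on that. The names `buildPrintArgs` and `renderToPdf` are hypothetical, and the default binary name `google-chrome` is an assumption that varies by platform and install. Pages that load content asynchronously may need extra waiting, which this sketch does not handle.

```javascript
const { execFile } = require('node:child_process');
const path = require('node:path');
const { pathToFileURL } = require('node:url');

// Build the argument list for a headless Chrome/Chromium print run.
function buildPrintArgs(inputHtml, outputPdf) {
  return [
    '--headless',
    `--print-to-pdf=${path.resolve(outputPdf)}`,
    pathToFileURL(path.resolve(inputHtml)).href,
  ];
}

// `chromeBinary` is an assumption: the executable name differs per platform
// (for example `chromium`, `chrome.exe`, or a full path to the binary).
function renderToPdf(inputHtml, outputPdf, chromeBinary = 'google-chrome') {
  return new Promise((resolve, reject) => {
    execFile(chromeBinary, buildPrintArgs(inputHtml, outputPdf), (err) =>
      (err ? reject(err) : resolve(outputPdf)));
  });
}
```

Usage would look like `renderToPdf('file-to-render.html', 'out.pdf').then(console.log)`, with no dialog to confirm.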
Hope it helps.
David Hofmann
15 years of coding since high school and still enjoying it every single day. Oracle Certified Professional Java Programmer. Certified Vaadin 8 Developer.
Updated on September 16, 2022
Comments
-
David Hofmann over 1 year
There are nice projects that generate pdf from html/css/js files
- http://wkhtmltopdf.org/ (open source)
- https://code.google.com/p/flying-saucer/ (open source)
- http://cssbox.sourceforge.net/ (not necessarily straight pdf generation)
- http://phantomjs.org/ (open source allows for pdf output)
- http://www.princexml.com/ (commercial, but hands down the best one out there)
- https://thepdfapi.com/ a chrome modification to spit pdf from html from
I want to programmatically control the Chrome or Firefox browser (because they are both cross-platform) so that it loads a web page, runs the scripts, styles the page, and generates a PDF file for printing.
But how do I start by controlling the browser in an automated way so that I can do something like
render-to-pdf file-to-render.html out.pdf
I can easily do this manually by browsing to the page and printing it to PDF, and I get an accurate, 100% spec-compliant rendering of the HTML/CSS/JS page in a PDF file. Even the URL headers can be omitted from the PDF through configuration options in the browser. But again, how do I start automating this process?
I want to automate, on the server side, opening the browser, navigating to a page, and generating the PDF from the page as rendered by the browser.
I have done a lot of research; I just don't know how to ask the right question. I want to programmatically control the browser, maybe like Selenium does, but to the point where I export a web page as a PDF (hence using the rendering capabilities of the browser to produce good PDFs).
-
Chris Haas over 9 years
Have you looked at ChromeDriver?
-
Chris Haas over 9 years
You might be able to use a combination of the Chromium command line args --kiosk --kiosk-printing along with passing the default PDF printer in your prefs capability. I've never tried this but that's where I'd start.
-
Kevin Brown over 9 years
I would think you need to do some real research. IMHO a browser was not intended to do this, and you have many hurdles to overcome that you have not thought of (things like possibly running headers/footers, keeping content together over page breaks, differing table headers at page breaks, font handling/special character handling and embedding, understanding that browser dimensions are pixels at 96/inch and many other things are not). I could go on, but that is a start for you.
-
David Hofmann over 9 years
@ChrisHaas, $ chrome --kiosk --kiosk-printing file.html, and inside the HTML I call window.print(); it does exactly what I want, it's just that it still requires me to hit enter to save the file... so sad... Thanks though
-
jamespaden over 8 years
I think wkhtmltopdf is the closest to what you want. It is a forked version of WebKit built specifically for PDF generation. Alternatively, if you liked Prince, docraptor.com is a commercial SaaS API powered by the Prince engine.
-
Michael Franzl over 7 years
"phantomjs.org (open source allows for pdf rasterization)". Instead of "rasterization" I would have written "output" since the PDFs do contain vectors for vector elements like text, borders, etc.
-
David Hofmann over 9 years
CSS allows for page sizing when printing, so setting the paper size in the code example doesn't help. Besides, there are page breaks in CSS print too. That being said, I see that PhantomJS uses the WebKit rendering engine; it's not using a supported browser but a fork of WebKit (which is OK for this task anyway). But it still requires a lot of work to make it work like PrinceXML. I guess that is the reason they are not cheap.