How to use a browser's (Chrome/Firefox) HTML/CSS/JS rendering engine to produce a PDF?


Solution 1

Firefox has an API method for that: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/tabs/saveAsPDF

browser.tabs.saveAsPDF({})
  .then((status) => {
    console.log('PDF file status: ' + status);
  });

However, it seems to be available only to browser extensions; it cannot be invoked from a web page.

I'm still looking for a public API for that...
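As a rough illustration of how the extension route could look, here is a minimal sketch of a background script. It assumes a Firefox extension whose manifest declares a `browser_action` without a popup; the helper names `buildPageSettings` and `saveActiveTabAsPdf` and the file name `page.pdf` are made up for this example. The `browser` object is passed in as a parameter so the logic can also be exercised outside an extension.

```javascript
// Sketch only: Firefox WebExtension background script.
// buildPageSettings and saveActiveTabAsPdf are hypothetical helper names.

// Page settings with empty header/footer fields, so no URL or date
// is printed in the page margins.
function buildPageSettings(fileName) {
  return {
    toFileName: fileName,
    headerLeft: '',
    headerCenter: '',
    headerRight: '',
    footerLeft: '',
    footerCenter: '',
    footerRight: '',
    showBackgroundColors: true,
    showBackgroundImages: true
  };
}

// Saves the active tab as a PDF and reports whether a file was written.
// browserApi is the extension `browser` object (or a stub with the same shape).
function saveActiveTabAsPdf(browserApi, fileName) {
  return browserApi.tabs.saveAsPDF(buildPageSettings(fileName))
    .then((status) => ({
      ok: status === 'saved' || status === 'replaced',
      status: status
    }));
}

// Only wire up the toolbar button when running inside an extension.
if (typeof browser !== 'undefined' && browser.browserAction) {
  browser.browserAction.onClicked.addListener(() => {
    saveActiveTabAsPdf(browser, 'page.pdf')
      .then((result) => console.log('PDF file status: ' + result.status));
  });
}
```

This still runs inside the browser's extension context, so it does not by itself solve server-side automation.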

Solution 2

I'm not an expert, but PhantomJS seems to be the right tool for the job. It is a headless browser built on a fork of WebKit (QtWebKit), not on Chrome/Chromium.

var page = require('webpage').create();

page.open('http://github.com/', function (status) {
    // Stop early if the page could not be loaded
    if (status !== 'success') {
        console.log('Failed to load the page');
        phantom.exit(1);
        return;
    }

    // Measure the full rendered size of the document inside the page
    var s = page.evaluate(function () {
        var body = document.body,
            html = document.documentElement;

        var height = Math.max(body.scrollHeight, body.offsetHeight,
            html.clientHeight, html.scrollHeight, html.offsetHeight);
        var width = Math.max(body.scrollWidth, body.offsetWidth,
            html.clientWidth, html.scrollWidth, html.offsetWidth);
        return {width: width, height: height};
    });

    console.log(JSON.stringify(s));

    // Make the paper as tall as the document so it fits on a single page
    page.paperSize = {
        width: '1980px',
        height: s.height + 'px',
        margin: {
            top: '50px',
            left: '20px'
        }
    };

    page.render('github.pdf');
    phantom.exit();
});
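
The same idea can be wrapped in a small command-line script, closer to the `render-to-pdf file-to-render.html out.pdf` form asked for in the question. This is only a sketch: the script name `render-to-pdf.js`, the `parseArgs` helper, and the A4 paper settings are assumptions made for this example, and it is written in ES5 because that is what PhantomJS runs.

```javascript
// Sketch only. Intended usage:
//   phantomjs render-to-pdf.js file-to-render.html out.pdf

// Extracts input and output paths from PhantomJS's system.args,
// where args[0] is the script name. Returns null on bad usage.
function parseArgs(args) {
    if (args.length !== 3) {
        return null;
    }
    return {input: args[1], output: args[2]};
}

// The rendering part only runs when the script is executed by PhantomJS.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    var opts = parseArgs(system.args);

    if (!opts) {
        console.log('Usage: phantomjs render-to-pdf.js <input> <output.pdf>');
        phantom.exit(1);
    } else {
        var page = require('webpage').create();

        // Assumed paper settings; print CSS in the page still applies.
        page.paperSize = {format: 'A4', orientation: 'portrait', margin: '1cm'};

        page.open(opts.input, function (status) {
            if (status !== 'success') {
                console.log('Failed to load ' + opts.input);
                phantom.exit(1);
                return;
            }
            // The output format is inferred from the .pdf extension
            page.render(opts.output);
            phantom.exit(0);
        });
    }
}
```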

Hope it helps.



Author: David Hofmann

15 years of coding since high school and still enjoying it every single day. Oracle Certified Professional Java Programmer. Certified Vaadin 8 Developer.

Updated on September 16, 2022

Comments

  • David Hofmann over 1 year

    There are nice projects that generate pdf from html/css/js files

    1. http://wkhtmltopdf.org/ (open source)
    2. https://code.google.com/p/flying-saucer/ (open source)
    3. http://cssbox.sourceforge.net/ (not necessarily straight pdf generation)
    4. http://phantomjs.org/ (open source, allows for PDF output)
    5. http://www.princexml.com/ (commercial, but hands down the best one out there)
    6. https://thepdfapi.com/ a chrome modification to spit pdf from html from

    I want to programmatically control the Chrome or Firefox browser (because both are cross-platform) to make it load a web page, run the scripts, style the page, and generate a PDF file for printing.

    But how do I start by controlling the browser in an automated way so that I can do something like

    render-to-pdf file-to-render.html out.pdf

    I can easily do this manually by browsing to the page and printing it to PDF, and I get an accurate, 100% spec-compliant rendering of the HTML/CSS/JS page in a PDF file. Even the URL headers can be omitted from the PDF through configuration options in the browser. But again, how do I start automating this process?

    On the server side, I want to automate opening the browser, navigating to a page, and generating the PDF from the browser-rendered page.

    I have done a lot of research; I just don't know how to ask the right question. I want to programmatically control the browser, maybe like Selenium does, but to the point where I export a web page as PDF (hence using the rendering capabilities of the browser to produce good PDFs).

    • Chris Haas over 9 years
      Have you looked at ChromeDriver?
    • Chris Haas over 9 years
      You might be able to use a combination of the Chromium command line args --kiosk --kiosk-printing along with passing the default PDF printer in your prefs capability. I've never tried this but that's where I'd start.
    • Kevin Brown over 9 years
      I would think you need to do some real research. IMHO a browser was not intended to do this, and you have many hurdles to overcome that you have not thought of: possibly running headers/footers, keeping content together over page breaks, differing table headers at page breaks, font handling, special-character handling and embedding, and understanding that browser dimensions are pixels at 96/inch while many other things are not. I could go on, but that is a start for you.
    • David Hofmann over 9 years
      @ChrisHaas, with $ chrome --kiosk --kiosk-printing file.html and window.print(); inside the HTML, it does exactly what I want; it's just that it still requires me to hit Enter to save the file... so sad... Thanks though.
    • jamespaden over 8 years
      I think wkhtmltopdf is the closest to what you want. It is a forked version of WebKit built specifically for PDF generation. Alternatively, if you liked Prince, docraptor.com is a commercial saas API powered by the Prince engine.
    • Michael Franzl over 7 years
      "phantomjs.org (open source allows for pdf rasterization)". Instead of "rasterization" I would have written "output" since the PDFs do contain vectors for vector elements like text, borders, etc.
  • David Hofmann over 9 years
    CSS allows for page sizing when printing, so setting the paper size in the code example doesn't help. Besides, there are page breaks in CSS print too. That being said, I see that PhantomJS uses the WebKit rendering engine; it is not a mainstream browser but a fork of WebKit (which is OK for this task anyway). It still requires a lot of work to make it behave like PrinceXML, though. I guess that is the reason they are not cheap.
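
For the Chromium command-line route discussed in the comments, where `--kiosk --kiosk-printing` still needed a key press, newer Chrome/Chromium builds offer a headless mode with a `--print-to-pdf` switch that writes the file without a dialog. Below is a minimal Node.js sketch of that approach. The binary name `google-chrome` and the helper names `buildChromeArgs` and `renderToPdf` are assumptions for this example, and the switches depend on the installed Chrome version.

```javascript
// Sketch only: drive headless Chrome from Node.js to print a page to PDF.
const { execFile } = require('child_process');

// Builds the command-line arguments for a headless print-to-PDF run.
function buildChromeArgs(inputUrl, outputPdf) {
  return ['--headless', '--print-to-pdf=' + outputPdf, inputUrl];
}

// Launches Chrome and calls back with an error if the process fails.
// chromeBinary is whatever your Chrome/Chromium executable is called.
function renderToPdf(chromeBinary, inputUrl, outputPdf, callback) {
  execFile(chromeBinary, buildChromeArgs(inputUrl, outputPdf), callback);
}

// Example call (binary name and paths are hypothetical):
// renderToPdf('google-chrome', 'file:///tmp/file-to-render.html', '/tmp/out.pdf',
//   (err) => console.log(err ? 'Rendering failed: ' + err.message : 'PDF written'));
```

Page size and page breaks would still come from the document's own print CSS, as noted above.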