Is there a command-line tool for converting html files to pdf?


Solution 1

pandoc is a great command-line tool for file format conversion.

The disadvantage is that PDF output requires LaTeX. Usage:

pandoc test.html -t latex -o test.pdf

If you don't have LaTeX installed, then I recommend htmldoc.


Quoted from the "Creating a PDF" section of the pandoc manual:

By default, pandoc will use LaTeX to create the PDF, which requires that a LaTeX engine be installed.

Alternatively, pandoc can use ConTeXt, pdfroff, or any of the following HTML/CSS-to-PDF-engines, to create a PDF: wkhtmltopdf, weasyprint or prince. To do this, specify an output file with a .pdf extension, as before, but add the --pdf-engine option or -t context, -t html, or -t ms to the command line (-t html defaults to --pdf-engine=wkhtmltopdf).
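
As a rough sketch of the alternative described in the quote, the snippet below picks whichever HTML/CSS-to-PDF engine is found on the PATH and hands it to pandoc, so no LaTeX is needed. The function names `first_available` and `html_to_pdf` are made up for illustration; only `-t html`, `--pdf-engine`, and the engine names come from the manual text above.

```shell
#!/bin/sh
# Print the first command from the argument list that exists on PATH.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Convert an HTML file to PDF with pandoc and an HTML-to-PDF engine.
# Usage: html_to_pdf input.html output.pdf
html_to_pdf() {
    engine=$(first_available wkhtmltopdf weasyprint prince) || {
        echo "no HTML-to-PDF engine found" >&2
        return 1
    }
    pandoc "$1" -t html -o "$2" --pdf-engine="$engine"
}
```

Calling `html_to_pdf test.html test.pdf` should then behave like the pandoc command above, with the chosen engine doing the rendering instead of LaTeX.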

Solution 2

You can also try wkhtmltopdf; usage and installation are pretty straightforward.
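
A minimal sketch of typical usage follows. The basic form is `wkhtmltopdf input.html output.pdf`. The wrapper name `html_to_pdf_wk` and the `DRY_RUN` variable are hypothetical, and the `xvfb-run` fallback is an assumption: some builds of wkhtmltopdf want an X display, so on a headless machine it may help to run the tool under a virtual one.

```shell
#!/bin/sh
# Run wkhtmltopdf, wrapping it in xvfb-run when no X display is set
# and xvfb-run happens to be installed (assumed workaround, see above).
# Usage: html_to_pdf_wk input.html output.pdf
# Set DRY_RUN to a non-empty value to print the command instead of running it.
html_to_pdf_wk() {
    if [ -z "${DISPLAY:-}" ] && command -v xvfb-run >/dev/null 2>&1; then
        set -- xvfb-run -a wkhtmltopdf "$@"
    else
        set -- wkhtmltopdf "$@"
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

The first argument can also be a URL instead of a local file.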

Solution 3

weasyprint is an option. A possible drawback is that you'll need Python on your machine.

Install:

pip install weasyprint

Convert:

weasyprint in.html out.pdf

Solution 4

I've been successfully using the 1.8 branch of HTMLDOC for years. I put it in a commercial system that has generated hundreds of thousands of reports since 2003.

It's not super-versatile, but it is very efficient and reliable. It's limited to a basic set of PostScript fonts.

It does not support CSS, but instead uses a special HTML comment directive set to control PDF specific aspects.
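
To give an idea of what those comment directives look like, here is a small sketch that writes a two-page HTML file and converts it if `htmldoc` is installed. The file names and page content are placeholders, and the directives shown (`HEADER`, `FOOTER`, `PAGE BREAK`) are only a few of the ones HTMLDOC recognizes; check its manual for the full set and exact behavior.

```shell
#!/bin/sh
# Write a small HTML file that uses HTMLDOC comment directives.
# The quoted 'EOF' keeps the shell from expanding $PAGE.
cat > report.html <<'EOF'
<html>
<head><title>Sample report</title></head>
<body>
<!-- HEADER LEFT "Sample report" -->
<!-- FOOTER RIGHT "$PAGE" -->
<h1>First page</h1>
<p>Content of the first page.</p>
<!-- PAGE BREAK -->
<h1>Second page</h1>
<p>Content of the second page.</p>
</body>
</html>
EOF

# Convert only if htmldoc is actually available on this machine.
if command -v htmldoc >/dev/null 2>&1; then
    htmldoc --webpage -f report.pdf report.html
fi
```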

If you're comfortable with C, the source code is not too difficult to read and edit should you need to add custom facilities. It compiles with GCC or Visual Studio, depending on your target platform.

Note that the HTML does not need to be in a file. You can generate it dynamically from a URL, PHP, ASPX, etc. You can also hook it up to your web server to generate a PDF file dynamically.

In my use case, it generates a PDF file from an ASP page, which then gets attached to an email instead of sending the HTML to the printer and the letter-stuffing machine; it's a kind of print spooler.

Solution 5

There is also an html2ps program, and you could then easily convert the PostScript file to PDF. I used this several years ago, and IIRC it did a pretty good job on a large manual.
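
The two-step conversion could look roughly like the sketch below. It assumes `ps2pdf` from Ghostscript is used for the second step (the answer does not name a tool), and the function name `html_via_ps` is made up for illustration.

```shell
#!/bin/sh
# Convert HTML to PDF via PostScript: html2ps writes PostScript to stdout,
# and ps2pdf reads it from stdin when given "-" as the input name.
# Usage: html_via_ps input.html output.pdf
html_via_ps() {
    for tool in html2ps ps2pdf; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing required tool: $tool" >&2
            return 127
        }
    done
    html2ps "$1" | ps2pdf - "$2"
}
```

If you want to inspect the intermediate file, run the steps separately: `html2ps in.html > in.ps`, then `ps2pdf in.ps out.pdf`.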

Author: EB2127

Updated on September 18, 2022

Comments

  • EB2127
    EB2127 over 1 year

    I would like to install a command-line tool within a Docker image in order to quickly convert *.html files into *.pdf files.

    I am surprised there is not a Unix tool to do something like this.

  • EB2127
    EB2127 almost 5 years
    @cas This is really useful. Could you answer the question with that command? I would like to keep this answer
  • Paradox
    Paradox almost 5 years
    All distributions are shipped with Python.
  • Jeff Schaller
    Jeff Schaller almost 5 years
    @EB2127 Stack Exchange answers can easily contain more than one solution to a problem; collaborative editing can/should make any answer better.
  • shiftas
    shiftas almost 5 years
    Sure, but there are custom Linux systems, on embedded devices for example, that might not have Python.
  • steveb
    steveb about 4 years
    @cas Unfortunately, wkhtmltopdf complains "QXcbConnection: Could not connect to display localhost:12.0" and dumps core. I suspect it will work once I figure out the display issue, but I'm not sure why it cares about the display.
  • Andrei B
    Andrei B about 4 years
    Indeed, a small and useful tool, with lots of features. Thank you for sharing!
  • Hashim Aziz
    Hashim Aziz almost 4 years
    What advantage is there to using pandoc with the WeasyPrint engine vs just using WeasyPrint without the dependency on pandoc?
  • Pieter
    Pieter over 2 years
    Tried it, but it ignores the # fragment in the URL, e.g. "status.aws.amazon.com/#AP_Block" converts the wrong tab to PDF.
  • Déjà vu
    Déjà vu about 2 years
    It doesn't support CSS3 :(
  • Admin
    Admin about 2 years
    I get the error "Try running pandoc with --latex-engine=xelatex. pandoc: Error producing PDF". The document also contains Bangla text.