Convert HTML to image


Solution 1

Software Requirements

The following software packages are available for both Windows and Linux systems, and are required for a complete, working solution:

  • gvim - Used to export syntax highlighted source code to HTML.
  • moria - Colour scheme for syntax highlighting.
  • wkhtmltoimage - Used to convert HTML documents to PNG files.
  • gawk and sed - Text processing tools.
  • ImageMagick - Used to trim the PNG and add a border.

General Steps

Here is how the solution works:

  1. Load the source code into an editor that can add splashes of colour.
  2. Export the source code as an HTML document (with embedded FONT tags).
  3. Strip the background attribute from the HTML document (to allow transparency).
  4. Convert the HTML document to a PNG file.
  5. Trim the PNG border.
  6. Add a small, 25 pixel border around the image.
  7. Delete temporary files.
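The steps above can be sketched as a list of external commands. This is a minimal sketch, not the author's script: the function name `build_commands` and its parameters are hypothetical, the flags are copied from the batch file later in this answer, and the ImageMagick path assumes the install location listed under Installation. As a simplification, wkhtmltoimage writes straight to the intermediate `.orig.png` file instead of being renamed afterwards.

```python
# Assumption: ImageMagick's convert.exe is installed where this answer suggests.
CONVERT = r"C:\Program Files\ImageMagick\convert.exe"


def build_commands(source, numbered=False, pad_to_80_columns=True):
    """Return the commands for steps 1-6 as argument lists (nothing is executed)."""
    html = f"{source}.html"
    raw_png = f"{source}.orig.png"
    final_png = f"{source}.png"

    # Steps 1-2: load the source in gvim and export highlighted HTML.
    gvim = ["gvim", "-e", source, "-c", "set nobackup"]
    if numbered:
        gvim += ["-c", "set number"]
    gvim += ["-c", ":colorscheme moria", "-c", ":TOhtml", "-c", "wq", "-c", ":q"]

    # Step 3: strip background colours so the image can be transparent.
    sed = ["sed", "-i", r"s/background-color: #......; \(.*\)}$/\1 }/g", html]

    # Step 4: render the HTML document to a PNG file.
    render = ["wkhtmltoimage", "--format", "png", "--transparent",
              "--minimum-font-size", "80", "--quality", "100",
              "--width", "3600", html, raw_png]

    # Steps 5-6: trim, optionally pad to the common width, add a 25 pixel border.
    convert = [CONVERT, "-format", "png", raw_png, "-density", "150x150",
               "-background", "none", "-antialias", "-trim", "+repage"]
    if pad_to_80_columns:
        convert += ["-extent", "3950x"]
    convert += ["-bordercolor", "none", "-border", "25", final_png]

    # Step 7 (not shown): delete the .html and .orig.png temporary files.
    return [gvim, sed, render, convert]


if __name__ == "__main__":
    for command in build_commands("example.c", numbered=True):
        print(command)
```

Each list could be passed to `subprocess.run` in order; building the lists first makes it easy to inspect the commands before running anything.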

The script pads every image to the same width when no line in the source file exceeds 80 characters. Source files with longer lines produce images as wide as necessary to retain the entire line.
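The width rule can be sketched as follows. The function names are hypothetical; the 80-character threshold and the 3950-pixel width are the values used in the batch file later in this answer, where gawk finds the longest line and ImageMagick receives `-extent 3950x` only for narrow files.

```python
def longest_line(text):
    """Return the length of the longest line, or 0 for empty input."""
    return max((len(line) for line in text.splitlines()), default=0)


def extent_args(text, limit=80, padded_width=3950):
    """Return ImageMagick arguments that pad narrow listings to a fixed width.

    Listings with a line longer than `limit` get no padding, so the image
    stays as wide as its longest line requires.
    """
    if longest_line(text) > limit:
        return []
    return ["-extent", f"{padded_width}x"]


if __name__ == "__main__":
    print(extent_args("int main() { return 0; }"))
    print(extent_args("x" * 120))
```

A file whose longest line is exactly 80 characters is still padded, matching the `GTR 80` comparison in the batch file.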

Installation

Install the components into the following locations:

  • gvim - C:\Program Files\Vim
  • moria - C:\Program Files\Vim\vim73\colors
  • wkhtmltoimage - C:\Program Files\wkhtml
  • ImageMagick - C:\Program Files\ImageMagick
  • Gawk and Sed - C:\Program Files\GnuWin32

Note: ImageMagick includes a program called convert.exe, which shares its name with the Windows convert command and cannot supersede it. Because of this, the full path to ImageMagick's convert.exe must be hard-coded in the batch file (as opposed to adding ImageMagick to the PATH).

Environment Variables

Add the following directories to the PATH environment variable:

"C:\Program Files\Vim\vim73";"C:\Program Files\wkhtml";"C:\Program Files\GnuWin32\bin"

Batch File

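Run it by passing the source file as the first argument. Supplying any second argument turns on line numbers. For example, to convert the batch file itself:

src2png.bat src2png.bat

Create a batch file called src2png.bat by copying the following contents: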

@ECHO OFF

SET NUMBERS=-c "set number"
IF "%2" == "" SET NUMBERS=

ECHO Converting %1 to %1.html...
gvim -e %1 -c "set nobackup" %NUMBERS% -c ":colorscheme moria" ^
  -c :TOhtml -c wq -c :q

REM Remove all background-color occurrences (without being self-referential)
sed -i "s/background-color: #......; \(.*\)}$/\1 }/g" %1.html

ECHO Converting %1.html to %1.png...
wkhtmltoimage --format png --transparent --minimum-font-size 80 ^
  --quality 100 --width 3600 ^
  %1.html %1.png

move %1.png %1.orig.png

REM If the text file has lines that exceed 80 characters, don't crop the
REM resulting image. (The book automatically shrinks large images to fit.)
REM The 3950 is the 80 point font at 80 characters with padding for line
REM numbers.
SET LENGTH=0
FOR /F %%l IN ('gawk ^
  "BEGIN {x=0} {if( length($0)>x ) x=length()} END {print x;}" %1') ^
DO (
  SET LENGTH=%%l
)
SET EXTENT=-extent 3950x
IF %LENGTH% GTR 80 SET EXTENT=

REM Trim the image height, then extend the width for 80 columns, if needed.
REM The result is that all images will be resized the same amount, thus
REM making the font size the same maximum for all source listings. Source
REM files beyond the 80 character limit will be scaled as necessary.
ECHO Trimming %1.png...
"C:\Program Files\ImageMagick\convert.exe" -format png %1.orig.png ^
  -density 150x150 ^
  -background none -antialias -trim +repage ^
  %EXTENT% ^
  -bordercolor none -border 25 ^
  %1.png

ECHO Removing old files...
IF EXIST %1.orig.png DEL /q %1.orig.png
IF EXIST %1.html DEL /q %1.html
IF EXIST sed*. DEL /q sed*.

Improvements and optimizations welcome.

Note: The latest version of wkhtmltoimage properly handles overriding the background colour. Thus the line to remove the CSS for background colours is no longer necessary, in theory.
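For readers who still need that step, the sed substitution from the batch file can be sketched in Python. This is an illustrative equivalent, not part of the original script; the function name and the sample CSS lines are hypothetical, and the exact CSS that gvim emits may differ between versions.

```python
import re

# Same pattern as the sed expression: a background-color declaration with a
# six-character colour, followed by the rest of the rule up to the closing brace.
BACKGROUND = re.compile(r"background-color: #......; (.*)\}$")


def strip_background(line):
    """Remove a background-color declaration from a single CSS rule line."""
    return BACKGROUND.sub(r"\1 }", line)


if __name__ == "__main__":
    # Hypothetical sample line resembling exported CSS.
    print(strip_background("body { background-color: #202020; color: #d0d0d0; }"))
```

Lines without a background-color declaration pass through unchanged, so the function can be applied to every line of the exported HTML document.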

Solution 2

From the wkhtmltoimage man page:

 -d,    --dpi   <dpi>   Change the dpi explicitly

If that does not help, hacking together a simple solution with Qt and its included WebKit is pretty straightforward.
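Whichever renderer is used, the target pixel width still has to be derived from the print size. The arithmetic can be sketched as below, using the figures from the question and its comments: 600 dpi output, with 75 characters spanning about 5.5 inches. The function name and the linear scaling by column count are assumptions for illustration.

```python
def target_width_px(columns, dpi=600, full_columns=75, full_width_in=5.5):
    """Return the pixel width for a listing of `columns` characters.

    Assumes the width scales linearly with the column count, so that
    `full_columns` characters occupy `full_width_in` inches on the page.
    """
    inches = full_width_in * columns / full_columns
    return round(inches * dpi)


if __name__ == "__main__":
    for columns in (40, 75):
        print(columns, target_width_px(columns))
```

Because the 5.5 inch value depends on the book's margins, keeping it as a parameter lets the calculation be rerun automatically when the margins change.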

Author: Dave Jarvis

https://dave.autonoma.ca/blog/

Updated on September 17, 2022

Comments

  • Dave Jarvis
    Dave Jarvis almost 2 years

    Background

    Batch convert various syntax-highlighted source files (C, SQL, Java, PHP, batch, bash) into high-resolution images (600dpi), suitable for an eBook and printed book.

    Failed Solutions

    A number of attempts so far:

    • OpenOffice or LibreOffice - Have to re-import source code into the document every time the source file changes. (That is, the solution cannot be easily automated for hundreds or thousands of source files.)
    • enscript. Cannot easily change colours, imperfectly renders output, not comprehensive.
    • LyX / LaTeX. Imperfectly renders output.
    • gvim to HTML — HTMLDOC to PostScript — GhostScript to PNG. HTMLDOC ignores font tags.
    • gvim to HTML — html2ps — GhostScript to PNG. RGB colours are not recognized by html2ps.
    • Firefox to PostScript — GhostScript to PNG. Obnoxiously circuitous.
    • gvim to HTML — OmniFormat to anything. Free version unsuitable for batch processing; lots of advertising pop-ups.
    • pygments. Cannot easily change image resolution; does not have gvim's range of colour schemes.

    Closest Solution

    The solution that almost works is:

    • gvim to HTML — wkhtmltopdf to PDF. Will require post-processing with ImageMagick (wkhtmltoimage cannot set image resolution, only page width).

    Requirements

    • Windows and Linux, but either is acceptable.
    • Free or OSS
    • Command line only (suitable for batch processing)
    • Easily change colour scheme
    • Support: PHP, batch, bash, Java, JavaScript, R, C, and SQL

    Question

    Any other ways to convert syntax-highlighted source code to a high-resolution (600dpi) image?

    Thank you!

    • akira
      akira over 13 years
      @Dave Jarvis: why is wkhtmltoimage and setting the width of the page not enough? the height can not be specified since it is determined by the content of the html stuff. imho width is all you actually need, you can calculate the needed width based upon how many pixels per inch you want.
    • Dave Jarvis
      Dave Jarvis over 13 years
      @akira: The width is dependent on the number of columns the source code uses. Sometimes the width will be 75 characters. Sometimes it will be 40 characters. So 75 characters should take up about 5.5 inches and 40 characters should be slightly more than half that. The 5.5 value depends on the margins of the book, which are subject to change (once or twice). This is a calculation that needs to be done automatically, by the way, otherwise the solution cannot be automated, which defeats the entire purpose.
    • akira
      akira over 13 years
      @Dave Jarvis: yep, i understand your problem. you are lucky with convert that the output of webkit in your case is really scalable and thus you could 'resize' the pdf afterwards. for an integrated solution i suspect one would need some kind of zoom-level AND the width of the 'browser'
    • akira
      akira over 13 years
      btw, what is the document format you are using to create the ebook or the printed book (latex, xsl-fo .. etc?)
    • Dave Jarvis
      Dave Jarvis over 13 years
      @akira: OpenOffice, but possibly LyX (LaTeX) or Scribus later.
    • akira
      akira over 13 years
      @Dave Jarvis: well, we both agree on that having vim yielding something close to the end would be better. maybe one should take the ToHtml code as a base, it does not look too complicated imho. btw, wkhtmltoimage yields .svg as well, maybe you can integrate that more easily into openoffice?
    • akira
      akira over 13 years
      agreed, but maybe .svg is worth a try.
  • Dave Jarvis
    Dave Jarvis over 13 years
    That is a documentation error, unfortunately. The dpi option is not available with the Windows version.
  • RDX
    RDX over 13 years
    Or you can install Linux as VM (VirtualBox or such) and do the conversion there...