Convert HTML to image
Solution 1
Software Requirements
The following software packages are available for both Windows and Linux systems, and are required for a complete, working solution:
- gvim - Used to export syntax highlighted source code to HTML.
- moria - Colour scheme for syntax highlighting.
- wkhtmltoimage - Used to convert HTML documents to PNG files.
- gawk and sed - Text processing tools.
- ImageMagick - Used to trim the PNG and add a border.
General Steps
Here is how the solution works:
- Load the source code into an editor that can add splashes of colour.
- Export the source code as an HTML document (with embedded FONT tags).
- Strip the background attribute from the HTML document (to allow transparency).
- Convert the HTML document to a PNG file.
- Trim the PNG border.
- Add a small, 25 pixel border around the image.
- Delete temporary files.
The script generates images that are all the same width for source files whose lines are all 80 characters or fewer. Source files with lines over 80 characters result in images as wide as necessary to retain the entire line.
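The width rule can be sketched for the Linux side roughly as follows. This is a minimal sketch rather than the script itself: the function names `max_line_length` and `extent_option` are hypothetical, while the 80-character threshold and the `-extent 3950x` value are taken from the batch file.

```shell
#!/bin/sh
# Hypothetical helper: print the length of the longest line in a file.
max_line_length() {
    awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }' "$1"
}

# Hypothetical helper: print the ImageMagick option that pads the image to
# the common 80-column width, or print nothing when any line is longer than
# 80 characters (so the image stays as wide as its content).
extent_option() {
    if [ "$(max_line_length "$1")" -gt 80 ]; then
        :   # no output: leave the trimmed width untouched
    else
        printf '%s\n' "-extent 3950x"
    fi
}
```

The output of `extent_option` would be spliced unquoted into the ImageMagick command line, in the same way that the batch file uses `%EXTENT%`.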
Installation
Install the components into the following locations:
- gvim - C:\Program Files\Vim
- moria - C:\Program Files\Vim\vim73\colors
- wkhtmltoimage - C:\Program Files\wkhtml
- ImageMagick - C:\Program Files\ImageMagick
- Gawk and Sed - C:\Program Files\GnuWin32
Note: ImageMagick has a program called convert.exe, which cannot supersede the Windows convert command. Because of this, the full path to convert.exe must be hard-coded in the batch file (as opposed to adding ImageMagick to the PATH).
Environment Variables
Add the following directories to the PATH environment variable:
"C:\Program Files\Vim\vim73";"C:\Program Files\wkhtml";"C:\Program Files\GnuWin32\bin"
Batch File
Run it using (this example converts the script's own source; pass any second argument to turn on line numbers):
src2png.bat src2png.bat
Create a batch file called src2png.bat by copying the following contents:
@ECHO OFF
SET NUMBERS=-c "set number"
IF "%2" == "" SET NUMBERS=
ECHO Converting %1 to %1.html...
gvim -e %1 -c "set nobackup" %NUMBERS% -c ":colorscheme moria" ^
-c :TOhtml -c wq -c :q
REM Remove all background-color occurrences (without being self-referential)
sed -i "s/background-color: #......; \(.*\)}$/\1 }/g" %1.html
ECHO Converting %1.html to %1.png...
wkhtmltoimage --format png --transparent --minimum-font-size 80 ^
--quality 100 --width 3600 ^
%1.html %1.png
move %1.png %1.orig.png
REM If the text file has lines that exceed 80 characters, don't crop the
REM resulting image. (The book automatically shrinks large images to fit.)
REM The 3950 is the 80 point font at 80 characters with padding for line
REM numbers.
SET LENGTH=0
FOR /F %%l IN ('gawk ^
"BEGIN {x=0} {if( length($0)>x ) x=length()} END {print x;}" %1') ^
DO (
SET LENGTH=%%l
)
SET EXTENT=-extent 3950x
IF %LENGTH% GTR 80 SET EXTENT=
REM Trim the image height, then extend the width for 80 columns, if needed.
REM The result is that all images will be resized the same amount, thus
REM making the font size the same maximum for all source listings. Source
REM files beyond the 80 character limit will be scaled as necessary.
ECHO Trimming %1.png...
"C:\Program Files\ImageMagick\convert.exe" -format png %1.orig.png ^
-density 150x150 ^
-background none -antialias -trim +repage ^
%EXTENT% ^
-bordercolor none -border 25 ^
%1.png
ECHO Removing old files...
IF EXIST %1.orig.png DEL /q %1.orig.png
IF EXIST %1.html DEL /q %1.html
IF EXIST sed*. DEL /q sed*.
Improvements and optimizations welcome.
Note: The latest version of wkhtmltoimage properly handles overriding the background colour. Thus the line to remove the CSS for background colours is no longer necessary, in theory.
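For older versions where the sed step is still needed, its effect can be seen in isolation. The following is a minimal sketch: the function name `strip_background` and the sample CSS rule are hypothetical, and only the sed expression comes from the batch file.

```shell
#!/bin/sh
# Hypothetical helper: strip background-color declarations from CSS rules
# read on standard input, using the same expression as the batch file.
strip_background() {
    sed 's/background-color: #......; \(.*\)}$/\1 }/g'
}

# Hypothetical sample rule of the kind the expression expects: a six-digit
# hex colour followed by "; " and a closing brace at the end of the line.
printf '%s\n' 'body { color: #d0d0d0; background-color: #202020; }' | strip_background
```

Lines without a matching `background-color` declaration pass through unchanged, which is why the expression can be run over the whole HTML file.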
Solution 2
Reading the manpage of wkhtmltoimage:
-d, --dpi <dpi> Change the dpi explicitly
If that does not help, hacking together a simple solution with Qt and (the included) WebKit is pretty straightforward.
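Where `--dpi` is unavailable, the page width can instead be derived from the target resolution, as discussed in the comments. A minimal sketch of that arithmetic, assuming the figures mentioned in the question and comments (75 columns spanning 5.5 inches at 600 dpi); the function name `pixel_width` and its argument order are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: pixel width for a listing of COLS columns, given that
# FULL_COLS columns span INCHES inches on the page at DPI dots per inch.
# Usage: pixel_width COLS FULL_COLS INCHES DPI
pixel_width() {
    awk -v cols="$1" -v full="$2" -v inches="$3" -v dpi="$4" \
        'BEGIN { printf "%d\n", cols / full * inches * dpi + 0.5 }'
}

pixel_width 75 75 5.5 600   # full-width listing
pixel_width 40 75 5.5 600   # narrower listing, scaled proportionally
```

The result could then be passed to the `--width` option of wkhtmltoimage. The page width in inches depends on the book's margins, so it is kept as a parameter rather than hard-coded.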
Comments
- Dave Jarvis almost 2 years
Background
Batch convert various syntax-highlighted source files (C, SQL, Java, PHP, batch, bash) into high-resolution images (600dpi), suitable for an eBook and printed book.
Failed Solutions
A number of attempts so far:
- OpenOffice or LibreOffice - Have to re-import source code into the document every time the source file changes. (That is, the solution cannot be easily automated for hundreds or thousands of source files.)
- enscript. Cannot easily change colours, imperfectly renders output, not comprehensive.
- LyX / LaTeX. Imperfectly renders output.
- gvim to HTML, then HTMLDOC to PostScript, then GhostScript to PNG. HTMLDOC ignores font tags.
- gvim to HTML, then html2ps, then GhostScript to PNG. RGB colours are not recognized by html2ps.
- Firefox to PostScript, then GhostScript to PNG. Obnoxiously circuitous.
- gvim to HTML, then OmniFormat to anything. Free version unsuitable for batch processing; lots of advertising pop-ups.
- pygments. Cannot easily change image resolution; does not have gvim's range of colour schemes.
Closest Solution
The solution that almost works is:
- gvim to HTML — wkhtmltopdf to PDF. Will require post-processing with ImageMagick (wkhtmltoimage cannot set image resolution, only page width).
Requirements
- Windows and Linux preferred, but either is acceptable.
- Free or OSS
- Command line only (suitable for batch processing)
- Easily change colour scheme
- Support: PHP, batch, bash, Java, JavaScript, R, C, and SQL
Question
Any other ways to convert syntax-highlighted source code to a high-resolution (600dpi) image?
Thank you!
- akira over 13 years: @Dave Jarvis: why is wkhtmltoimage and setting the width of the page not enough? the height can not be specified since it is determined by the content of the html stuff. imho width is all you actually need, you can calculate the needed width based upon how many pixels per inch you want.
- Dave Jarvis over 13 years: @akira: The width is dependent on the number of columns the source code uses. Sometimes the width will be 75 characters. Sometimes it will be 40 characters. So 75 characters should take up about 5.5 inches and 40 characters should be slightly more than half that. The 5.5 value depends on the margins of the book, which are subject to change (once or twice). This is a calculation that needs to be done automatically, by the way, otherwise the solution cannot be automated, which defeats the entire purpose.
- akira over 13 years: @Dave Jarvis: yep, i understand your problem. you are lucky with convert that the output of webkit in your case is really scalable and thus you could 'resize' the pdf afterwards. for an integrated solution i suspect one would need some kind of zoom-level AND the width of the 'browser'
- akira over 13 years: btw, what is the document format you are using to create the ebook or the printed book (latex, xsl-fo .. etc?)
- Dave Jarvis over 13 years: @akira: OpenOffice, but possibly LyX (LaTeX) or Scribus later.
- akira over 13 years: @Dave Jarvis: well, we both agree on that having vim yielding something close to the end would be better. maybe one should take the ToHtml code as a base, it does not look too complicated imho. btw, wkhtmltoimage yields .svg as well, maybe you can integrate that more easily into openoffice?
- akira over 13 years: agreed, but maybe .svg is worth a try.
- Dave Jarvis over 13 years: That is a documentation error, unfortunately. The dpi option is not available with the Windows version.
- RDX over 13 years: Or you can install Linux as VM (VirtualBox or such) and do the conversion there...