Fast pdf to jpg conversion on Linux wanted

Solution 1

Using Ghostscript directly (instead of ImageMagick's convert command, which calls Ghostscript indirectly) is indeed faster, and it gives you more control over the conversion parameters. Try

gs \
   -sDEVICE=jpeg   \
   -o bar_%03d.jpg \
   -dJPEGQ=95      \
   -r600x600       \
   -g4960x7016     \
   foo.pdf

where

  • -o: determines output path+filename (and saves usage of -dBATCH -dNOPAUSE)
  • -dJPEGQ: sets JPEG quality to 95%
  • -r: sets resolution to 600dpi
  • -g: sets image size to 4960x7016px
  • -sDEVICE: sets output as JPEG
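The -r and -g values are linked: -g is the page size in pixels at the chosen resolution. A minimal sketch of that arithmetic for an A4 page (210 x 297 mm) follows; the helper names px_for_mm and a4_geometry are made up for illustration and are not part of Ghostscript.

```shell
# px_for_mm MM DPI   (hypothetical helper)
# Print the number of pixels needed to cover MM millimetres at DPI dots
# per inch, rounded to the nearest integer (1 inch = 25.4 mm).
px_for_mm() {
    awk -v mm="$1" -v dpi="$2" 'BEGIN { printf "%d\n", mm / 25.4 * dpi + 0.5 }'
}

# a4_geometry DPI   (hypothetical helper)
# Print a WIDTHxHEIGHT string for an A4 page, in the form -g expects.
a4_geometry() {
    printf '%sx%s\n' "$(px_for_mm 210 "$1")" "$(px_for_mm 297 "$1")"
}
```

With this rounding, a4_geometry 600 prints 4961x7016, one pixel wider than the 4960x7016 used in the command above; the difference comes down to how the width is rounded. Something like gs -sDEVICE=jpeg -r600 -g"$(a4_geometry 600)" -o bar_%03d.jpg foo.pdf would then keep the two options in step.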

This command will probably still be too slow for you and will create files bigger than expected. For smaller files and faster execution, try this (which probably comes close to the output quality of your convert command line):

gs \
   -sDEVICE=jpeg   \
   -o bar_%03d_200dpi_q80.jpg \
   -dJPEGQ=80      \
   -r200x200       \
   -g1653x2339     \
   foo.pdf

or even

gs \
   -sDEVICE=jpeg   \
   -o bar_%03d_default_a4.jpg \
   -sPAPERSIZE=a4 \
   foo.pdf

(which gives 72dpi resolution, often good enough for most screens and for most web applications).

Solution 2

BTW, one of the reasons ImageMagick is so much slower is that it calls Ghostscript twice. It does not convert PDF => PNG in one go, but uses 2 different steps:

  • it first uses Ghostscript for PDF => PostScript conversion;
  • it then uses Ghostscript for PostScript => PNG conversion.

You can learn about the detailed settings of ImageMagick's "delegates" (the external programs ImageMagick uses, such as Ghostscript) by typing

convert -list delegate

(On my system that's a list of 32 different commands.) Now to see which commands are used to convert to PNG, use this:

convert -list delegate | grep -i png

Ok, this was for Linux. If you are on Windows, try this:

convert -list delegate | findstr /i png

You'll discover that IM produces PNG only from PS or EPS input. So how does IM get (E)PS from your PDF? Easy:

convert -list delegate | findstr /i PDF
convert -list delegate | grep -i PDF

Ah! It uses Ghostscript to make a PDF => PS conversion, then uses Ghostscript again to make a PS => PNG conversion. Works, but isn't the most efficient way if you know that Ghostscript can do PDF => PNG in one go. And faster. And in much better quality.

About IM's handling of PDF conversion to images via the Ghostscript delegate you should know two things first and foremost:

  1. By default, if you don't give an extra parameter, Ghostscript will output images with a 72dpi resolution. That's why people here sometimes suggest adding -density 600 as a convert parameter, which tells Ghostscript to use a 600dpi resolution for its image output.
  2. The detour of IM to call Ghostscript twice to convert first PDF => PS and then PS => PNG is a real blunder, because you never gain and hardly ever keep quality in the first step, but very often lose some. Reasons:
    • PDF can handle transparencies, which PostScript cannot.
    • PDF can embed TrueType fonts, which PostScript cannot, and so on.
      (Conversion in the opposite direction, PS => PDF, is therefore not that critical.)

That's why I suggest you convert your PDFs to PNG (or JPEG) in one go, using Ghostscript directly. And use the most recent version of Ghostscript (8.71 at the time of writing; 9.00 is soon to be released)...
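A minimal sketch of that one-pass route for PNG, wrapped in a shell function. The function name pdf_to_png and its argument order are made up for illustration; png16m is Ghostscript's 24-bit RGB PNG device, and -r and -o are used the same way as in Solution 1.

```shell
# pdf_to_png PDF DPI PREFIX   (hypothetical helper, not part of Ghostscript)
# Render every page of PDF straight to PNG in a single Ghostscript run,
# writing PREFIX_001.png, PREFIX_002.png, and so on.
pdf_to_png() {
    gs -sDEVICE=png16m -r"$2" -o "${3}_%03d.png" "$1"
}
```

For example, pdf_to_png foo.pdf 200 bar would render foo.pdf at 200dpi without any intermediate PostScript file.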

Solution 3

In my experience, MuPDF is a lot faster than Ghostscript. It is a much newer project without much of the cruft in gs. See whether it fits your use case!

mudraw -w 1024 -h 768 -r 200 -c rgb -o bar%d.png foo.pdf

If you have an older Linux distribution and installed mupdf-tools from the repository, mudraw might still be called pdfdraw.

You then have to convert the PNG files to JPEG, for example with ImageMagick. But it will still be faster than Ghostscript.
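Because the binary name differs between MuPDF packages, a script can probe for whichever one is installed. The sketch below is one way to do that; first_available is a hypothetical helper, and it assumes that pdfdraw accepts the same -r and -o options as mudraw. As far as I know, newer MuPDF releases ship the renderer as the draw subcommand of mutool instead, which this sketch does not cover.

```shell
# first_available NAME...   (hypothetical helper)
# Print the first NAME that resolves to a command on this system and
# return 0; print nothing and return 1 if none of them exists.
first_available() {
    for name in "$@"; do
        if command -v "$name" >/dev/null 2>&1; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}

# Usage (assumes foo.pdf exists in the current directory):
#   tool=$(first_available mudraw pdfdraw) && "$tool" -r 200 -o bar%d.png foo.pdf
```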

Solution 4

The program pdftoppm from the poppler package is also able to create JPEGs, and for me it is about twice as fast as using gs as described above:

pdftoppm -jpeg -r 300 foo.pdf foo
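pdftoppm treats its last argument as an output prefix and appends the page number and the .jpg extension itself, so the prefix is best given without an extension. A small sketch for converting several PDFs in one call follows; pdfs_to_jpeg is a hypothetical helper name, and the flags are the ones from the command above.

```shell
# pdfs_to_jpeg DPI FILE.pdf...   (hypothetical helper)
# Run pdftoppm once per PDF, using the file name without its .pdf suffix
# as the output prefix.
pdfs_to_jpeg() {
    dpi=$1
    shift
    for pdf in "$@"; do
        pdftoppm -jpeg -r "$dpi" "$pdf" "${pdf%.pdf}"
    done
}
```

Calling pdfs_to_jpeg 300 *.pdf would then write the page images next to each source file.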

Author by

mat3001

Updated on September 17, 2022

Comments

  • mat3001
    mat3001 over 1 year

    I am currently using ImageMagick to convert PDFs to JPEG raster images. It is painfully slow and uses up a lot of memory.

    The command I used was:

    convert -geometry 1024x768 -density 200 -colorspace RGB foo.pdf bar%02d.jpg
    

    I guess that it's slow because it uses Ghostscript. But there must be a faster way to do that on a Linux box.

    Has anybody found a better solution?

    • Zoredache
      Zoredache almost 14 years
      How much time, how much memory?
  • mat3001
    mat3001 almost 14 years
    You're right. I really didn't think that ImageMagick would be the bottleneck. But I probably should have tried. Thanks also for the great examples!
  • danmactough
    danmactough over 8 years
    What a great suggestion. Just fixed a major, app-crashing bug by switching to pdftoppm thanks to this answer -- never knew about it before!
  • Milan Todorovic
    Milan Todorovic almost 8 years
    You, sir, deserve a medal for this :)
  • Ghilas BELHADJ
    Ghilas BELHADJ almost 8 years
    it is not faster than gs
  • Dmitry Akinin
    Dmitry Akinin about 7 years
    In my test MuPDF's PDF to PNG conversion is about 5-6 times faster than Ghostscript. Thank you for the solution!
  • likeitlikeit
    likeitlikeit over 6 years
    This is incredibly useful. It takes seconds where Ghostscript would take minutes, plus the command line is a breeze! Thank you very much for bringing this to my attention!
  • Igor Voltaic
    Igor Voltaic over 3 years
    Thanks! Saved my day!