How to convert PDF to image?

Solution 1

You can use pdftoppm from the poppler-utils package to convert a PDF to a PNG:

pdftoppm input.pdf outputname -png

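This will write each page of the PDF to its own file, named outputname-01.png, outputname-02.png and so on, where the number is the page index. pdftoppm zero-pads the index to the number of digits in the page count, so a 9-page PDF gives outputname-1.png to outputname-9.png.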
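If a script needs to find the files pdftoppm wrote, it can predict their names. This is a minimal sketch, assuming the zero-padding rule described above (padding width = number of digits in the page count); `pdftoppm_name` is a hypothetical helper, not part of poppler-utils.

```shell
# Hypothetical helper: predict the file name pdftoppm writes for one page.
# Assumption: the page number is zero-padded to the number of digits in
# the document's page count (12 pages -> 01..12, 5 pages -> 1..5).
pdftoppm_name() {
    root=$1 page=$2 pages=$3 ext=$4
    width=${#pages}    # digits in the page count
    printf "%s-%0${width}d.%s\n" "$root" "$page" "$ext"
}

pdftoppm_name outputname 1 12 png    # prints: outputname-01.png
pdftoppm_name outputname 3 5 png     # prints: outputname-3.png
pdftoppm_name outputname 7 120 png   # prints: outputname-007.png
```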

Converting a single page or a range of pages of the PDF

pdftoppm input.pdf outputname -png -f {page} -singlefile

Change {page} to the page number. Pages are numbered from 1, so -f 1 is the first page.

To convert a range of pages, also give the last page with the -l flag, so -f 1 -l 30 selects pages 1 to 30. Leave out -singlefile in that case, because it writes only the first page of the range.

Note that .png is appended to outputname automatically, so there's no need to include the extension. Also, -singlefile drops the -01 suffix mentioned above, since only one file is written.
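The choice between a single page and a range can be wrapped in a small helper. A minimal sketch; `page_opts` is a hypothetical name, and it only prints the -f, -l and -singlefile options described above for you to pass to pdftoppm.

```shell
# Hypothetical helper: print pdftoppm page-selection options.
#   one page -> -f N -singlefile   (writes outputname.png, no suffix)
#   a range  -> -f FIRST -l LAST   (numbered files; -singlefile would
#                                   keep only the first page)
page_opts() {
    first=$1
    last=${2:-$1}
    if [ "$first" -eq "$last" ]; then
        printf '%s\n' "-f $first -singlefile"
    else
        printf '%s\n' "-f $first -l $last"
    fi
}

page_opts 5      # prints: -f 5 -singlefile
page_opts 1 30   # prints: -f 1 -l 30

# Usage (the unquoted $(...) is deliberate so the options split into words):
# pdftoppm input.pdf outputname -png $(page_opts 1 30)
```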

Specifying the converted image's resolution

The default resolution for this command is 150 DPI. Increasing it will result in both a larger file size and more detail.

To increase the resolution of the converted PDF, add the options -rx {resolution} and -ry {resolution}, or use -r {resolution} to set both at once. For example:

pdftoppm input.pdf outputname -png -rx 300 -ry 300
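To see why a higher DPI means larger files, it helps to estimate the output size. PDF page sizes are measured in points (1/72 inch), so a rendered page is roughly points × DPI / 72 pixels per side. This is a sketch assuming round-to-nearest; the renderer's exact rounding may differ by a pixel, and `page_pixels` is a hypothetical helper.

```shell
# Estimate the pixel size of a rendered page from its size in points
# (1 pt = 1/72 inch): pixels ~= points * DPI / 72, rounded to nearest.
# Assumption: the renderer may round differently by up to 1px.
page_pixels() {
    w_pt=$1 h_pt=$2 dpi=$3
    w_px=$(( (w_pt * dpi + 36) / 72 ))
    h_px=$(( (h_pt * dpi + 36) / 72 ))
    printf '%sx%s\n' "$w_px" "$h_px"
}

page_pixels 612 792 150   # US Letter at the default 150 DPI: 1275x1650
page_pixels 612 792 300   # same page at 300 DPI: 2550x3300
```

Doubling the DPI doubles each side, so the pixel count (and roughly the file size) grows about fourfold.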

Solution 2

  1. Install ImageMagick.

  2. In a terminal, in the directory where the PDF is located, run:

    • For the full document:

      convert -density 150 input.pdf -quality 90 output.png
      
    • For a single page:

      convert -density 150 input.pdf[666] -quality 90 output.png
      

Where:

  • PNG, JPG or (virtually) any other image format can be chosen.

  • -density xxx sets the rendering resolution to xxx DPI (150 and 300 are common values). It must come before the input file to affect how the PDF is rendered.

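  • -quality xxx sets the compression quality for formats such as JPG, PNG and MIFF. For JPG, higher values give better quality and larger files (100 is the best quality, although JPG is still lossy). For PNG, which is always lossless, the tens digit sets the zlib compression level and the ones digit the filter type.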

  • [666] will convert only the 667th page to PNG (zero-based numbering so [0] is the 1st page).

  • All other options (such as trimming, grayscale, etc.) are documented on the ImageMagick website.
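Two details in the list above trip people up: ImageMagick's [N] page selector is zero-based while pdftoppm's -f/-l are one-based, and PNG's -quality value is read digit by digit. A minimal sketch of both; `im_index_to_pdftoppm` and `png_quality_parts` are hypothetical helpers, and the digit split follows ImageMagick's documented PNG behaviour.

```shell
# ImageMagick's input.pdf[N] is zero-based; pdftoppm's -f/-l are one-based.
# Print the pdftoppm options that select the same page as index N.
im_index_to_pdftoppm() {
    page=$(( $1 + 1 ))
    printf '%s\n' "-f $page -l $page"
}

# For PNG, ImageMagick reads -quality as two digits: tens = zlib
# compression level, ones = PNG filter type. PNG output stays lossless.
png_quality_parts() {
    printf 'zlib=%s filter=%s\n' "$(( $1 / 10 ))" "$(( $1 % 10 ))"
}

im_index_to_pdftoppm 666   # prints: -f 667 -l 667
png_quality_parts 90       # prints: zlib=9 filter=0
```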

Solution 3

IIRC, GIMP can open PDFs, i.e. convert them into images. So if you want to edit the images right away, GIMP is your friend.

Solution 4

The currently accepted answer does the job but results in an output which is larger in size and suffers from quality loss.

The method in the answer given here results in an output which is comparable in size to the input and doesn't suffer from quality loss.

TL;DR: use pdfimages -j input.pdf output

Quoting the linked answer:

It's not clear what you mean by "quality loss". That could mean a lot of different things. Could you post some samples to illustrate? Perhaps cut the same section out of the poor quality and good quality versions (as a PNG to avoid further quality loss).

Perhaps you need to use -density to do the conversion at a higher dpi:

convert -density 300 file.pdf page_%04d.jpg

(You can prepend -units PixelsPerInch or -units PixelsPerCentimeter if necessary. My copy defaults to ppi.)
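The page_%04d.jpg pattern in the command above is printf-style: ImageMagick replaces %04d with the page (scene) number, counting from 0 and padded to four digits. A small sketch that previews the resulting names; `im_output_names` is a hypothetical helper that only mimics the numbering with printf.

```shell
# Print the first COUNT names a printf-style pattern such as page_%04d.jpg
# expands to, starting from 0 as ImageMagick's scene numbers do.
im_output_names() {
    pattern=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        printf "$pattern\n" "$i"
        i=$(( i + 1 ))
    done
}

im_output_names 'page_%04d.jpg' 3
# page_0000.jpg
# page_0001.jpg
# page_0002.jpg
```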

Update: As you pointed out, gscan2pdf (the way you're using it) is just a wrapper for pdfimages (from poppler). pdfimages does not do the same thing that convert does when given a PDF as input.

convert takes the PDF, renders it at some resolution, and uses the resulting bitmap as the source image.

pdfimages looks through the PDF for embedded bitmap images and exports each one to a file. It simply ignores any text or vector drawing commands in the PDF.

As a result, if what you have is a PDF that's just a wrapper around a series of bitmaps, pdfimages will do a much better job of extracting them, because it gets you the raw data at its original size. You probably also want to use the -j option to pdfimages, because a PDF can contain raw JPEG data. By default, pdfimages converts everything to PNM format, and converting JPEG > PPM > JPEG is a lossy process.

So, try

pdfimages -j file.pdf page

You may or may not need to follow that with a convert to .jpg step (depending on what bitmap format the PDF was using).

I tried this command on a PDF that I had made myself from a sequence of JPEG images. The extracted JPEGs were byte-for-byte identical to the source images. You can't get higher quality than that.
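The choice between the two approaches depends on what the PDF contains: pdfimages suits scanned PDFs that wrap bitmaps, while a renderer such as pdftoppm is needed when pages contain text or vector drawings. A minimal sketch, assuming you already know which kind of PDF you have; `pdf_to_images` and the `DRY_RUN` variable are hypothetical, and the 300 DPI default is an arbitrary choice.

```shell
# Hypothetical helper: extract embedded bitmaps from scanned PDFs,
# otherwise render full pages (text and vector drawings included).
# Set DRY_RUN=1 to print the command instead of running it.
pdf_to_images() {
    kind=$1 pdf=$2 root=$3
    case $kind in
        scanned) set -- pdfimages -j "$pdf" "$root" ;;
        *)       set -- pdftoppm -png -r 300 "$pdf" "$root" ;;
    esac
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
        return 0
    fi
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1 not found; it ships with poppler-utils" >&2
        return 127
    fi
    "$@"
}

DRY_RUN=1 pdf_to_images scanned file.pdf page   # prints: pdfimages -j file.pdf page
DRY_RUN=1 pdf_to_images vector file.pdf page    # prints: pdftoppm -png -r 300 file.pdf page
```

Building the command with `set --` keeps file names containing spaces intact when it is finally run with "$@".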

Solution 5

If your PDFs are scanned, the images are already stored in the PDF, so you simply need to extract them with pdfimages:

pdfimages my-file.pdf prefix 

Author: Deependra Solanky

I am from India, working on Microsoft technologies like ASP.NET and SQL Server. I like to read about new technology on the internet on a daily basis via Google Reader. I like open source and, as a result, have some knowledge of PHP, Ruby on Rails and Python. If Microsoft had not introduced ASP.NET MVC a year ago, I might well have jumped to the non-Microsoft camp.

Updated on September 18, 2022

Comments

  • Deependra Solanky
    Deependra Solanky over 1 year

    I need to convert PDF pages to images. My file has a background image with some text on it, and when I save a page as an image, only the background image gets saved.

    Is there any software that can convert the complete page to an image?

    • Philippe Paré
      Philippe Paré about 7 years
      Apparently it's also possible with inkscape: stackoverflow.com/a/15484727/32453
    • user3413723
      user3413723 over 4 years
      I don't have the 10 reputation needed to post an answer, so here is another way: use MuPDF. mutool convert -o file.png file.pdf
    • Anthony Ebert
      Anthony Ebert over 4 years
      On bash: pdftocairo file.pdf -png
    • Barna Kovacs
      Barna Kovacs almost 4 years
      PDFBox also does it nicely. pdfbox.apache.org
    • Eslam Sameh Ahmed
      Eslam Sameh Ahmed about 3 years
      You can use convertpdftojpg.net, which is a secure and fast PDF to JPG converter.
    • Admin
      Admin almost 2 years
      Using GIMP is a great way to do this without using the command line.
  • mweber
    mweber over 11 years
    Thank you so much. Much better quality than with imagemagick or graphicsmagick!
  • zuo
    zuo over 10 years
    pdftoppm is much faster than convert
  • mx7
    mx7 almost 10 years
    Can you explain what density is and what it does?
  • Arjun
    Arjun almost 10 years
    @AgentCool It specifies the horizontal and vertical image density (in ppi).
  • aroque
    aroque over 9 years
    with only one pdf in a folder the specific name of the pdf file is not needed: pdftoppm -png *.pdf prefix
  • Elijah Lynn
    Elijah Lynn over 9 years
    The answer as is does work but the resolution is very poor. Therefore not currently an answer that is useful. Maybe if convert has some parameters that can be specified this could change.
  • OHLÁLÁ
    OHLÁLÁ about 9 years
    You can change the density by adding the -density 300 parameter
  • NoBackingDown
    NoBackingDown over 8 years
    This is really much better than imagemagick. Imagemagick actually changed the colors in an unexpected way in my case!
  • Petr R.
    Petr R. over 8 years
    The image in your answer is broken. Perhaps you should update it.
  • mlc
    mlc over 8 years
    This is good! But it's a bit easier to write -r 300 instead of specifying the x and y resolutions separately when you want them to be the same.
  • Jose Gómez
    Jose Gómez about 8 years
    This is the perfect solution for scanned pdfs, as with this you can, with one command, extract the original jpgs, and without further recompressions.
  • user2364305
    user2364305 almost 8 years
    How would we put them back into a PDF with this tool, to complete the circle?
  • Abbafei
    Abbafei over 7 years
    pdftohtml (listed at end of pdftoppm manpage) worked better for my use-case; thanks for the hint :-)
  • Philippe Paré
    Philippe Paré about 7 years
    So can anybody confirm that specifying density makes it "as good" as the other answers here, or not? Also as a note to followers, ImageMagick calls out to "ghostscript" to actually convert from pdf to png ex: gs -q NOPROMPT ...-sDEVICE=pngalpha -r150x150 -sOutputFile=/var/tmp/Yf%d -f/var/tmp/L -f/var/tmp/Fic1 and if you get convert: no images defined output.png it means you don't have ghostscript installed...
  • Forty-Two
    Forty-Two over 6 years
    Is that in the free or paid version? In my version, the option is greyed out? Does that mean I need to pay? Is there a paid version?
  • turdus-merula
    turdus-merula over 6 years
    Also: pdftocairo -png page.pdf page.png
  • Pavel Vlasov
    Pavel Vlasov over 6 years
    Works fine. To get this software on macOS, you can use brew install poppler.
  • Michael Hays
    Michael Hays about 6 years
    I had much more success with pdftoppm than with imagemagick.
  • William
    William about 6 years
    Is there a way to force max settings aka no compression?
  • mghaoui
    mghaoui almost 6 years
    This worked fine for me with the -density 300 parameter.
  • frozen-flame
    frozen-flame over 5 years
    Using -density 500 -quality 100 I still get much poorer image quality compared to pdftoppm.
  • Gabriel Staples
    Gabriel Staples over 5 years
    And to convert back from images to pdf: convert output-0.png output-1.png output-2.png output.pdf. See: itsfoss.com/convert-multiple-images-pdf-ubuntu-1304
  • Joschua
    Joschua over 5 years
    I'm getting this error convert-im6.q16: not authorized 'test.pdf' @ error/constitute.c/ReadImage/412.
  • hsandt
    hsandt over 5 years
    I get convert-im6.q16: no images defined `output.png' @ error/convert.c/ConvertImageCommand/3258. I know @rogerdpack mentioned it already, but I have Ghostscript installed and I can use gs.
  • HD189733b
    HD189733b over 5 years
    I made the PDF plot with Python matplotlib or ROOT. When I used pdftoppm or convert to turn the plot into a PNG, the result was placed in the top-right corner with a wide white margin. I solved the problem by adding the -cropbox option.
  • Jezor
    Jezor over 5 years
    Parsing PDF in imagemagick has been disabled - bugs.archlinux.org/task/59778 - it can be enabled manually by editing /etc/ImageMagick-7/policy.xml file and removing PDF from <policy domain="coder" rights="none" pattern="{PS,PS2,PS3,EPS,PDF,XPS}" />
  • Martin Thoma
    Martin Thoma over 5 years
    You might want to add -background white -alpha off to remove transparency.
  • typeduke
    typeduke almost 5 years
    To make it a CBZ (e.g. for reading in an ebook reader like Gnome Books) you can chain commands and use pdftoppm myfile.pdf myfile -png && zip myfile.cbz myfile-*.png; rm myfile-*.png. This will give a "myfile.cbz" in the same directory as "myfile.pdf" 🙂
  • typeduke
    typeduke almost 5 years
    Or, to make it easier to do multiple PDFs, use FILE=filename-without-extension; pdftoppm $FILE.pdf $FILE -png && zip $FILE.cbz $FILE-*.png; rm $FILE-*.png. This will give a "filename-without-extension.cbz" in the same directory as "filename-without-extension.pdf".
  • Dan Dascalescu
    Dan Dascalescu over 4 years
    GIMP can indeed open PDFs, each page as one layer. Choosing "Export As" seems to save only the current layer, but you can easily delete the layer after exporting and run "Export As" again.
  • Gabriel Staples
    Gabriel Staples over 4 years
    pdftoppm works extremely well and supports a bunch of output image formats, including PPM, PNG, JPEG, TIFF. You can also specify the resolution with -r 300 for example, as well as the JPEG compression (quality) level. See my full answer with examples here: askubuntu.com/questions/150100/…
  • durette
    durette over 4 years
    I found GIMP produces a much higher quality conversion than imagemagick (as of the current respective versions packaged in Ubuntu 19.04)
  • Deependra Solanky
    Deependra Solanky about 4 years
    @ElijahLynn I have changed the accepted answer.
  • Zoltán
    Zoltán about 4 years
    I first skipped this answer, because I didn't want to install extra software - only to find out I already had pdftoppm installed on Ubuntu 18.04
  • GuyPaddock
    GuyPaddock almost 4 years
    This is the incorrect solution for the OP's question if the PDF is a print-ready PDF created by something like Illustrator or Acrobat, since pdfimages extracts only the images from the PDF and does not flatten each entire page and export it as an image.
  • Anmol Singh Jaggi
    Anmol Singh Jaggi almost 4 years
    @GuyPaddock Thanks for pointing it out.
  • Roah
    Roah over 3 years
    Is there any way to set transparent background in png? The background is white with pdftoppm and transparent with convert, but convert has problems with big pdfs even if I increase memory limit in policy.xml.
  • Manohar
    Manohar over 3 years
    Is there any way to add a password?
  • somethis
    somethis about 3 years
    Unfortunately, I couldn't make out a pragmatic, easy to follow routine with my favorite tool "convert". I'll have to agree with @ElijahLynn and point to solution askubuntu.com/a/50180/11929
  • justanoob
    justanoob about 3 years
    @turdus-merula Seemingly cairo is buggier than ppm.
  • Huseyin
    Huseyin almost 3 years
    Easy and effective method.
  • cipricus
    cipricus over 2 years
    crashes with relatively large documents
  • cipricus
    cipricus over 2 years
    (In case it crashes at some point with pdf with many pages: print part of the original to pdf before extracting from the output with this tool)
  • Denilson Sá Maia
    Denilson Sá Maia over 2 years
    -cropbox exported the pages as I expected, so try using this option if you don't like your initial results.
  • Avatar
    Avatar about 2 years
    Provided by: poppler-utils_0.24.5-2ubuntu4_amd64. Docs: manpages.ubuntu.com/manpages/trusty/man1/pdftocairo.1.html
  • Avatar
    Avatar about 2 years
    Sidenote: To install the software on Ubuntu: sudo apt update then sudo apt install poppler-utils
  • Avatar
    Avatar about 2 years
    If you want to resize the resulting PNG use e.g. -scale-to 300. This will give a PNG with max height of 300px. Parameter -r is "kind of how blocky it will look, and -scale-to is how big the overall image will be (on one side)." askubuntu.com/a/1179820/238253
  • Avatar
    Avatar about 2 years
    See pdftoppm docs/manual: systutorials.com/docs/linux/man/1-pdftoppm
  • SurpriseDog
    SurpriseDog about 2 years
    Thanks! That preserved the fonts, unlike inkscape. Afterwards I used convert -trim to get rid of whitespace because -cropbox didn't work for me.
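typeduke's CBZ one-liners above leave $FILE unquoted, which breaks on file names with spaces. A minimal quoted sketch of the same pdftoppm + zip + rm sequence; `pdf_to_cbz` and `cbz_name` are hypothetical helpers, and it assumes pdftoppm and zip are installed.

```shell
# Hypothetical helpers: render every page of one PDF to PNG, pack the pages
# into a CBZ (a ZIP of images) next to it, then delete the PNGs.
# Quoting keeps file names with spaces intact.
cbz_name() {
    printf '%s\n' "${1%.pdf}.cbz"
}

pdf_to_cbz() {
    pdf=$1
    base=${pdf%.pdf}
    pdftoppm -png "$pdf" "$base" || return
    zip -q "$(cbz_name "$pdf")" "$base"-*.png && rm -f "$base"-*.png
}

# Usage, for every PDF in the current directory:
# for f in *.pdf; do pdf_to_cbz "$f"; done
```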