How can I extract images from a PDF file?

php perl pdf

19,281

Solution 1

pdfimages does just that. It is part of the poppler-utils and xpdf-utils packages.

From the manpage:

Pdfimages saves images from a Portable Document Format (PDF) file as Portable Pixmap (PPM), Portable Bitmap (PBM), or JPEG files.

Pdfimages reads the PDF file PDF-file, scans one or more pages, and writes one PPM, PBM, or JPEG file for each image, image-root-nnn.xxx, where nnn is the image number and xxx is the image type (.ppm, .pbm, .jpg).

NB: pdfimages extracts the raw image data from the PDF file, without performing any additional transforms. Any rotation, clipping, color inversion, etc. done by the PDF content stream is ignored.
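As a rough sketch of how the tool described above might be wrapped for use on a server, the following shell functions call pdfimages with its documented `PDF-file image-root` arguments. The function names `image_root` and `extract_pdf_images` and the output directory layout are assumptions made for illustration; they are not part of pdfimages itself.

```shell
#!/bin/sh
# Sketch only: extract every embedded image from a PDF using pdfimages.
# image_root and extract_pdf_images are hypothetical helper names.

# Build the image-root prefix from the PDF name,
# e.g. "out" + "/tmp/docs/report.pdf" gives "out/report".
image_root() {
    pdf=$1
    outdir=$2
    printf '%s/%s\n' "$outdir" "$(basename "$pdf" .pdf)"
}

# Usage: extract_pdf_images input.pdf output-dir
extract_pdf_images() {
    pdf=$1
    outdir=$2
    # Fail early if the tool is not installed.
    command -v pdfimages >/dev/null 2>&1 || {
        echo "pdfimages not found" >&2
        return 127
    }
    [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; return 1; }
    mkdir -p "$outdir" || return 1
    # -j saves DCT-encoded images as .jpg files instead of converting them to PPM.
    pdfimages -j "$pdf" "$(image_root "$pdf" "$outdir")"
}
```

Calling `extract_pdf_images report.pdf out` should then leave files named like out/report-nnn.jpg or out/report-nnn.ppm, following the image-root-nnn.xxx pattern from the manpage. The `-f` and `-l` options can be added to limit the range of pages scanned.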

Solution 2

With regard to Perl, have you checked CPAN?

PDF::GetImages - get images from pdf document
PDF::OCR - get ocr and images out of a pdf file
PDF::OCR2 - extract all text and all image ocr from pdf

Author by

Anil

Updated on July 03, 2022

Comments

Anil almost 2 years

I need to extract all the images from a PDF file on my server. I don't want the PDF pages, only the images at their original size and resolution.

How could I do this with Perl, PHP or any other UNIX based app (which I would invoke with the exec function from PHP)?

PolyThinker over 15 years

I think the package gets installed when you install xpdf.

Luis Melgratti over 15 years

That is correct too; both packages have pdfimages.