pdf to jpg without quality loss; gscan2pdf
Solution 1
It's not clear what you mean by "quality loss". That could mean a lot of different things. Could you post some samples to illustrate? Perhaps cut the same section out of the poor quality and good quality versions (as a PNG to avoid further quality loss).
Perhaps you need to use -density to do the conversion at a higher dpi:
convert -density 300 file.pdf page_%04d.jpg
(You can prepend -units PixelsPerInch or -units PixelsPerCentimeter if necessary. My copy defaults to ppi.)
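For instance, combining the density and quality flags (a sketch; the dpi and quality values here are assumptions you should tune to your source):

```shell
# Render each PDF page as a 300 dpi bitmap before JPEG-encoding it;
# -units makes explicit that -density means pixels per inch.
convert -units PixelsPerInch -density 300 file.pdf -quality 92 page_%04d.jpg
```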
Update: As you pointed out, gscan2pdf (the way you're using it) is just a wrapper for pdfimages (from poppler). pdfimages does not do the same thing that convert does when given a PDF as input.
convert takes the PDF, renders it at some resolution, and uses the resulting bitmap as the source image.
pdfimages looks through the PDF for embedded bitmap images and exports each one to a file. It simply ignores any text or vector drawing commands in the PDF.
As a result, if what you have is a PDF that's just a wrapper around a series of bitmaps, pdfimages will do a much better job of extracting them, because it gets you the raw data at its original size. You probably also want to use the -j option to pdfimages, because a PDF can contain raw JPEG data. By default, pdfimages converts everything to PNM format, and converting JPEG > PPM > JPEG is a lossy process.
So, try
pdfimages -j file.pdf page
You may or may not need to follow that with a convert-to-.jpg step (depending on what bitmap format the PDF was using).
I tried this command on a PDF that I had made myself from a sequence of JPEG images. The extracted JPEGs were byte-for-byte identical to the source images. You can't get higher quality than that.
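If you are unsure whether your PDF embeds JPEG data at all, poppler's pdfimages can report the embedded formats first. A sketch of the whole workflow (the exact -list columns vary by poppler version, and the PNM-cleanup loop is a hypothetical follow-up step):

```shell
# List the embedded images and their encodings (jpeg, ccitt, image, ...):
pdfimages -list file.pdf

# Extract them, keeping raw JPEG streams as .jpg where possible:
pdfimages -j file.pdf page

# Convert any files that still came out as PNM to JPEG:
for f in page-*.ppm; do
  [ -e "$f" ] || continue   # skip when no PNM files were produced
  convert "$f" -quality 95 "${f%.ppm}.jpg"
done
```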
Solution 2
convert doesn't work for me. This (pdftoppm) works perfectly, however. Each of the commands below ensures an "images" directory exists, creating it if it doesn't, and stores the generated images in that directory.
1200 DPI
mkdir -p images && pdftoppm -jpeg -r 1200 mypdf.pdf images/pg
600 DPI
mkdir -p images && pdftoppm -jpeg -r 600 mypdf.pdf images/pg
300 DPI (produces ~1MB-sized files per pg)
mkdir -p images && pdftoppm -jpeg -r 300 mypdf.pdf images/pg
300 DPI with least compression/highest quality (produces ~2MB-sized files per pg)
mkdir -p images && pdftoppm -jpeg -jpegopt quality=100 -r 300 mypdf.pdf images/pg
Additional reading:
- https://stackoverflow.com/questions/43085889/how-to-convert-a-pdf-into-jpg-with-commandline-in-linux/61700520#61700520
- https://stackoverflow.com/questions/6605006/convert-pdf-to-image-with-high-resolution/58795684#58795684
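Since these invocations differ only in their dpi and quality flags, they can be folded into a small wrapper (a sketch; the script name and defaults are assumptions):

```shell
#!/bin/sh
# pdf2jpg.sh -- render every page of a PDF to JPEG via pdftoppm.
# Usage: pdf2jpg.sh input.pdf [dpi] [quality]
pdf="$1"
dpi="${2:-300}"       # default resolution, as in the commands above
quality="${3:-90}"    # JPEG quality, 1-100
mkdir -p images
pdftoppm -jpeg -jpegopt quality="$quality" -r "$dpi" "$pdf" images/pg
```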
- https://askubuntu.com/questions/150100/extracting-embedded-images-from-a-pdf/1187844#1187844
Solution 3
As student's answer said, pdfimages is a good option. In my experience both gs and convert export at poor quality regardless of whether you specify the right dpi.
But if the pdf has multiple layers per page, pdfimages doesn't work: it extracts the layers as separate images. In that case it is best to use inkscape to export each page as it is displayed.
These are the commands I use:
pdftk combined_to_do.pdf burst output pg_%04d.pdf
ls ./pg*.pdf | xargs -L1 -I {} inkscape {} -z --export-dpi=300 --export-area-drawing --export-png={}.png
The first command splits the pdf into pages; the second converts each page to png. You can keep them as png or convert them to jpeg:
ls ./p*.png | xargs -L1 -I {} convert {} -quality 100 -density 300 {}.jpg
Compared to pdfimages, gs, and ImageMagick's convert, I find inkscape's export the best in quality.
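Note that the -z and --export-png flags above belong to Inkscape 0.x; Inkscape 1.0 removed them in favour of --export-filename. The same pipeline under Inkscape 1.x would look roughly like this (a sketch, assuming an Inkscape 1.x install):

```shell
# Split the pdf into one file per page, then export each page as seen:
pdftk combined_to_do.pdf burst output pg_%04d.pdf
for f in pg_*.pdf; do
  inkscape "$f" --export-dpi=300 --export-area-drawing \
    --export-filename="${f%.pdf}.png"
done
```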
Solution 4
The response from @cjm is correct, but if you like a GUI and don't want to render all pdf pages just to get some images, use gimp.
Open a pdf with gimp and you will get an import window with all pages rendered. Choose whatever pages you want and set the resolution to 600 pix/inch (I found 300 too sharp in many cases). Save to the format you want with "File/Export".
Anyway, there must be a flag to select the desired pages from the command line.
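For page selection on the command line, pdftoppm (from Solution 2) already offers -f and -l for the first and last page to render:

```shell
# Render only pages 2 through 4 of the PDF at 600 dpi:
mkdir -p images && pdftoppm -jpeg -f 2 -l 4 -r 600 mypdf.pdf images/pg
```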
Solution 5
What is not clear in your question is whether you talk about text and vector graphics in your pdf, or whether your pdf contains embedded images.
Having read what gscan2pdf is about, my guess is that your pdf files contain (only) embedded graphics.
convert essentially "prints" your pdf without regard for what the contents are. Like @cjm suggests, you might want to change the print density. This is the only way to increase quality for vector graphics.
If instead what you want to do is extract embedded images (much like gscan2pdf seems to do), guessing the density will usually lead to either quality loss or higher quality than required (and wasted disk space). The answer then is to extract the images rather than print the pdf. See this article, which basically advocates the use of pdfimages in order to extract images without quality loss.
student
Updated on September 18, 2022
Comments
-
student over 1 year
When I convert a pdf file to bunch of jpg files using
convert -quality 100 file.pdf page_%04d.jpg
I have appreciable quality loss.
However if I do the following, there is no (noticeable) quality loss:
Start gscan2pdf, choose file-> import (and choose file.pdf). Then go to the temporary directory of gscan2pdf. There are many pnm files (one for every page of the pdf-file). Now I do
for file in *.pnm; do convert "$file" "$file.jpg"; done
The resulting jpg-files are (roughly) of the same quality as the original pdf (which is what I want).
Now my question is, if there is a simple command line way to convert the pdf file to a bunch of jpg files without noticeable quality loss? (The solution above is too complicated and time consuming).
-
asoundmove about 13 years
What is not clear in your question is whether you talk about text and vector graphics in your pdf, or whether you mean to extract embedded images.
-
baponkar over 3 years
sudo snap install pdftk; pdftk file.pdf burst output pg_%04d.pdf; for i in pg_*.pdf; do convert -density 300 "$i" "${i}.jpg"; done
-
Matthew about 12 years
+1 I am so glad I didn't submit to the snobbery that misreading one of your sentences inspired in me, and actually tried pdfimages -- probably the most useful program I have used in months! I'd encourage everyone to try it!
-
cjm about 12 years
@ixtmixilix, I'm curious. What did you misread, and how?
-
Camille Goudeseune over 8 years
convert is also impractical for large PDFs. For example, it took 45 GB of memory to process a book of 700 6-megapixel pages. It also took about a thousand times longer than pdfimages.
-
erik about 8 years
For the other way round, to convert images to a pdf, or better, wrap images into a pdf, use img2pdf, here: gitlab.mister-muffin.de/josch/img2pdf (wraps jpg and jpg2000 into a pdf).
-
Abbafei over 7 years
by the way, if you don't need jpgs specifically, but want the actual image data from the PDF regardless of format, use -all in place of -j :-)
-
Hanaa Gebril over 6 years
I got strange checkered boxes all over my converted image files using the above 'convert' command, until I converted from pdf to pdf first (strange, I know). After that, the above command worked and there were no checkered boxes in the images. I wonder if it's some form of script running in the original pdf that created the checkers? The original pdf was editable; I wonder if that was why.
-
Eduard Florinescu over 6 years
pdfimages really does the job