Why is a PDF containing just one image much bigger than the image itself?

If the image is already in JPEG format, there is a shorter route: use the jpeg2ps wrapper to convert it to PostScript, then use ps2pdf to convert the result to PDF.
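
The two steps above can be sketched as a small shell function. This is only a sketch: `jpeg_to_pdf` is a hypothetical helper name, and it assumes that `jpeg2ps` and `ps2pdf` (from Ghostscript) are installed. Whether `ps2pdf` keeps the JPEG stream untouched or re-encodes it depends on the Ghostscript version and settings.

```shell
#!/bin/sh
# Sketch: wrap a JPEG in PostScript with jpeg2ps, then convert it to PDF with ps2pdf.
# jpeg_to_pdf is a hypothetical helper name, not part of either tool.
jpeg_to_pdf() {
    in_jpg=$1
    out_pdf=$2
    [ -f "$in_jpg" ] || { echo "no such file: $in_jpg" >&2; return 1; }
    for tool in jpeg2ps ps2pdf; do
        command -v "$tool" >/dev/null 2>&1 || { echo "missing tool: $tool" >&2; return 1; }
    done
    tmp_ps=$(mktemp) || return 1
    # jpeg2ps writes the PostScript wrapper to standard output.
    jpeg2ps "$in_jpg" > "$tmp_ps" && ps2pdf "$tmp_ps" "$out_pdf"
    status=$?
    rm -f "$tmp_ps"
    return "$status"
}

# Example (file names are placeholders):
#   jpeg_to_pdf image-200.jpg image-200.pdf
```

If the resulting page has the wrong size, Ghostscript's `-dEPSCrop` option may help, since jpeg2ps produces EPS output.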

Author: Gaël Barbin

https://www.linkedin.com/in/ga%C3%ABl-barbin/

Updated on September 18, 2022

Comments

  • Gaël Barbin
    Gaël Barbin almost 2 years

    I would like to embed a scanned document into a PDF document.

    The source picture is about 300 kB.
    If I use the convert command, the PDF has a size of 30 MB, and with GIMP, 3 MB.

    Here are the resulting file sizes for various commands. The only way I found to get a reasonable PDF file size is to convert to JPEG first, and then to PDF.

    Command                                                                 | Size (MB)
    scanimage -p --mode Color --format tiff -x 205 -y 297 > image.tiff      | 25.5
    convert -quality 30  -compress Zip image.tiff image-zip.pdf             | 32.2
    convert -quality 30   image.tiff image.pdf                              | 12.1
    convert -compress Zip image.tiff image-wq-zip.pdf                       | 11.1
    
    convert image.tiff image.jpg                                            | 2.3
    convert -quality 30 image.tiff image.jpg                                | 0.34
    convert -quality 30 -define jpeg:extent=200kb image.tiff image-200.jpg  | 0.19
    
    convert image-200.jpg image-jpg.pdf                                     | 0.19
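
A rough sanity check on the 25.5 figure for image.tiff: the sketch below estimates the size of an uncompressed 8-bit RGB raster for the 205 x 297 mm scan area. The 300 dpi resolution is an assumption (the post does not state it), and `raw_rgb_bytes` is a hypothetical helper name.

```shell
#!/bin/sh
# raw_rgb_bytes WIDTH_MM HEIGHT_MM DPI
# Prints the size in bytes of an uncompressed 8-bit RGB raster (3 bytes per pixel).
# Hypothetical helper for illustration only.
raw_rgb_bytes() {
    width_mm=$1
    height_mm=$2
    dpi=$3
    # mm -> pixels: mm / 25.4 * dpi, rounded to the nearest integer.
    width_px=$(( (width_mm * dpi * 10 + 127) / 254 ))
    height_px=$(( (height_mm * dpi * 10 + 127) / 254 ))
    echo $(( width_px * height_px * 3 ))
}

# Scan area from the scanimage command; 300 dpi is assumed, not taken from the post.
raw_rgb_bytes 205 297 300   # prints 25478604
```

Under that assumption the raw pixel data comes to about 25.5 MB, which suggests the TIFF is stored uncompressed. A PDF that embeds the same pixels without lossy compression will stay in the same order of magnitude, while a JPEG at quality 30 discards most of that data.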
    
    • Hastur
      Hastur over 8 years
      Probably because the image was rasterized at a different resolution, or because the compression level was changed. Can you give more information about the PDF and the command used? What format was the image in? You can get some hints from identify -verbose yourfile.pdf and identify -verbose yourfile.jpg (assuming JPEG as the source format). Even the colorspace may have been changed.
    • James P
      James P over 8 years
      Try using the compress option in the command, e.g. -compress Zip. More information is here: imagemagick.org/Usage/formats/#pdf_options
    • Hastur
      Hastur over 8 years
      It is mainly the compression algorithm. Try extracting the images from the PDF converted by GIMP (g.pdf), the one made with convert s.tif c.pdf, and the one made with convert s.tif -compress Zip z.pdf: you can use pdfimages g.pdf g, pdfimages c.pdf c and pdfimages z.pdf z. You will find g-000.ppm, c-000.ppm and z-000.ppm, which are almost the same. You can compare (subtract) them to highlight the differences...
    • Gaël Barbin
      Gaël Barbin over 8 years
      With -compress Zip, the file size is 12 MB.
    • Hastur
      Hastur over 8 years
      And with -compress jpeg? I tried with an uncompressed TIFF and got the same quality and size for the GIMP and -compress zip versions. If you can post a link to the image (or to another one that shows the same behaviour), we can try a few things...
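
Hastur's suggestion in this sub-thread (extract the embedded images with pdfimages and compare them) can be sketched as follows. `compare_pdf_images` is a hypothetical helper name. The sketch assumes pdfimages (poppler-utils) and ImageMagick's compare are installed, that each PDF holds a colour image (so the extracted files end in .ppm), and that the poppler version supports the -list option.

```shell
#!/bin/sh
# Sketch: extract the first embedded image from two PDFs and compare the pixels.
# compare_pdf_images is a hypothetical helper name.
compare_pdf_images() {
    pdf_a=$1
    pdf_b=$2
    for f in "$pdf_a" "$pdf_b"; do
        [ -f "$f" ] || { echo "no such file: $f" >&2; return 2; }
    done
    for tool in pdfimages compare; do
        command -v "$tool" >/dev/null 2>&1 || { echo "missing tool: $tool" >&2; return 2; }
    done
    work=$(mktemp -d) || return 2
    # List how each image is stored; the "enc" column shows the encoding.
    pdfimages -list "$pdf_a"
    pdfimages -list "$pdf_b"
    # Without format options, pdfimages writes decoded files named PREFIX-000.ppm, ...
    pdfimages "$pdf_a" "$work/a" || { rm -rf "$work"; return 2; }
    pdfimages "$pdf_b" "$work/b" || { rm -rf "$work"; return 2; }
    # Print the root-mean-square error between the two extracted images.
    status=0
    compare -metric RMSE "$work/a-000.ppm" "$work/b-000.ppm" null: || status=$?
    rm -rf "$work"
    return "$status"
}

# Example (file names taken from the comment above):
#   compare_pdf_images g.pdf z.pdf
```

If the pixel difference is near zero while the file sizes differ widely, the size gap comes from how the image stream is compressed inside the PDF, not from the image content.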
  • Yorik
    Yorik over 8 years
    "Minimum size" would work, but not because of compatibility: it works because it downsamples to 72 ppi and uses JPEG compression when embedding the images.
  • Sanny
    Sanny over 8 years
    @Yorik Indeed, it's about compression in this case. But generally speaking, if you use Adobe Acrobat Pro (File -> Save as -> Reduced size PDF), you'll be prompted to select an Acrobat version compatibility level. These levels are presets for image compression, font embedding, etc., which can also be set with the Optimized PDF command in the File menu.
  • p._phidot_
    p._phidot_ almost 3 years
    A (bitmap-wise) low-level solution. Very neat (and smart)!
  • Claude Frantz
    Claude Frantz almost 3 years
    Remember that PDF is essentially a specially structured form of PostScript. The ability to insert a rasterized image therefore depends on the capabilities of PostScript. The language includes some compression capabilities for such images. If you use them, the size of the final document can be reduced, but nobody is forced to use them. Note that reusable streams can also bring a benefit, mainly in processing time. Not all of these features are available in all PostScript levels.
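
Since compression is optional, one rough way to see which stream filters a given PDF actually uses is to search the file for filter names. In PDF, /DCTDecode marks JPEG data and /FlateDecode marks zlib/deflate data (what -compress Zip produces). `list_pdf_filters` is a hypothetical helper name, and the approach is only a heuristic: it relies on the widely available grep options -a and -o, and it misses filters declared inside compressed object streams.

```shell
#!/bin/sh
# list_pdf_filters FILE
# Counts the filter names (/FlateDecode, /DCTDecode, ...) that appear in plain
# text in a PDF file. Hypothetical helper for illustration only.
list_pdf_filters() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    # -a treats the binary file as text; -o prints only the matching part.
    # The pattern requires at least one character before "Decode", so the
    # plain /Decode key of image dictionaries is not counted.
    grep -a -o '/[A-Za-z0-9][A-Za-z0-9]*Decode' "$1" | sort | uniq -c
}

# Example (file names are placeholders):
#   list_pdf_filters image-zip.pdf
#   list_pdf_filters image-jpg.pdf
```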