How to reduce PDF filesize

13,311

pdftk is the way to go IMO.

It can uncompress and recompress the textual part of the PDF (the page streams). You can also use it in a script to extract all the images, compress them with another tool, and then put them back into the original document.
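A minimal sketch of the stream recompression step, assuming pdftk is installed and on the PATH. It covers only pdftk's compress and uncompress operations, not the image extraction step. The helper names pdftk_command and recompress are made up for illustration and are not part of pdftk.

```python
import shutil
import subprocess
from pathlib import Path


def pdftk_command(src, dst, mode="compress"):
    """Build the argv for `pdftk SRC output DST compress|uncompress`."""
    if mode not in ("compress", "uncompress"):
        raise ValueError(f"unsupported mode: {mode!r}")
    return ["pdftk", str(src), "output", str(dst), mode]


def recompress(src, dst):
    """Run pdftk on src and return (old_size, new_size) in bytes.

    Assumption: pdftk is available on this machine; otherwise we stop early.
    """
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk is not installed or not on PATH")
    subprocess.run(pdftk_command(src, dst), check=True)
    return Path(src).stat().st_size, Path(dst).stat().st_size
```

Calling recompress("in.pdf", "out.pdf") lets you compare the two sizes before deciding whether to keep the new file. The gain is often small for files whose streams are already compressed.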

I'm not sure whether it can remove embedded fonts.

HTH

Author by

clarkk

Updated on July 13, 2022

Comments

  • clarkk
    clarkk almost 2 years

    I have a lot of PDF files, and some of them are quite large.

    I see two alternatives:

    1. remove images and remove embedded fonts
    2. compress images

    Is it possible to remove all objects like images/fonts in a PDF (PHP lib or command-line tool)?

    Or if I want to compress images in the PDF, which PHP library do you recommend (or command-line tool)?

    Debian/PHP

  • clarkk
    clarkk about 12 years
    These tools you are talking about: can they be run on a Linux server? Do you have any links?
  • clarkk
    clarkk about 12 years
    I have looked at pdftk and found a simple way to compress a PDF file: run pdf2ps file.pdf file.ps, then ps2pdf file.ps new_file.pdf. Do you have a link explaining how to extract the images and afterwards replace them with compressed ones (pdftk)?
  • sdaau
    sdaau almost 12 years
    @clarkk: QPDF can extract images; see the answer to "Imagemagick: generate raw image data for PDF flate embedding?" for an example. I'm not sure about replacement; I'm also looking for a way to re-encode only the images of a PDF. Cheers!
  • William
    William almost 6 years
    Can you suggest some instructions on how to do this?
  • vinc17
    vinc17 over 2 years
    The pdftk utility does not do anything about fonts, though compressing the fonts to CFF (a.k.a. Type 1C) and keeping a subset is often the best way to make the PDF file smaller, e.g. when the PDF file has been obtained with pdflatex. The ps2pdf utility can do that on fonts, but be careful, as it may corrupt the text part, and the Ghostscript developers do not care very much about that; see Ghostscript bug 704478.
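
The Ghostscript route mentioned in the comments can be sketched as follows. ps2pdf is a wrapper around Ghostscript's pdfwrite device, so this sketch calls gs directly and skips the PostScript round trip. It assumes gs is installed. The "ebook" preset and the helper names gs_shrink_command and shrink are illustrative choices, not something the commenters specified. Given the warning above about possible text corruption, compare the output against the original before replacing any files.

```python
import subprocess

# Quality presets accepted by Ghostscript's -dPDFSETTINGS option.
PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def gs_shrink_command(src, dst, preset="ebook"):
    """Build a Ghostscript argv that rewrites src into a smaller dst."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",         # re-emit the document as PDF
        f"-dPDFSETTINGS=/{preset}",  # controls image downsampling and quality
        "-dSubsetFonts=true",        # embed only the glyphs actually used
        "-dCompressFonts=true",      # compress the embedded font data
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        str(src),
    ]


def shrink(src, dst, preset="ebook"):
    """Run Ghostscript; raises CalledProcessError if gs reports a failure."""
    subprocess.run(gs_shrink_command(src, dst, preset), check=True)
```

A lower preset such as "screen" downsamples images more aggressively, which usually gives the smallest file at the cost of image quality.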