Is it possible to uncompress PDF by using Adobe Acrobat or Acrobat Distiller?

Solution 1

This is easy with qpdf and pdftk.

In Adobe Acrobat you can inspect the internal structure through Preflight: run any profile (e.g. one that detects PDF syntax errors), then choose Options->Internal PDF structure. However, this does not give you a file you can edit in a text editor.

Solution 2

qpdf and pdftk have already been mentioned. To show the commands:

$ qpdf --qdf --object-streams=disable orig.pdf uncompressed-orig.pdf
$ pdftk orig.pdf output uncompressed-orig.pdf uncompress

mutool however hasn't been mentioned yet:

$ mutool clean -d -a orig.pdf uncompressed-orig.pdf

mutool is a command line tool which ships alongside the lightweight MuPDF PDF + document viewer.

I do not think you can achieve the uncompressing of PDF objects' streams with Acrobat or Distiller (unless you have additional payware plugins available).
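
To illustrate what these tools undo, here is a minimal Python sketch that inflates Flate-compressed streams found in raw PDF bytes. It is not how qpdf, pdftk or mutool work internally. The function name `inflate_streams` and the demo object are hypothetical, and the naive regex scan ignores other filters, predictors, encryption and object streams, so it is only meant for peeking at simple files:

```python
import re
import zlib

# Naive match of the raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def inflate_streams(pdf_bytes):
    """Return the decoded bytes of every stream that zlib can inflate."""
    decoded = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes before "endstream".
            decoded.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not Flate-compressed (or uses another filter): skip it.
            continue
    return decoded

# Hypothetical demo data: one hand-built object with a compressed content stream.
content = b"0 0 m 100 100 l S\n"
packed = zlib.compress(content)
demo = (b"1 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(packed)
        + packed + b"\nendstream\nendobj\n")

for data in inflate_streams(demo):
    print(data.decode("latin-1"), end="")
```

For real files, one of the commands above is the safer route, since they rewrite the whole file consistently.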

Solution 3

Use cpdf:

cpdf -decompress in.pdf -o out.pdf

and then the graphic operators for each page can be read in a text editor. You'll need a copy of the standard as a reference, though.

Disclosure: I am the author of cpdf.
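
Once a content stream is decompressed, the path operators can also be picked out programmatically. The following is a rough Python sketch, not part of cpdf: the helper name `path_operations` is hypothetical, and splitting on whitespace is an assumption that breaks on string operands and inline images, so treat it as a reading aid only.

```python
import re

# Path construction, painting and clipping operators from the PDF specification.
PATH_OPERATORS = {
    "m", "l", "c", "v", "y", "h", "re",               # construction
    "S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n",  # painting
    "W", "W*",                                        # clipping
}

NUMBER_RE = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)$")

def path_operations(content):
    """Return (operator, operands) pairs for the path operators in a stream."""
    operations = []
    operands = []
    for token in content.split():
        if token in PATH_OPERATORS:
            operations.append((token, operands))
            operands = []
        elif NUMBER_RE.match(token):
            operands.append(token)
        else:
            # Some other operator or operand type: discard what was collected.
            operands = []
    return operations

# Hypothetical sample of a decompressed content stream.
sample = "q 1 w 0 0 m 100 100 l S Q"
for operator, operands in path_operations(sample):
    print(operator, operands)
```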

Author: Alexey Popkov

By training, I am a chemist with a specialization in physical chemistry, finished a postgraduate course on Physical Chemistry at the Department of Chemistry of Lomonosov Moscow State University (Moscow, Russian Federation) in 2010. Active user of Wolfram Mathematica since 2006. I'm the author of Mathematica packages: PolygonMarker (aka "PolygonPlotMarkers`") – a rich set of fine-tuned plot markers for producing publication-quality plots. CurveToBSplineFunction – creates a smooth B-spline function from a list of points with flexible control over smoothing. ShortInputForm (aka shortInputForm) – displays shortened and formatted for readability InputForm of Mathematica's graphics without dropping any information. Interested in Mathematica-related work. E-mail: Uncompress["1:eJxTTMoPCpZkYGBIzEmtSK3US9IryC/Izi9zSM9NzMzRS87PBQCy2gtN"].

Updated on June 03, 2022

Comments

  • Alexey Popkov
    Alexey Popkov almost 2 years

    Most PDF files found on the Web have compressed and unreadable data streams. Is it possible to uncompress the internal content of a PDF file using Acrobat or Acrobat Distiller, so that the source code can be read with a text editor?

    P.S. This question is inspired by this answer, which explains how it can be done with Ghostscript.

    • mkl
      mkl over 10 years
      What do you want to read in the editor? The operators used to draw something? Or also the text?
    • Alexey Popkov
      Alexey Popkov over 10 years
      @mkl I want to read the operators used to draw vector figures.
    • mkl
      mkl over 10 years
      While I don't see how to do that using Acrobat (I only have version 9.5 at hand, though), it is fairly easy to do in a small Java or .Net program using iText or iTextSharp by reading a PDF and re-saving it without compression, cf. the method decompressPdf in HelloWorldCompression.java / HelloWorldCompression.cs.
  • Alexey Popkov
    Alexey Popkov over 10 years
    I need to convert a PDF into something readable with a text editor. Is it possible with Acrobat?
  • Martin Schröder
    Martin Schröder over 10 years
    @AlexeyPopkov: You can export into e.g. XML. But editable: no.
  • Alexey Popkov
    Alexey Popkov over 10 years
    Exporting to XML gives a result similar to exporting to TXT: only textual elements are included. I need to read the operators used to draw vector figures in the PDF.
  • Alexey Popkov
    Alexey Popkov over 10 years
    +1 Thanks for Options->Internal PDF structure in Preflight. It would be ideal to copy its content to a text editor for further investigation. BTW, there is no need for profiling to see Internal PDF structure: it works from the start (at least in Acrobat 11).
  • Alexey Popkov
    Alexey Popkov almost 9 years
    Are you sure that for qpdf the option --object-streams=disable is a good choice? According to the documentation this option means "don't write any object streams." Won't the streams be erased as a result?
  • Kurt Pfeifle
    Kurt Pfeifle almost 9 years
    @AlexeyPopkov: Yes, I'm pretty sure it is a good choice for the purpose. I'm using it daily. If object streams are enabled, a lot of the smaller objects will be embedded into another object's stream, which makes the file more complex to analyse, even if uncompressed. If you don't believe me, try it yourself. (You need an input file that has at least 1 object of /Type /ObjStm.) Disabling object streams will unpack all these streamed objects and put them properly into their own indirect objects again, individually.
  • Alexey Popkov
    Alexey Popkov almost 9 years
    Do you mean that for qpdf the seemingly obvious choice --stream-data=uncompress will change the structure of the file and complicate it?
  • Kurt Pfeifle
    Kurt Pfeifle almost 9 years
    @AlexeyPopkov: The --qdf mode already implies --stream-data=uncompress. And yes, using QPDF does change the structure of the file in some way. But it tries to do so in a content-preserving way. The self-description of QPDF even says so, stating that it is a "CLI tool that does structural, content-preserving transformations on PDF files". (In which cases the contents change in an unwanted and unexpected way is a different matter. I've filed a few bug reports/enhancement requests about these: for example, OCGs ("layers") get flattened and the incremental update history gets lost.)
  • Alexey Popkov
    Alexey Popkov almost 9 years
    From the QPDF documentation it looks like the --qdf mode creates a very special version of the PDF file that is editable, which is not something the developers of PDF intended, and for this reason the --qdf mode can be expected to corrupt the original file in some way. I appreciate this effort, but I'm still unsure whether the --qdf mode gives any benefit for the readability of the PDF code (in this thread I'm not interested in editability).
  • Kurt Pfeifle
    Kurt Pfeifle almost 9 years
    @AlexeyPopkov: It's good you read the documentation before starting to use QPDF; I did the same, back in the day. Feel free to do whatever you want. I'm just sharing my knowledge + experience here. I hope you'll do the same once you have learned + know more (or other) things about PDFs + related tools than I do. Whatever tool you finally decide on to give you readable PDF code: you have to compare each of them against the others first. I really hope you'll put up a writeup somewhere on the 'Net describing + weighing advantages as well as disadvantages of each tool. I'd be your first reader!
  • Kurt Pfeifle
    Kurt Pfeifle almost 9 years
    @AlexeyPopkov: "I need to read the operators used to draw vector figures in the PDF". In that case look for uncompressed /Contents objects and their streams. Inside the expanded streams, also look for /name Do operations -- these may point to XObjects named /name containing vector elements (as well as to raster image objects).
  • Stuart Poss
    Stuart Poss about 5 years
    For a given *.pdf file, Acrobat Pro DC provides an Export To function that offers a variety of alternative formats, one of which is PostScript. PostScript is the only option likely to provide the operators. However, I haven't used PostScript, except as a stand-alone language, since shortly after it was first invented. A quick glance at the output for one page shows that the export produces ASCII-readable PostScript. If one can simulate/interpret operators such as "pop", "{get exec}bdf" etc., this might be as close as you get to the code generating vector or raster graphics.
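
Regarding the comment above about needing an input file with at least one object of /Type /ObjStm: a quick way to check a file is to scan its raw bytes for that dictionary entry. This is a minimal Python sketch with a hypothetical helper name, `count_object_streams`. It assumes the dictionary is written as plain text in the file, which is the usual case because the dictionary of a stream object is stored outside the compressed data.

```python
import re

# Matches "/Type /ObjStm" with optional whitespace between the two names.
OBJSTM_RE = re.compile(rb"/Type\s*/ObjStm\b")

def count_object_streams(pdf_bytes):
    """Count dictionaries in the raw file bytes that declare /Type /ObjStm."""
    return len(OBJSTM_RE.findall(pdf_bytes))

# Hypothetical fragment of a PDF that contains one object stream.
fragment = b"5 0 obj\n<< /Type /ObjStm /N 3 /First 20 /Length 80 >>\nstream\n"
print(count_object_streams(fragment))
```

To check a real file, read it in binary mode first, e.g. `count_object_streams(open("orig.pdf", "rb").read())`, and compare the count before and after running qpdf with --object-streams=disable.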