Is it possible to uncompress PDF by using Adobe Acrobat or Acrobat Distiller?
Solution 1
This is easy with qpdf and pdftk.
With Adobe Acrobat you can inspect the internal structure after running a Preflight profile on the PDF (e.g. one that detects PDF syntax errors) and then choosing Options->Internal PDF structure, but there is no way to get something editable with a text editor.
Solution 2
qpdf and pdftk have already been mentioned. To show the commands:
$ qpdf --qdf --object-streams=disable orig.pdf uncompressed-orig.pdf
$ pdftk orig.pdf output uncompressed-orig.pdf uncompress
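Whether --object-streams=disable changes anything depends on whether the input contains object streams at all. As a rough sketch (the function name and the sample bytes are made up for illustration, and a raw byte scan can in principle be fooled by binary data that happens to contain the same characters), one can count the /Type /ObjStm dictionaries in a file with a few lines of Python:

```python
import re


def count_object_streams(pdf_bytes: bytes) -> int:
    """Count dictionaries that declare /Type /ObjStm in the raw file bytes."""
    # The dictionary of an object stream is stored as plain text, so a
    # simple pattern match on the raw bytes is enough for a first look.
    return len(re.findall(rb"/Type\s*/ObjStm\b", pdf_bytes))


# Hypothetical fragment standing in for a real file.
sample = (
    b"5 0 obj\n"
    b"<< /Type /ObjStm /N 2 /First 10 /Length 40 >>\n"
    b"stream\n...\nendstream\nendobj\n"
)
print(count_object_streams(sample))

# For a real file one would read the bytes first, e.g.:
#   with open("orig.pdf", "rb") as f:
#       print(count_object_streams(f.read()))
```

A count of zero suggests the file has no object streams to unpack, so the flag would make no difference for it.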
mutool however hasn't been mentioned yet:
$ mutool clean -d -a orig.pdf uncompressed-orig.pdf
mutool is a command line tool which ships with MuPDF, a lightweight viewer for PDF and other document formats.
I do not think you can achieve the uncompressing of PDF objects' streams with Acrobat or Distiller (unless you have additional payware plugins available).
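To illustrate what these tools undo: most compressed PDF streams use the /FlateDecode filter, which is zlib compression. The following is a minimal Python sketch, not a replacement for qpdf, pdftk or mutool. It assumes unencrypted input, handles only Flate data without predictors, and the function name and the in-memory sample object are invented for the example.

```python
import re
import zlib


def inflate_flate_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return the inflated payload of every stream that zlib can decode."""
    results = []
    # A stream payload starts right after the keyword "stream" and an EOL.
    # The lookbehind skips the "stream" that is part of "endstream".
    for match in re.finditer(rb"(?<!end)stream\r?\n", pdf_bytes):
        inflater = zlib.decompressobj()
        try:
            # decompressobj stops at the end of the zlib data, so the
            # trailing "endstream" does not need to be located first.
            data = inflater.decompress(pdf_bytes[match.end():])
        except zlib.error:
            continue  # not Flate data (or another filter); skip it
        if inflater.eof:
            results.append(data)
    return results


# Hypothetical stream object built in memory for the demonstration.
content = b"0 0 m 100 100 l S"
packed = zlib.compress(content)
sample = (
    b"4 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(packed)
    + packed
    + b"\nendstream\nendobj\n"
)
print(inflate_flate_streams(sample))
```

The dedicated tools do considerably more than this (they rewrite the cross-reference table, fix /Length entries and handle other filters), which is why they remain the practical choice.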
Solution 3
Use cpdf:
cpdf -decompress in.pdf -o out.pdf
and then the graphic operators for each page can be read in a text editor. You'll need a copy of the standard as a reference, though.
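As a hedged illustration of what a decompressed page stream looks like, the Python sketch below groups each operator with the operands that precede it (PDF content streams use postfix notation). The sample stream and the name /Fm0 are invented for the example, and the tokenizer is deliberately simplified: it ignores strings, arrays, dictionaries, comments and inline images.

```python
import re

NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")


def group_operators(content: str) -> list[tuple[str, list[str]]]:
    """Pair each operator with the operands written before it."""
    grouped = []
    operands: list[str] = []
    for token in content.split():
        if token.startswith("/") or NUMBER.fullmatch(token):
            operands.append(token)  # names and numbers are operands
        else:
            grouped.append((token, operands))  # anything else is an operator
            operands = []
    return grouped


# Hypothetical decompressed content stream.
sample = """q
1 0 0 1 50 50 cm
0 0 m
100 100 l
S
/Fm0 Do
Q"""

for operator, operands in group_operators(sample):
    print(operator, operands)
```

In this sample, m and l build a path, S strokes it, cm changes the transformation matrix, q and Q save and restore the graphics state, and Do paints the external object (XObject) named by its operand. The standard lists the full operator set.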
Disclosure: I am the author of cpdf.
Updated on June 03, 2022
Comments
-
Alexey Popkov almost 2 years
Most PDF files found on the Web have compressed and unreadable data streams. Is it possible to uncompress the internal content of a PDF file using Acrobat or Acrobat Distiller, so that the source code can be read with a text editor?
P.S. This question is inspired by this answer which explains how it can be done with Ghostscript.
-
mkl over 10 years
What do you want to read in the editor? The operators used to draw something? Or also the text?
-
Alexey Popkov over 10 years
@mkl I want to read the operators used to draw vector figures.
-
mkl over 10 years
While I don't see how to do that using Acrobat (I only have version 9.5 at hand, though), it is fairly easy to do in a small Java or .NET program using iText or iTextSharp by reading a PDF and re-saving it without compression, cf. the method decompressPdf in HelloWorldCompression.java / HelloWorldCompression.cs.
-
Alexey Popkov over 10 years
I need to convert a PDF into something readable with a text editor. Is it possible with Acrobat?
-
Martin Schröder over 10 years
@AlexeyPopkov: You can export into e.g. XML. But editable: no.
-
Alexey Popkov over 10 years
Exporting to XML gives a result similar to exporting to TXT: only textual elements are included. I need to read the operators used to draw vector figures in the PDF.
-
Alexey Popkov over 10 years
+1 Thanks for Options->Internal PDF structure in Preflight. It would be ideal to copy its content to a text editor for further investigation. BTW, there is no need for profiling to see Internal PDF structure: it works from the start (at least in Acrobat 11).
-
Alexey Popkov almost 9 years
Are you sure that for qpdf the option --object-streams=disable is a good choice? According to the documentation this option means "don't write any object streams." Won't the streams be erased as a result?
-
Kurt Pfeifle almost 9 years
@AlexeyPopkov: Yes, I'm pretty sure it is a good choice for the purpose. I'm using it daily. If object streams are enabled, a lot of the smaller objects will be embedded into another object's stream, which makes the file more complex to analyse, even if uncompressed. If you don't believe me, try it yourself. (You need an input file that has at least 1 object of /Type /ObjStm.) Disabling object streams will unpack all these streamed objects and put them properly into their own indirect objects again, individually.
-
Alexey Popkov almost 9 years
Do you mean that for qpdf the seemingly obvious choice --stream-data=uncompress will change the structure of the file and complicate it?
-
Kurt Pfeifle almost 9 years
@AlexeyPopkov: The --qdf mode already implies --stream-data=uncompress. And yes, using QPDF does change the structure of the file in some way. But it tries to do so in a content-preserving way. The self-description of QPDF even says so, stating that it is a "CLI tool that does structural, content-preserving transformations on PDF files". (In which cases the contents change in an unwanted and unexpected way is a different matter. I've filed a few bug reports/enhancement requests about these: for example, OCGs ("layers") get flattened and incremental update history gets lost.)
-
Alexey Popkov almost 9 years
From the QPDF documentation it looks like the --qdf mode creates a very special version of the PDF file that is editable, which the developers of PDF did not intend, and for this reason the --qdf mode can be expected to corrupt the original file in some way. I appreciate this effort but I'm still unsure whether the --qdf mode gives any benefits for readability of the PDF code (in this thread I'm not interested in editability).
-
Kurt Pfeifle almost 9 years
@AlexeyPopkov: It's good you read the documentation before starting to use QPDF; I did the same, back in the day. Feel free to do whatever you want. I'm just sharing my knowledge and experience here. I hope you'll do the same once you have learned and know more (or other) things about PDFs and related tools than I do. Whichever tool you finally choose to give you readable PDF code, you have to compare each of them against the others first. I really hope you'll put up a writeup somewhere on the 'Net describing and weighing the advantages as well as the disadvantages of each tool. I'd be your first reader!
-
Kurt Pfeifle almost 9 years
@AlexeyPopkov: "I need to read the operators used to draw vector figures in the PDF". In that case look for uncompressed /Contents objects and their streams. Inside the expanded streams, also look for /name Do operations -- these may point to XObjects named /name containing vector elements (as well as to raster image objects).
-
Stuart Poss about 5 years
For a given *.pdf file, Acrobat Pro DC provides an Export To function that offers a variety of alternative formats, one of which is PostScript. PostScript is the only likely option that would provide the operators. However, I haven't used PostScript, except as a stand-alone language, since shortly after it was first invented. A quick glance at the output for one page shows that the export produces ASCII-readable PostScript. If one can simulate/interpret operators such as "pop", "{get exec}bdf" etc., this might be as close as you get to the code generating vector or raster graphics.