How can I deflate compressed streams inside a PDF?


I found a way to do this using freely available Windows programs, with no scripting, compiling, or Cygwin required. It takes a few steps, but it's not as bad as it looks.

  1. Install MuPDF (http://mupdf.googlecode.com/files/mupdf-0.8.15-windows.zip) and copy your PDF to MuPDF's installation folder. Let's say it's called whatever.pdf.

  2. Open a DOS/command prompt and navigate to your MuPDF install folder, for example: cd C:\Program Files\mupdf. If that goes smoothly, your prompt should now look like this: C:\Program Files\mupdf> Now type the following command: pdfextract whatever.pdf

Afterwards, within the MuPDF program folder, you'll have one or more font files with names like ABCDEF+Fontname-12.cff. Right now they're in the unusable .cff format, but we'll fix that. I recommend renaming the one you want to something less awkward, for example whatever.cff.

  3. More DOS, sorry. You need a tool called cfftot1.exe. Here's a link: ftp://tug.org/texlive/Contents/live/bin/win32/cfftot1.exe. Copy it to your MuPDF folder, then type this: cfftot1 whatever.cff whatever.pfb

  4. You now have an almost usable font file called whatever.pfb. I say 'almost' because PFB font files usually come with a second file, a PFM file, which contains spacing information. Without this file the font won't install and the spacing will be screwed up. But the font will still open in font editors like FontLab, and you can save it from there as TTF or OTF. You can also try fixing the spacing yourself.

If you don't have a font editor, you can use CrossFont. CrossFont can take the PFB and generate the necessary PFM file so you can at least install and use the font. Link: http://crossfont.en.softonic.com/

That's it.
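
If pdfextract leaves you with several .cff files, the cfftot1 step can be batched. Here's a minimal Python sketch (so this part does involve a little scripting), assuming cfftot1 is on your PATH and takes an input and an output file name exactly as in the command above. The function names plan_conversion and convert_all are made up for illustration. It also strips the six-capital-letter "ABCDEF+" prefix, which PDFs use to tag subset fonts, to get a less awkward name.

```python
import re
import subprocess
from pathlib import Path

# Subset fonts embedded in a PDF carry a six-capital-letter tag followed by "+".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def plan_conversion(cff_name):
    """Return (is_subset, pfb_name, command) for one extracted .cff file.

    Hypothetical helper: it only builds the cfftot1 command, it runs nothing.
    """
    stem = Path(cff_name).stem
    is_subset = bool(SUBSET_TAG.match(stem))
    pfb_name = SUBSET_TAG.sub("", stem) + ".pfb"
    command = ["cfftot1", cff_name, pfb_name]
    return is_subset, pfb_name, command


def convert_all(folder="."):
    """Run cfftot1 on every .cff file in `folder` (assumes cfftot1 is on PATH)."""
    for cff in sorted(Path(folder).glob("*.cff")):
        is_subset, pfb_name, command = plan_conversion(cff.name)
        if is_subset:
            print(f"{cff.name}: subset font, some characters may be missing")
        subprocess.run(command, cwd=folder, check=True)
        print(f"{cff.name} -> {pfb_name}")
```

Calling convert_all(r"C:\Program Files\mupdf") would then convert everything pdfextract left in that folder. You still need the PFM file afterwards, same as with the manual route.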


Author: CreeDorofl

Updated on September 18, 2022

Comments

  • CreeDorofl
    CreeDorofl over 1 year

    I read a Stack Exchange post on how to find a stream of font data within a PDF. This stream, if decompressed properly, apparently creates a usable TTF file. But I'm not sure how to decompress it.

    https://stackoverflow.com/questions/1922625/extract-embedded-pdf-fonts-to-an-external-ttf-file-using-some-utility-or-script

    From the post it sounds like I can copy the data into a new file, then run a script to decompress it. But I'm not sure how I'd run this script.

    Is there a simple installable Windows program that can accept this stream and decompress it? Can I really just open the PDF as plain text, copy and paste what I think are the stream characters to a new file, and save it?

    • Mike Fitzpatrick
      Mike Fitzpatrick about 13 years
      The "script" you refer to in the other question is C code (not a complete program). You'd need a C compiler, the correct libraries, and some programming knowledge to put it all together. While the described method might work, a potentially big caveat is that PDFs don't always contain an entire font. It is quite common for them to include only the subset of characters required for that document. So even if you get a working solution, you might end up with an incomplete font. You can generally find out by checking the document's font properties within your PDF viewer.
    • CreeDorofl
      CreeDorofl about 13 years
      @Mike Fitzpatrick Gotcha. I had a feeling it wouldn't be that easy. But if all the script is doing is basically decompressing something, is there no program that handles it, something specializing in obscure archives rather than everyday ZIPs and RARs?
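
For what it's worth, the decompression discussed in these comments is ordinary zlib inflation when the stream uses the PDF /FlateDecode filter. Below is a rough Python sketch of that idea, not the C code from the linked post. It assumes the streams are FlateDecode-compressed and unencrypted, and it finds them by scanning for the stream/endstream keywords instead of parsing the PDF properly (a real parser would use each stream's /Length entry). The name inflate_streams is made up for illustration. Note that it reads the file as raw bytes; copying and pasting the stream out of a text editor is likely to corrupt the binary data.

```python
import re
import zlib

# Match the bytes between "stream" + end-of-line and "endstream".
# The lookbehind keeps the "stream" inside "endstream" from being matched.
STREAM = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the inflated contents of every stream that zlib can decode."""
    results = []
    for match in STREAM.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj ignores the stray end-of-line bytes that sit
            # between the compressed data and the "endstream" keyword.
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not zlib data (uncompressed or a different filter): skip it.
            continue
    return results
```

To try it, read the PDF with open("whatever.pdf", "rb").read(), pass the bytes to inflate_streams, and write each result to its own file. Most of the results will be page content and other objects rather than fonts, so you'd still have to work out which one is the font program.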