What is the smallest possible valid PDF?


Solution 1

This is an interesting problem. Taking it by the book, you can start off with this:

%PDF-1.0
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj 3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj
xref
0 4
0000000000 65535 f
0000000010 00000 n
0000000053 00000 n
0000000102 00000 n
trailer<</Size 4/Root 1 0 R>>
startxref
149
%EOF

which is 291 bytes of PDF joy. Acrobat opens it, but it complains somewhat. The file contains a single page that is 3/72" square (3 × 3 points), the minimum allowed by the spec.
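The offsets in the cross-reference table are just the byte positions at which each object starts, so they can be computed instead of counted by hand. Here is a minimal sketch, assuming the listing is saved with \r\n line endings and no newline after the final line (the printed offsets and the 291-byte total only add up under that assumption). `build_minimal_pdf` is a hypothetical helper name, not something from the answer.

```python
def build_minimal_pdf(newline: bytes = b"\r\n"):
    """Rebuild the listing above and compute its cross-reference offsets."""
    header = b"%PDF-1.0" + newline
    objects = [
        b"1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj",
        b"2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj",
        b"3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj",
    ]

    body = b""
    offsets = []
    for index, obj in enumerate(objects):
        # An object's offset is the number of bytes that precede it in the file.
        offsets.append(len(header) + len(body))
        # The listing separates objects with a space and ends the line after the last one.
        body += obj + (newline if index == len(objects) - 1 else b" ")

    xref_offset = len(header) + len(body)
    entry_count = len(objects) + 1  # the three objects plus the free entry 0

    lines = [b"xref", b"0 %d" % entry_count, b"0000000000 65535 f"]
    lines += [b"%010d 00000 n" % offset for offset in offsets]
    lines += [
        b"trailer<</Size %d/Root 1 0 R>>" % entry_count,
        b"startxref",
        b"%d" % xref_offset,
        b"%EOF",  # as printed in the listing; the spec spells this marker %%EOF
    ]
    return header + body + newline.join(lines), offsets, xref_offset


pdf, offsets, xref_offset = build_minimal_pdf()
print(offsets, xref_offset, len(pdf))
```

With the default \r\n this reproduces the offsets 10, 53, 102 and the startxref value 149 from the listing. Passing `b"\n"` shows how every offset shifts once the line endings change.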

However, Acrobat X doesn't even bother with the cross-reference table anymore, so we can take that out, along with the startxref pointer and the end-of-file marker:

%PDF-1.0
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj 3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj
trailer<</Size 4/Root 1 0 R>>

Acrobat complains, but opens it. Now we're at 178 bytes. Turns out that you don't need that /Size in the trailer. Now we're at 172:

%PDF-1.0
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj 3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj
trailer<</Root 1 0 R>>

Turns out you don't need all those pesky /Type elements in your dictionaries:

%PDF-1.0
1 0 obj<</Pages 2 0 R>>endobj 2 0 obj<</Kids[3 0 R]/Count 1>>endobj 3 0 obj<</MediaBox[0 0 3 3]>>endobj
trailer<</Root 1 0 R>>

Now we're at 138 bytes.

It also turns out that when the spec says "shall be an indirect reference" and /Count is required, and the header "must" be %PDF-1.0, they're making loose suggestions. This is the smallest I could make it and have it openable in Acrobat X:

%PDF-1.
trailer<</Root<</Pages<</Kids[<</MediaBox[0 0 3 3]>>]>>>>>>

70 bytes.

Now, my editor uses Windows newline discipline, but Acrobat accepts Windows, Mac, or Unix conventions. Using a hex editor, I replaced the \r\n with \r and removed the last newline altogether, which leaves me with 67 bytes:

25 50 44 46 2D 31 2E 0D 74 72 61 69 6C 65 72 3C 
3C 2F 52 6F 6F 74 3C 3C 2F 50 61 67 65 73 3C 3C 
2F 4B 69 64 73 5B 3C 3C 2F 4D 65 64 69 61 42 6F 
78 5B 30 20 30 20 33 20 33 5D 3E 3E 5D 3E 3E 3E 
3E 3E 3E 
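If you don't have a hex editor at hand, the dump can be turned back into a file with a few lines of Python. This is a small sketch, assuming Python 3.7 or later (where `bytes.fromhex` skips all ASCII whitespace, including newlines). The output filename is hypothetical.

```python
HEX_DUMP = """
25 50 44 46 2D 31 2E 0D 74 72 61 69 6C 65 72 3C
3C 2F 52 6F 6F 74 3C 3C 2F 50 61 67 65 73 3C 3C
2F 4B 69 64 73 5B 3C 3C 2F 4D 65 64 69 61 42 6F
78 5B 30 20 30 20 33 20 33 5D 3E 3E 5D 3E 3E 3E
3E 3E 3E
"""

# bytes.fromhex ignores the whitespace between the byte pairs.
tiny_pdf = bytes.fromhex(HEX_DUMP)

# The same file written as text: a lone \r after the header, no trailing newline.
expected = b"%PDF-1.\rtrailer<</Root<</Pages<</Kids[<</MediaBox[0 0 3 3]>>]>>>>>>"
print(len(tiny_pdf), tiny_pdf == expected)

# Hypothetical output path, for trying the result in a viewer:
# with open("tiny.pdf", "wb") as handle:
#     handle.write(tiny_pdf)
```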

I tried taking off the last end-of-dictionary marker (>>), but Acrobat wouldn't have that. The PDF reader built into Google Chrome (Foxit) won't open it.

As a PostScript (HA! See what I did there?), if you consent to Acrobat "repairing" the file, it bumps up to 3550 bytes, most of it optional metadata, but it leaves behind a number of clear spec violations.

Solution 2

I could not get the hello world example to open.

For a small-ish file with text content:

%PDF-1.2 
9 0 obj
<<
>>
stream
BT/ 9 Tf(Test)' ET
endstream
endobj
4 0 obj
<<
/Type /Page
/Parent 5 0 R
/Contents 9 0 R
>>
endobj
5 0 obj
<<
/Kids [4 0 R ]
/Count 1
/Type /Pages
/MediaBox [ 0 0 99 9 ]
>>
endobj
3 0 obj
<<
/Pages 5 0 R
/Type /Catalog
>>
endobj
trailer
<<
/Root 3 0 R
>>
%%EOF

Solution 3

Based on all the answers here, here's the smallest PDF with text:

SMALL_PDF = (
    b"%PDF-1.2 \n"
    b"9 0 obj\n<<\n>>\nstream\nBT/ 32 Tf(  YOUR TEXT HERE   )' ET\nendstream\nendobj\n"
    b"4 0 obj\n<<\n/Type /Page\n/Parent 5 0 R\n/Contents 9 0 R\n>>\nendobj\n"
    b"5 0 obj\n<<\n/Kids [4 0 R ]\n/Count 1\n/Type /Pages\n/MediaBox [ 0 0 250 50 ]\n>>\nendobj\n"
    b"3 0 obj\n<<\n/Pages 5 0 R\n/Type /Catalog\n>>\nendobj\n"
    b"trailer\n<<\n/Root 3 0 R\n>>\n"
    b"%%EOF"
)

Here it is as a base64 data URI. Copy it and paste it into Chrome's address bar to test:

data:application/pdf;base64,JVBERi0xLjIgCjkgMCBvYmoKPDwKPj4Kc3RyZWFtCkJULyAzMiBUZiggIFlPVVIgVEVYVCBIRVJFICAgKScgRVQKZW5kc3RyZWFtCmVuZG9iago0IDAgb2JqCjw8Ci9UeXBlIC9QYWdlCi9QYXJlbnQgNSAwIFIKL0NvbnRlbnRzIDkgMCBSCj4+CmVuZG9iago1IDAgb2JqCjw8Ci9LaWRzIFs0IDAgUiBdCi9Db3VudCAxCi9UeXBlIC9QYWdlcwovTWVkaWFCb3ggWyAwIDAgMjUwIDUwIF0KPj4KZW5kb2JqCjMgMCBvYmoKPDwKL1BhZ2VzIDUgMCBSCi9UeXBlIC9DYXRhbG9nCj4+CmVuZG9iagp0cmFpbGVyCjw8Ci9Sb290IDMgMCBSCj4+CiUlRU9G
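A data URI like the one above can be produced from any PDF byte string with the standard `base64` module. This is a sketch, and `to_data_uri` is a hypothetical helper name:

```python
import base64


def to_data_uri(pdf_bytes: bytes) -> str:
    """Wrap raw PDF bytes in a base64 data URI that can be pasted into a browser."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return "data:application/pdf;base64," + encoded


# The first bytes of the file above give the first characters of the URI above.
print(to_data_uri(b"%PDF-1.2 \n"))
```

Calling `to_data_uri(SMALL_PDF)` with the byte string defined earlier should give the full URI.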

To make the page bigger, adjust the MediaBox dimensions :)

/MediaBox [ 0 0 250 50 ]
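To drop in arbitrary test text, the template can be turned into a small function. The sketch below makes several assumptions: `make_text_pdf` and `escape_pdf_text` are hypothetical names, the text is plain ASCII, and, as in the template, no font resource is defined, so whether the text actually renders depends on the viewer. Backslashes and parentheses are escaped because they are special inside a PDF literal string.

```python
def escape_pdf_text(text: str) -> bytes:
    """Escape the characters that are special inside a PDF literal string."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Assumption: plain ASCII only; encode() raises UnicodeEncodeError otherwise.
    return escaped.encode("ascii")


def make_text_pdf(text: str, width: int = 250, height: int = 50,
                  font_size: int = 32) -> bytes:
    """Fill the template above with custom text, page size and font size."""
    content = b"BT/ %d Tf(%s)' ET" % (font_size, escape_pdf_text(text))
    media_box = b"/MediaBox [ 0 0 %d %d ]\n" % (width, height)
    parts = [
        b"%PDF-1.2 \n",
        b"9 0 obj\n<<\n>>\nstream\n" + content + b"\nendstream\nendobj\n",
        b"4 0 obj\n<<\n/Type /Page\n/Parent 5 0 R\n/Contents 9 0 R\n>>\nendobj\n",
        b"5 0 obj\n<<\n/Kids [4 0 R ]\n/Count 1\n/Type /Pages\n" + media_box
        + b">>\nendobj\n",
        b"3 0 obj\n<<\n/Pages 5 0 R\n/Type /Catalog\n>>\nendobj\n",
        b"trailer\n<<\n/Root 3 0 R\n>>\n",
        b"%%EOF",
    ]
    return b"".join(parts)


sample = make_text_pdf("Hi (there)", width=300, height=60)
print(sample.decode("ascii"))
```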

Solution 4

I thought I'd make the smallest PDF that displays "Hello World". The text is in the lower left corner. Sorry about the 9-point font; any larger would cost an extra byte :)

172 bytes for Adobe Reader X (if saved with linefeed-only newlines and no trailing newline or null-byte):

%PDF-1.
1 0 obj<</Kids[<</Parent 1 0 R/Resources<<>>/Contents 2 0 R>>]>>endobj 2 0 obj<<>>stream
BT/ 9 Tf(Hello World)' ET
endstream
endobj trailer<</Root<</Pages 1 0 R>>>>

120 bytes for Chrome's built-in PDF viewer:

%PDF 1 0 obj<</Pages<</Kids[<</Contents<<>>stream
BT 9 Tf(Hello World)' ET endstream>>]>>>>endobj trailer<</Root 1 0 R>>

To easily see this in Chrome, paste this URI in the address bar (SO won't let me link to it, and it won't work at all in other browsers):

data:application/pdf,%25PDF%201%200%20obj%3C%3C%2FPages%3C%3C%2FKids%5B%3C%3C%2FContents%3C%3C%3E%3Estream%0ABT%209%20Tf(Hello%20World)'%20ET%20endstream%3E%3E%5D%3E%3E%3E%3Eendobj%20trailer%3C%3C%2FRoot%201%200%20R%3E%3E
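The percent-encoded URI can be regenerated from the raw bytes with `urllib.parse.quote`. This sketch assumes a linefeed-only newline after `stream`. It keeps the parentheses and the apostrophe literal, as the URI above does, and percent-encodes everything else outside letters, digits and `_.-~`.

```python
from urllib.parse import quote

CHROME_PDF = (
    b"%PDF 1 0 obj<</Pages<</Kids[<</Contents<<>>stream\n"
    b"BT 9 Tf(Hello World)' ET endstream>>]>>>>endobj trailer<</Root 1 0 R>>"
)

# safe="()'" leaves those three characters unescaped; "/" is not in the
# safe set here, so it becomes %2F as in the URI above.
uri = "data:application/pdf," + quote(CHROME_PDF, safe="()'")

print(len(CHROME_PDF))
print(uri)
```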

Solution 5

According to a lecture by Ange Albertini, the smallest possible valid PDF is 36 bytes:

%PDF-(NULL)trailer<</Root<</Pages<<>>>>>>

Where (NULL) is the unprintable ASCII 0 character.
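Because of the NUL byte, this one is easier to write from code than from a text editor. A short sketch that builds the bytes and confirms the count (the output filename is hypothetical):

```python
# \x00 is the NUL byte that stands where (NULL) appears in the listing.
smallest = b"%PDF-\x00trailer<</Root<</Pages<<>>>>>>"

print(len(smallest))

# Hypothetical output path, for trying the result in a viewer:
# with open("smallest.pdf", "wb") as handle:
#     handle.write(smallest)
```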

However, as Ange notes, while this PDF is technically valid, most PDF reader apps will regard it as invalid based on the size alone, thus failing to open it.


Author: meshy


Updated on January 18, 2022

Comments

  • meshy
    meshy over 2 years

    Out of simple curiosity, having seen the smallest GIF, what is the smallest possible valid PDF file?

    • devnull
      devnull about 11 years
      Depends on how you create it. Chances are that you'll be able to write a smaller one yourself (in an editor) than what an application would generate.
    • devnull
      devnull about 11 years
      Try feeding "showpage" (w/o quotes) to ghostscript or ps2pdf.
  • mkl
    mkl about 11 years
    It also turns out that when the spec says "shall be an indirect reference" and /Count is required, and the header "must" be %PDF-1.0, they're making loose suggestions. No, those aren't loose suggestions, those are requirements for validity. Even if some PDF viewers don't enforce them, not following them implies invalidity, and the OP asked for a valid PDF.
  • meshy
    meshy about 11 years
    Accepted because answer starts off with the minimum allowed by the spec and then goes above and beyond. Great answer, thank-you! :)
  • neonzeon
    neonzeon almost 11 years
    plinth, that's an awesome answer. Now, how about the smallest valid PDF with a line of text in it, like "Hello World"? I thought it would be as simple as adding { stream BT ("Hello World") ET endstream } but so far I could not make Acrobat happy.
  • Michaël
    Michaël over 10 years
    In fact, taking it "by the book" would require the Page to have a /Parent — this is what Adobe complains about, I believe. See the "7.7.3.3 Page Objects" section in the standard.
  • Tony Edgecombe
    Tony Edgecombe over 10 years
    Page having to have a parent is a bit of a pain because it sets up a circular reference.
  • plinth
    plinth over 10 years
    That's the spec. The graph of objects in PDF has cycles.
  • mkl
    mkl about 10 years
    Pretty small. ;) Not valid, though, according to the spec.
  • towi
    towi about 9 years
    I needed a base64-representation of a PDF. So, if anyone is interested, here is the base64-string of the 138 bytes version: JVBERi0xLjAKMSAwIG9iajw8L1BhZ2VzIDIgMCBSPj5lbmRvYmogMiAwIG9iajw8L0tpZHNbMyAw\nIFJdL0NvdW50IDE+PmVuZG9iaiAzIDAgb2JqPDwvTWVkaWFCb3hbMCAwIDMgM10+PmVuZG9iagp0\ncmFpbGVyPDwvUm9vdCAxIDAgUj4+Cg==
  • yms
    yms almost 9 years
    This will not work, you need to define a font resource and select it inside the page content for the text to show up.
  • MCattle
    MCattle over 8 years
    ...and here's the base64-string version of the 67 byte version: JVBERi0xLg10cmFpbGVyPDwvUm9vdDw8L1BhZ2VzPDwvS2lkc1s8PC9NZWRpYUJveFswIDAgMyAzXT4+XT4+Pj4+Pg==
  • Christopher Schultz
    Christopher Schultz over 7 years
    @towi Your base64-encoded version has \ns embedded in it, and when base64-decoded doesn't give the correct file contents.
  • towi
    towi over 7 years
    @ChristopherSchultz Good to know for people who are not using an \n-agnostic decoder. I seem to be doing that, since \n is not one of the base64 chars.
  • Devy
    Devy over 7 years
    this file actually opens under Mac OS X El Capitan whereas the most rated answer with PDF1.0 did not.
  • AJMansfield
    AJMansfield over 7 years
    I played around with this a bit, and if you instead use 1 0 obj<</Pages<</Kids[<</MediaBox[0 0 3 3]>>]>>>>endobj trailer<</Root 1 0 R>> (only the trailer has to be indirect) the file will open in Chrome's viewer without complaint at only 87 bytes.
  • Luke Rehmann
    Luke Rehmann almost 6 years
    Also opens in Chrome: data:application/pdf;base64,JVBERi0xLjIgCjkgMCBvYmoKPDwKPj4Kc3RyZWFtCkJULyA5IFRmKFRlc3QpJyBFVAplbmRzdHJlYW0KZW5kb2JqCjQgMCBvYmoKPDwKL1R5cGUgL1BhZ2UKL1BhcmVudCA1IDAgUgovQ29udGVudHMgOSAwIFIKPj4KZW5kb2JqCjUgMCBvYmoKPDwKL0tpZHMgWzQgMCBSIF0KL0NvdW50IDEKL1R5cGUgL1BhZ2VzCi9NZWRpYUJveCBbIDAgMCA5OSA5IF0KPj4KZW5kb2JqCjMgMCBvYmoKPDwKL1BhZ2VzIDUgMCBSCi9UeXBlIC9DYXRhbG9nCj4+CmVuZG9iagp0cmFpbGVyCjw8Ci9Sb290IDMgMCBSCj4+CiUlRU9G
  • Luke Rehmann
    Luke Rehmann almost 6 years
    Will not open in Chrome for me.
  • Luke Rehmann
    Luke Rehmann almost 6 years
    The 70 byte version will not open in the latest Chrome, but I didn't have trouble with the 138 byte version.
  • mkl
    mkl almost 4 years
    The OP asked for the smallest possible valid PDF file; yours is not valid according to the spec.
  • John Smith
    John Smith almost 4 years
    How do I add some random texts for testing purpose based on any of the versions above?
  • John Smith
    John Smith almost 4 years
    @yms Do you have any example?
  • mkl
    mkl over 3 years
    It is not technically valid, according to the specification (which is more important than a lecture) there are multiple issues, missing cross references, direct objects where indirect ones are expected, ...
  • Lubco
    Lubco over 2 years
    The correct base64 of @towi is JVBERi0xLjAKMSAwIG9iajw8L1BhZ2VzIDIgMCBSPj5lbmRvYmogMiAwIG9iajw8L0tpZHNbMyAwIFJdL0NvdW50IDE+PmVuZG9iaiAzIDAgb2JqPDwvTWVkaWFCb3hbMCAwIDMgM10+PmVuZG9iagp0cmFpbGVyPDwvUm9vdCAxIDAgUj4+Cg==
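The newline discussion in the comments above can be checked with Python's `base64` module. The sketch below assumes real newline characters inserted every 76 characters, as MIME-style encoders do. In its default mode `b64decode` discards characters outside the base64 alphabet, while `validate=True` rejects them. A literal backslash followed by the letter n is a different case: n is itself a base64 character, so it has to be stripped before decoding.

```python
import base64
import binascii

CLEAN = (
    "JVBERi0xLjAKMSAwIG9iajw8L1BhZ2VzIDIgMCBSPj5lbmRvYmogMiAwIG9i"
    "ajw8L0tpZHNbMyAwIFJdL0NvdW50IDE+PmVuZG9iaiAzIDAgb2JqPDwvTWVk"
    "aWFCb3hbMCAwIDMgM10+PmVuZG9iagp0cmFpbGVyPDwvUm9vdCAxIDAgUj4+"
    "Cg=="
)

# Re-wrap the string with real newline characters every 76 characters.
wrapped = "\n".join(CLEAN[i:i + 76] for i in range(0, len(CLEAN), 76))

pdf = base64.b64decode(CLEAN)

# Default mode (validate=False) silently drops the newlines.
lenient_matches = base64.b64decode(wrapped) == pdf

# Strict mode refuses any character outside the base64 alphabet.
try:
    base64.b64decode(wrapped, validate=True)
    strict_accepts = True
except binascii.Error:
    strict_accepts = False

print(lenient_matches, strict_accepts)
print(pdf.decode("ascii"))
```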