How to generate plain-text source-code PDF examples that work in a document viewer?
Solution 1
You should append a (syntactically correct) xref and trailer section to the end of the file. That means each object in your PDF needs one line in the xref table, even if the byte offset it states is not correct. Ghostscript, pdftk or qpdf can then re-establish a correct xref table and render the file:
[...]
endobj
xref
0 8
0000000000 65535 f
0000000010 00000 n
0000000020 00000 n
0000000030 00000 n
0000000040 00000 n
0000000050 00000 n
0000000060 00000 n
0000000070 00000 n
trailer
<</Size 8/Root 1 0 R>>
startxref
555
%%EOF
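If you have many objects, writing those placeholder lines by hand gets tedious. Below is a minimal Python sketch of how such a section could be generated. The helper name `append_placeholder_xref` and the `root_obj` parameter are hypothetical, and it assumes the catalog is object 1 and that every object header ("N G obj") starts at the beginning of a line. The offsets and the startxref value are deliberately dummies, so the result still needs a repair pass by Ghostscript, pdftk or qpdf.

```python
import re

def append_placeholder_xref(body: bytes, root_obj: int = 1) -> bytes:
    """Append a syntactically valid xref/trailer with dummy offsets (hypothetical helper)."""
    # Object numbers of every "N G obj" header that starts a line.
    numbers = [int(m.group(1)) for m in re.finditer(rb"^(\d+) \d+ obj\b", body, re.M)]
    size = max(numbers) + 1  # +1 because entry 0 is the head of the free list
    if not body.endswith(b"\n"):
        body += b"\n"
    entries = [b"0000000000 65535 f \n"]
    # Dummy offsets 10, 20, 30, ...; each entry is exactly 20 bytes long.
    entries += [b"%010d 00000 n \n" % (10 * n) for n in range(1, size)]
    trailer = b"trailer\n<</Size %d/Root %d 0 R>>\n" % (size, root_obj)
    # The startxref value is a placeholder as well.
    tail = b"startxref\n0\n%%EOF\n"
    return body + b"xref\n0 %d\n" % size + b"".join(entries) + trailer + tail
```

Usage would be along the lines of reading the file in binary mode, passing its contents through the function, and writing the result back out before running a repair tool on it.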
Solution 2
Ah damn it - I had copied just a part of the code. The code in the question is the part on page 701; after it comes a page footer, which confused me, and the code then continues on page 702 :/
(EDIT: also see Introduction to PDF - GNUpdf (archive) for a similar, more detailed example)
So here is the complete code:
%PDF-1.4
1 0 obj
<< /Type /Catalog
/Outlines 2 0 R
/Pages 3 0 R
>>
endobj
2 0 obj
<< /Type /Outlines
/Count 0
>>
endobj
3 0 obj
<< /Type /Pages
/Kids [ 4 0 R ]
/Count 1
>>
endobj
4 0 obj
<< /Type /Page
/Parent 3 0 R
/MediaBox [ 0 0 612 792 ]
/Contents 5 0 R
/Resources << /ProcSet 6 0 R
/Font << /F1 7 0 R >>
>>
>>
endobj
5 0 obj
<< /Length 73 >>
stream
BT
/F1 24 Tf
100 100 Td
( Hello World ) Tj
ET
endstream
endobj
6 0 obj
[ /PDF /Text ]
endobj
7 0 obj
<< /Type /Font
/Subtype /Type1
/Name /F1
/BaseFont /Helvetica
/Encoding /MacRomanEncoding
>>
endobj
xref
0 8
0000000000 65535 f
0000000009 00000 n
0000000074 00000 n
0000000120 00000 n
0000000179 00000 n
0000000364 00000 n
0000000466 00000 n
0000000496 00000 n
trailer
<< /Size 8
/Root 1 0 R
>>
startxref
625
%%EOF
Indeed, as the error messages were saying, the xref section was missing!
However, this is still not the end: while this document will open in evince, evince will still complain:
$ evince hello.pdf
Error: PDF file is damaged - attempting to reconstruct xref table...
... and so will qpdf:
$ qpdf --check hello.pdf
WARNING: hello.pdf: file is damaged
WARNING: hello.pdf (file position 625): xref not found
WARNING: hello.pdf: Attempting to reconstruct cross-reference table
checking hello.pdf
PDF Version: 1.4
File is not encrypted
File is not linearized
WARNING: hello.pdf (object 5 0, file position 436): attempting to recover stream length
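The last warning concerns the /Length entry of object 5: the declared value (73) apparently does not match the number of bytes actually sitting between stream and endstream in the saved file. A minimal Python sketch for checking this yourself follows. The function name `stream_lengths` is hypothetical, and the sketch assumes /Length is a direct integer and the last key in the stream dictionary, as in this example; it is not a general PDF parser.

```python
import re

# Matches "/Length N >> stream <EOL> data <EOL> endstream".
# The EOL right before "endstream" does not count towards the stream length.
_STREAM = re.compile(
    rb"/Length\s+(\d+)\s*>>\s*stream\r?\n(.*?)\r?\nendstream", re.S
)

def stream_lengths(pdf: bytes):
    """Return a list of (declared_length, actual_length) pairs, one per stream."""
    return [(int(m.group(1)), len(m.group(2))) for m in _STREAM.finditer(pdf)]
```

Calling it on the bytes of hello.pdf and comparing the two numbers in each pair shows whether /Length needs correcting; the actual value depends on the line endings your editor saved.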
So, to actually get a proper example, as the Adobe Forums post "Simple Text String Example in specification broken" points out, the xref table needs to be reconstructed so that it has correct byte offsets.
To do this, we can use pdftk to "Repair a PDF's Corrupted XREF Table and Stream Lengths (If Possible)":
$ pdftk hello.pdf output hello_repair.pdf
... and now hello_repair.pdf opens in evince without a problem, and qpdf reports:
$ qpdf --check hello_repair.pdf
checking hello_repair.pdf
PDF Version: 1.4
File is not encrypted
File is not linearized
No errors found
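If you would rather keep the hand-written source intact (a repair tool may rewrite the file), the byte offsets can also be computed directly. The following is a rough Python sketch, not what pdftk does internally. The name `rebuild_xref` and the `root_obj` parameter are hypothetical, and it assumes consecutive object numbers starting at 1, generation 0 for every object, object headers at the start of a line, and the catalog in object 1. It only rewrites the xref table, trailer and startxref; stream /Length values are left untouched.

```python
import re

def rebuild_xref(pdf: bytes, root_obj: int = 1) -> bytes:
    """Replace the xref/trailer section with one holding real byte offsets."""
    # Keep only the body: cut at an existing "xref" line, if there is one.
    cut = re.search(rb"^xref\s*$", pdf, re.M)
    body = pdf[:cut.start()] if cut else pdf
    if not body.endswith(b"\n"):
        body += b"\n"
    # Byte offset of each "N 0 obj" header, keyed by object number.
    offsets = {int(m.group(1)): m.start()
               for m in re.finditer(rb"^(\d+) 0 obj\b", body, re.M)}
    size = max(offsets) + 1
    table = [b"xref\n", b"0 %d\n" % size, b"0000000000 65535 f \n"]
    for n in range(1, size):
        # Raises KeyError if object numbers are not consecutive (see assumptions).
        table.append(b"%010d 00000 n \n" % offsets[n])
    xref_pos = len(body)  # the new xref keyword starts right after the body
    tail = b"trailer\n<< /Size %d /Root %d 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        size, root_obj, xref_pos)
    return body + b"".join(table) + tail
```

Read hello.pdf in binary mode, pass the bytes through the function, and write the result to a new file; `qpdf --check` can then be used to see whether any complaints remain.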
Well, hope this helps someone,
Cheers!
sdaau
Updated on June 30, 2022
Comments
-
sdaau almost 2 years
I just found the post "Adobe Forums: Simple Text String Example in specification broken", so I got interested in finding plain-text source-code PDF examples.
So, through that post, I eventually found:
- The webpage "PDF Reference and Adobe Extensions to the PDF Specification | Adobe Developer Connection", which contains:
The PDF 1.7 spec has, on page 699, the appendix "Annex H (informative) Example PDF files"; from there, I wanted to try "H.3 Simple Text String Example" (the "classic Hello World").
So I tried to save this as hello.pdf (except note that when you copy from PDF32000_2008.pdf, you may get "%PDF-1. 4", that is, a space inserted after "1.", which must be removed):
%PDF-1.4
1 0 obj
<< /Type /Catalog
/Outlines 2 0 R
/Pages 3 0 R
>>
endobj
2 0 obj
<< /Type /Outlines
/Count 0
>>
endobj
3 0 obj
<< /Type /Pages
/Kids [ 4 0 R ]
/Count 1
>>
endobj
4 0 obj
<< /Type /Page
/Parent 3 0 R
/MediaBox [ 0 0 612 792 ]
/Contents 5 0 R
/Resources << /ProcSet 6 0 R
/Font << /F1 7 0 R >>
>>
>>
endobj
5 0 obj
<< /Length 73 >>
stream
BT
/F1 24 Tf
100 100 Td
( Hello World ) Tj
ET
endstream
endobj
... and I'm trying to open it:
evince hello.pdf
... however, evince cannot open it: "Unable to open document / PDF document is damaged"; and also:
Error: PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
I also check with qpdf:
$ qpdf --check hello.pdf
WARNING: hello.pdf: file is damaged
WARNING: hello.pdf: can't find startxref
WARNING: hello.pdf: Attempting to reconstruct cross-reference table
hello.pdf: unable to find trailer dictionary while recovering damaged file
Where am I going wrong with this?
Many thanks in advance for any answers,
Cheers!
-
sdaau almost 12 years
Indeed - thanks for that @pipitas; I had also realized the same and documented it in this post; cheers!
-
yms about 9 years
@KurtPfeifle I have found that for simple PDF files (with no compressed streams, for example), just using consecutive object ids, skipping the xref table and ending the file with "startxref 0 %%EOF" works like a charm: Acrobat Reader opens the file and generates a new table.
-
Kurt Pfeifle about 9 years
@yms: Yes, but the "saved" PDF may contain code that is, and looks, completely different from your original PDF source code. So it is not an option for cases where you want to write PDF files that serve as learning/teaching/studying material.