Create bookmarks into a PDF file via command line


Solution 1

You can also use pdftk. It is also available for OS X.

I'm not going through all the details here and now, because it's been done elsewhere at great length already. Just briefly:

  1. Create a sample PDF from your original files (without bookmarks).
  2. Add some bookmarks with Adobe Acrobat (which you seem to have access to).
  3. Run one of these commands:

    pdftk my.pdf dump_data output -
    pdftk my.pdf dump_data output bookmarks+otherdata.txt
    
  4. Study the format of the output.

  5. Modify the output .txt file by adding all the entries you want.
  6. Run pdftk again, feeding it the file you modified:

    pdftk my.pdf update_info bookmarks+otherdata.txt output bookmarked.pdf
    

Additional Information

This is the bookmark format I found when inspecting the output in step 4 above:

BookmarkBegin
BookmarkTitle: -- Your Title 1 --
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkBegin
BookmarkTitle: -- Your Title 2 --
BookmarkLevel: 1
BookmarkPageNumber: 2
BookmarkBegin
BookmarkTitle: -- Your Title 3 --
...
...
and so on...

Then put these entries in the appropriate place in the .txt file, replacing the bookmark entries that are already there.
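Since the format is this regular, the entries can be generated rather than typed. Here is a minimal Python sketch, assuming you have your bookmarks as (page number, label) pairs and want them all at level 1. The function name and the sample data are made up for illustration; only the four field names come from the dump shown above.

```python
def pdftk_bookmark_records(bookmarks, level=1):
    """Return bookmark records in the pdftk dump_data layout.

    `bookmarks` is an iterable of (page_number, title) pairs with
    1-based page numbers. All entries get the same level (assumption).
    """
    lines = []
    for page_number, title in bookmarks:
        lines.append("BookmarkBegin")
        lines.append(f"BookmarkTitle: {title}")
        lines.append(f"BookmarkLevel: {level}")
        lines.append(f"BookmarkPageNumber: {page_number}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample data; replace with your own page/label list.
    sample = [(1, "Your Title 1"), (2, "Your Title 2")]
    print(pdftk_bookmark_records(sample), end="")
```

Write the returned text to a file and pass that file to update_info as in step 6.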

Solution 2

Here is another answer. This one uses Ghostscript to process PDF-to-PDF and the pdfmark PostScript operator to insert the bookmarks.

This method involves two steps:

  1. Create a text file (a PostScript file, really) with a limited set of pdfmark commands, one line per bookmark you want to add.
  2. Run a Ghostscript command that processes your current PDF file together with the text file.

1.

The content of the text file should look something like this:

[/Page 1   /View [/XYZ null null null] /Title (This is page 1)         /OUT pdfmark
[/Page 2   /View [/XYZ null null null] /Title (Dunno which page this is....) /OUT pdfmark
[/Page 3   /View [/XYZ null null null] /Title (Some other name)        /OUT pdfmark
[/Page 4   /View [/XYZ null null null] /Title (File 4)                 /OUT pdfmark
[/Page 5   /View [/XYZ null null null] /Title (File 5)                 /OUT pdfmark
[/Page 6   /View [/XYZ null null null] /Title (File 6)                 /OUT pdfmark
[/Page 7   /View [/XYZ null null null] /Title (File 7)                 /OUT pdfmark
% more lines for more pages to bookmark...
[/Page 13  /View [/XYZ null null null] /Title (File 13)                /OUT pdfmark
[/Page 14  /View [/XYZ null null null] /Title (Bookmark for page 14)   /OUT pdfmark
% more lines for more pages to bookmark...

Save this file as, for example, addmybookmarks.txt.
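With 150 bookmarks you will probably want to generate these lines. A rough Python sketch follows, assuming (page number, title) pairs with plain ASCII titles. The function names are hypothetical. Backslashes and parentheses are escaped because they have special meaning inside a PostScript string.

```python
def ps_escape(text):
    """Escape backslashes and parentheses for a PostScript string literal."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def pdfmark_lines(bookmarks):
    """Return one /OUT pdfmark line per (page_number, title) pair."""
    lines = []
    for page, title in bookmarks:
        lines.append(
            f"[/Page {page} /View [/XYZ null null null] "
            f"/Title ({ps_escape(title)}) /OUT pdfmark"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample data; replace with your own page/label list.
    sample = [(1, "This is page 1"), (14, "Bookmark for page 14")]
    print(pdfmark_lines(sample), end="")
```

Titles with non-ASCII characters need a different string encoding and are not handled by this sketch.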

2.

Now run this command:

gs -o bookmarked.pdf   \
   -sDEVICE=pdfwrite   \
    addmybookmarks.txt \
   -f original.pdf

The resulting PDF, bookmarked.pdf, now contains the bookmarks.

Screenshot: bookmarks added with the help of Ghostscript and pdfmark
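If you want to drive the Ghostscript call from a script, something along these lines should work. It is only a sketch: the function names are made up, it assumes gs is on your PATH, and it uses exactly the options from the command in step 2.

```python
import subprocess


def ghostscript_command(original_pdf, pdfmark_file, output_pdf):
    """Build the argument list for the Ghostscript call from step 2."""
    return [
        "gs",
        "-o", output_pdf,
        "-sDEVICE=pdfwrite",
        pdfmark_file,          # the pdfmark lines are read first
        "-f", original_pdf,    # then the original PDF is processed
    ]


def add_bookmarks(original_pdf, pdfmark_file, output_pdf):
    """Run Ghostscript; raises CalledProcessError if gs exits with an error."""
    subprocess.run(
        ghostscript_command(original_pdf, pdfmark_file, output_pdf),
        check=True,
    )

# Example call (not executed here):
# add_bookmarks("original.pdf", "addmybookmarks.txt", "bookmarked.pdf")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.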

Solution 3

Ok, here is a quick'n'dirty way to do three jobs at once:

  1. Merge your 400 single-page PDFs.
  2. Create a document top level ToC (Table of Contents).
  3. Create a PDF bookmark for each page.

It involves using a LaTeX installation.

You start with an empty LaTeX template like the following one:

\documentclass[]{article}
\usepackage{pdfpages}
\usepackage{hyperref}
    \hypersetup{breaklinks=true,
                bookmarks=true,
                pdfauthor={},
                pdftitle={},
                colorlinks=true,
                citecolor=blue,
                urlcolor=blue,
                linkcolor=magenta,
                pdfborder={0 0 0}}
\begin{document}

{
    \hypersetup{linkcolor=black}
    \setcounter{tocdepth}{3}
    % Comment next line in or out if you want a ToC or not:
    \tableofcontents
}

%% Here goes your additional code:
%% 1 line per included PDF!

\end{document}

Now just before the last line of this template, you insert one line per external PDF file you want to include.

  1. In case you want to generate a ToC, it has to be formatted like this:

    \includepdf[pages={<pagenumber>},addtotoc={<pagenumber>,<section>,<level>,<heading>,<label>}]{pdffilename.pdf}
    
  2. In case you are sure that each and every included PDF is a 1-page document, it simplifies to this:

    \includepdf[addtotoc={<pagenumber>,<section>,<level>,<heading>,<label>}]{pdffilename.pdf}
    

All five of the following parameters for addtotoc are required, in the order given, for the files to appear in the bookmarks and in the ToC. See further below for a specific example.

  • <pagenumber> : Number of the page of inserted document to be linked to. (In your case always "1", because you insert 1-page documents only; you could insert a 5-page document and link to page 3 of the inserted PDF, though).
  • <section> : The LaTeX sectioning name. Could be section, subsection, subsubsection... In your case "section".
  • <level> : The level of the LaTeX section. In your case "1".
  • <heading> : This is a string, used for the text of the bookmark.
  • <label> : This must be unique for each bookmark. Used in the PDF internally to jump to the correct page when the bookmark is clicked.

To test this quickly, I used Ghostscript to generate 20 1-page PDF documents:

for i in {1..20}; do
   gs -o p${i}.pdf -sDEVICE=pdfwrite               \
      -c "/Helvetica findfont 30 scalefont setfont \
          100 600 moveto                           \
          (Page ${i}) show                         \
          showpage"; 
done

With these test files I could make the lines to insert into the template look like these:

\includepdf[addtotoc={1,section,1,Page 1 (First),p1}]{p1.pdf}
\includepdf[addtotoc={1,section,1,Page 2,p2}]{p2.pdf}
\includepdf[addtotoc={1,section,1,Page 3,p3}]{p3.pdf}
[...]
\includepdf[addtotoc={1,section,1,Page 11 (In the Middle),p11}]{p11.pdf}
[...]
\includepdf[addtotoc={1,section,1,Page 20 (Last),p20}]{p20.pdf}
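For 400 files you would not write these lines by hand. A small Python sketch that produces them is shown here, assuming one single-page PDF per bookmark and headings without commas or LaTeX special characters. The function name and its defaults are hypothetical.

```python
def includepdf_line(filename, heading, label, page=1, section="section", level=1):
    """Return one \\includepdf line with an addtotoc entry.

    Assumes `heading` contains no commas or LaTeX special characters
    and that `label` is unique across all generated lines.
    """
    return (
        f"\\includepdf[addtotoc={{{page},{section},{level},{heading},{label}}}]"
        f"{{{filename}}}"
    )


if __name__ == "__main__":
    # Matches the p1.pdf ... p20.pdf test files generated above.
    for i in range(1, 21):
        print(includepdf_line(f"p{i}.pdf", f"Page {i}", f"p{i}"))
```

Paste the printed lines into the template just before \end{document}.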

Save the template with the inserted lines, then run the following command twice:

 pdflatex template.tex
 pdflatex template.tex

The resulting file will have the bookmarks, looking like this in Preview.app:

Screenshot: Preview.app with the bookmarks opened


Note: LaTeX is available for OS X.


I'll add one or two other methods to insert bookmarks on the command line too, later or in the next few days, if I have more time.

For now this one has to do, because I never showed it here on SO, AFAICR.

But I thought because you gave the background "I'm merging 1-page PDFs, and it is slow; now I want to add bookmarks too...", I could show how to do it with one single method.

HINT : One of the other methods will be to use pdftk which IS available for Mac OS X!

Solution 4

Here's the Python method for adding bookmarks to the table of contents. It runs on macOS without any other installations.

#!/usr/bin/python
from Foundation import NSURL, NSString
import Quartz

# You will need to change these filepaths to a local test pdf and an output file.
infile = "/path/to/file.pdf"
outfile = "/path/to/output.pdf"

# List your bookmarks here as (page index, label); the page index starts at 0.
bookmarks = [
    (0, 'Page 1'),
    (1, 'Page 2'),
    (2, 'Page 3'),
]

def getOutline(pdf, page, label):
    # Create a destination at the top-left corner of the page.
    myPage = pdf.pageAtIndex_(page)
    pageSize = myPage.boundsForBox_(Quartz.kCGPDFMediaBox)
    x = 0
    y = Quartz.CGRectGetMaxY(pageSize)
    pagePoint = Quartz.CGPointMake(x, y)
    myDestination = Quartz.PDFDestination.alloc().initWithPage_atPoint_(myPage, pagePoint)
    # Create the outline entry (the bookmark) that points at the destination.
    myLabel = NSString.stringWithString_(label)
    myOutline = Quartz.PDFOutline.alloc().init()
    myOutline.setLabel_(myLabel)
    myOutline.setDestination_(myDestination)
    return myOutline

pdfURL = NSURL.fileURLWithPath_(infile)
myPDF = Quartz.PDFDocument.alloc().initWithURL_(pdfURL)
if myPDF:
    # Create a root outline and add one child outline per bookmark.
    rootOutline = Quartz.PDFOutline.alloc().init()
    for index, (page, label) in enumerate(bookmarks):
        rootOutline.insertChild_atIndex_(getOutline(myPDF, page, label), index)
    myPDF.setOutlineRoot_(rootOutline)
    myPDF.writeToFile_(outfile)

Author: drmariod

Updated on June 25, 2022

Comments

  • drmariod
    drmariod almost 2 years

    I am searching for a command line tool to add bookmarks to a PDF file.

    What I have is a page number and a label. I would love to create a bookmark named after the label that links to the given page number.

    Does anyone know a command line tool (preferably for OS X) for doing this?

    I have PDF files of about 4000 pages with about 150 bookmarks and would love to automate this.

    My plan is to use a system call from within an R script.

    EDIT

    I create about 4000 single PDF files with graphs and I am using the OSX system command /System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py to join the PDFs together. Previously I was using pdfjoin from pdfjam package, but this was way too slow. In the end, this is how I get my PDF where I add the bookmarks by hand with Adobe Acrobat Professional at the moment.

    • Roland
      Roland almost 9 years
      I'm not sure if you have code to produce a PDF or only the PDF file itself. If the former, we'd need much more details.
    • drmariod
      drmariod almost 9 years
      Thanks @Roland, I added some information.
    • hrbrmstr
      hrbrmstr almost 9 years
      pdftk can update/insert bookmarks. Also, here's a potentially helpful thread on stats.SO
    • drmariod
      drmariod almost 9 years
      pdftk is only available for Windows, so it will not fit my needs. Thanks anyways
    • Kurt Pfeifle
      Kurt Pfeifle almost 9 years
      "pdftk is only available for Windows..." Not true! See my answer. It includes a link to directly download an OSX .pkg installer (from the original pdftk-vendor, not from some rubbish third party provider)...
  • drmariod
    drmariod almost 9 years
    Actually, I really like this solution, a very clear syntax and very easy to script... Thanks
  • drmariod
    drmariod almost 9 years
    I don't get the addtotoc command running... It says the command is not found, but it can find the pdfpages package... So I don't understand the problem here :-(
  • Kurt Pfeifle
    Kurt Pfeifle almost 9 years
    @drmariod: Without seeing your code I can't say what's wrong with it. Maybe something very simple which the same eyes that stared at the line(s) while writing it can't recognize any more, but "third party" eyes easily can... Happened to me also, and not just once :)
  • Kurt Pfeifle
    Kurt Pfeifle almost 9 years
    @drmariod: Actually, I like this solution least of all the three :)
  • ihightower
    ihightower over 7 years
    I like this solution the best, and it is the easiest one for me to understand, as I have pdftk already.
  • Shamaoke
    Shamaoke over 2 years
    Use dump_data_utf8 and update_info_utf8 in order to properly display characters in scripts other than Latin (e.g. Japanese).