Merge PDF's with PDFTK with Bookmarks?

23,018

Solution 1

You can also merge multiple PDFs with Ghostscript. The big advantage of this route is that a solution is easily scriptable, and it does not require a real programming effort:

gswin32c.exe ^
          -dBATCH -dNOPAUSE ^
          -sDEVICE=pdfwrite ^
          -sOutputFile=merged.pdf ^
          [...more Ghostscript options as needed...] ^
          input1.pdf input2.pdf input3.pdf [....]

With Ghostscript you can pass pdfmark statements, which can add a Table of Contents as well as bookmarks for each source file going into the resulting PDF. For example:

gswin32c.exe ^
          -dBATCH -dNOPAUSE ^
          -sDEVICE=pdfwrite ^
          -sOutputFile=merged.pdf ^
          [...more Ghostscript options as needed...] ^
          file-with-pdfmarks-to-generate-a-ToC.ps ^
          -f input1.pdf input2.pdf input3.pdf [....]

or

gswin32c.exe ^
          -dBATCH -dNOPAUSE ^
          -sDEVICE=pdfwrite ^
          -sOutputFile=merged.pdf ^
          [...more Ghostscript options as needed...] ^
          file-with-pdfmarks-to-generate-a-ToC.ps ^
          -f input1.pdf ^
             input2.pdf ^
             input3.pdf [....]

For some introduction to the pdfmark topic, see also Thomas Merz's PDFmark Primer.


Edit:
I had wanted to give you an example for file-with-pdfmarks-to-generate-a-ToC.ps, but somehow forgot it. Here it is:

[/Page 1 /View [/XYZ null null null] /Title (File 1) /OUT pdfmark
[/Page 2 /View [/XYZ null null null] /Title (File 2) /OUT pdfmark
[/Page 3 /View [/XYZ null null null] /Title (File 3) /OUT pdfmark
[/Page 4 /View [/XYZ null null null] /Title (File 4) /OUT pdfmark 

This creates a ToC entry for each of the first 4 files, which is the same as the first 4 pages (since you guarantee that every input file going into your merged PDF is exactly 1 page long).

  1. The [/XYZ null null null] part makes sure your page viewport and zoom level do not change from the current ones when you follow the link. (To force a specific position and zoom instead, you could say, as an arbitrary example, [/XYZ 222 111 2].)
  2. The /Title (some string you want) part determines the text shown in the ToC.
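If you have more than a handful of files, you may not want to type the pdfmark lines by hand. Here is a minimal Python sketch that generates them, assuming (as in the question) that every input file is exactly 1 page, so bookmark N points to page N. The function names make_pdfmarks and escape_ps_string are my own invention, not part of Ghostscript, and the sketch only handles plain ASCII titles.

```python
def escape_ps_string(text):
    """Escape backslashes and parentheses for use inside a PostScript (...) string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def make_pdfmarks(titles):
    """Return pdfmark lines, one bookmark per title.

    Assumption: each input file has exactly one page, so the
    bookmark for the N-th title points to page N of the merged PDF.
    """
    lines = []
    for page, title in enumerate(titles, start=1):
        lines.append(
            "[/Page %d /View [/XYZ null null null] /Title (%s) /OUT pdfmark"
            % (page, escape_ps_string(title))
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Prints the same four lines as the hand-written example above.
    print(make_pdfmarks(["File 1", "File 2", "File 3", "File 4"]), end="")
```

Redirect the output into file-with-pdfmarks-to-generate-a-ToC.ps and pass that file to Ghostscript as shown earlier.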

And, you could even add these parameters to the Ghostscript commandline directly:

gswin32c.exe ^
       -o merged.pdf ^
       [...more Ghostscript options as needed...] ^
       -c "[/Page 1 /View [/XYZ null null null] /Title (File 1) /OUT pdfmark" ^
       -c "[/Page 2 /View [/XYZ null null null] /Title (File 2) /OUT pdfmark" ^
       -c "[/Page 3 /View [/XYZ null null null] /Title (File 3) /OUT pdfmark" ^
       -c "[/Page 4 /View [/XYZ null null null] /Title (File 4) /OUT pdfmark" ^
       -f input1.pdf ^
          input2.pdf ^
          input3.pdf ^
          input4.pdf [....]
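If you call Ghostscript from a script rather than from a cmd window, you can build the same argument list programmatically and avoid the shell quoting issues around the parentheses and brackets. A rough Python sketch follows; build_gs_command is a hypothetical helper name, the titles "File N" are placeholders, and I am assuming 1-page inputs and that -sDEVICE=pdfwrite is among the options you want.

```python
def build_gs_command(inputs, output, gs="gs"):
    """Build a Ghostscript argument list that merges `inputs` into `output`
    and adds one bookmark per input file.

    Assumptions: every input is a 1-page PDF, and `gs` is the name of the
    Ghostscript executable (e.g. "gswin32c.exe" on Windows).
    """
    args = [gs, "-o", output, "-sDEVICE=pdfwrite"]
    for page, _name in enumerate(inputs, start=1):
        args.append("-c")
        args.append(
            "[/Page %d /View [/XYZ null null null] /Title (File %d) /OUT pdfmark"
            % (page, page)
        )
    # -f ends the PostScript snippets; everything after it is an input file.
    args.append("-f")
    args.extend(inputs)
    return args


if __name__ == "__main__":
    command = build_gs_command(["input1.pdf", "input2.pdf"], "merged.pdf")
    print(command)
    # To actually run it (requires Ghostscript to be installed):
    #   import subprocess
    #   subprocess.run(command, check=True)
```

Passing the arguments as a list means each pdfmark statement reaches Ghostscript as a single argument, with no escaping needed.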



'nother Edit:

Oh, and by the way: Ghostscript does preserve the bookmarks when you use it to merge two PDF files into one -- pdftk.exe does not. Let's use the one generated by the command of my first edit (effectively concatenating 2 copies of the same file):

 gswin32c ^
    -sDEVICE=pdfwrite ^
    -o doublemerged.pdf ^
     merged.pdf ^
     merged.pdf

The file doublemerged.pdf will now have 2*4 = 8 bookmarks.

  • What's as expected: bookmarks 1, 2, 3, and 4 link to pages 1, 2, 3 and 4.
  • The problem is that bookmarks 5, 6, 7 and 8 also link to pages 1, 2, 3 and 4.

The reason is that the pre-existing bookmarks address their link targets by absolute page numbers. To work around that (and get working bookmarks in merged files), one would have to generate bookmarks that point to their link targets by named destinations (and make sure these names are unique across the documents being merged).

(This approach also works on Linux: just use gs instead of gswin32c, and \ instead of ^ for line continuation.)
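Regarding the absolute page numbers: a different workaround from the named destinations mentioned above is to recompute the page numbers yourself and write fresh bookmarks for the merged file. Here is a small Python sketch of just the arithmetic. The function name shift_bookmarks and the input layout (page count plus a list of title/page pairs per document) are assumptions of mine; you would have to obtain that data from your files by other means.

```python
def shift_bookmarks(documents):
    """Renumber bookmark pages for a merged PDF.

    `documents` is a list of (page_count, bookmarks) tuples in merge order,
    where `bookmarks` is a list of (title, page) pairs using the page
    numbering of the individual file (starting at 1).
    """
    merged = []
    offset = 0  # number of pages that precede the current document
    for page_count, bookmarks in documents:
        for title, page in bookmarks:
            merged.append((title, page + offset))
        offset += page_count
    return merged


if __name__ == "__main__":
    # Two copies of the 4-page merged.pdf from the example above.
    one_copy = (4, [("File %d" % n, n) for n in range(1, 5)])
    for title, page in shift_bookmarks([one_copy, one_copy]):
        print(page, title)
```

For the doublemerged.pdf case this yields targets 1 through 8 instead of 1 through 4 twice.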


Appendix

The command lines above use [...more Ghostscript options as needed...] as a placeholder for more options.

If you do not use other options, Ghostscript will apply its built-in defaults for various parameters. However, this may give you results that are not to your liking. Since Ghostscript generates a completely new PDF based on the input, some of the original objects may be changed. This is true for color spaces and for image compression levels.

How to apply parameters which leave the originally embedded images unchanged can be seen over at SuperUser: "Use Ghostscript, but tell it to not reprocess images".

Solution 2

I know other ways to do this have already been mentioned, but with pdftk you can take the merged PDF and add bookmarks to it. Use the pdftk function dump_data to create a .info file containing the existing info in the PDF. Then add bookmark info to the .info file by adding the following four lines for each bookmark:

BookmarkBegin
BookmarkTitle: name
BookmarkLevel: level
BookmarkPageNumber: page number

Then use the update_info call to update the merged pdf bookmarks with the ones you wrote to the .info file. I have written some simple functions that do this for me in autohotkey if anyone is interested. See http://www.autohotkey.com/board/topic/98985-scripts-to-merge-pdfs-and-add-bookmarks-with-pdftk/
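For those who prefer Python over AutoHotkey, the editing step in the middle can be sketched as follows. This only produces the text; the names bookmark_block and add_bookmarks are hypothetical, and I am assuming that appending the bookmark records at the end of the dumped text is acceptable to pdftk. The pdftk calls in the comments show the usual dump_data and update_info syntax, with merged.pdf and merged.info as placeholder file names.

```python
def bookmark_block(title, level, page):
    """Return the four-line pdftk record for a single bookmark."""
    return (
        "BookmarkBegin\n"
        "BookmarkTitle: %s\n"
        "BookmarkLevel: %d\n"
        "BookmarkPageNumber: %d\n" % (title, level, page)
    )


def add_bookmarks(info_text, bookmarks):
    """Append bookmark records to the text produced by pdftk dump_data.

    `bookmarks` is a list of (title, level, page) tuples.
    """
    if info_text and not info_text.endswith("\n"):
        info_text += "\n"
    return info_text + "".join(bookmark_block(*b) for b in bookmarks)


if __name__ == "__main__":
    # Step 1 (outside Python):  pdftk merged.pdf dump_data output merged.info
    with open("merged.info") as handle:
        info = handle.read()
    info = add_bookmarks(info, [("File 1", 1, 1), ("File 2", 1, 2)])
    with open("merged.info", "w") as handle:
        handle.write(info)
    # Step 3 (outside Python):
    #   pdftk merged.pdf update_info merged.info output merged_bookmarked.pdf
```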

Solution 3

See this answer at https://stackoverflow.com/a/17781138/547578. I used something called Sejda. It works. It combines the bookmarks perfectly. Thanks @blablatros.

Solution 4

To add or edit PDF bookmarks you could use JPdfBookmarks. It is an excellent multi-OS Free Software tool that I have been using for a while now with excellent results. It deals with bookmarks only though, so you would need another tool to merge or reorder pages. In addition to pdftk I suggest trying PDF Split and Merge (good app, but weird UI, and it messes up bookmarks in my experience), PDF-Shuffler (seems to work fine, but sometimes freezes while dealing with some files), or PdfMod (potentially the best, as it handles rearranging, merging and bookmarks, although I have not been able to figure out how to insert PDFs at a specific page).

Sorry for not providing some links, as a newbie the system only allows me to add 2 hyperlinks.

Solution 5

@pipitas's good answer doesn't solve the bookmark issue perfectly. There is a related question in the Unix discussion at https://unix.stackexchange.com/questions/17065/add-and-edit-bookmarks-to-pdf/31070 , where I suggest:

If you still want to stick with those Unix scripts, then

  1. extract the bookmark data dumped by pdftk
  2. write one extra script to convert the dumped bookmark data to the pdfmarks format, which the Ghostscript command gs accepts.
  3. use a gs script to merge the files together with the pdfmarks

The script exists already, see pdf-merge.py from Merge PDF's with PDFTK with Bookmarks?
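To give an idea of what step 2 involves, here is a rough Python sketch of such a converter. It is not the pdf-merge.py script mentioned above, just an illustration written under a few assumptions: the input is the text from pdftk dump_data, titles are plain ASCII (pdftk may encode other characters, which this does not decode), and nested bookmarks are expressed with pdfmark's /Count key, set to the number of direct children. All function names are my own.

```python
def escape_ps_string(text):
    """Escape backslashes and parentheses for a PostScript (...) string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def parse_bookmarks(dump_text):
    """Collect the Bookmark* records from pdftk dump_data output."""
    wanted = ("BookmarkTitle", "BookmarkLevel", "BookmarkPageNumber")
    bookmarks = []
    current = None
    for line in dump_text.splitlines():
        line = line.strip()
        key, _sep, value = line.partition(":")
        if line == "BookmarkBegin":
            current = {}
            bookmarks.append(current)
        elif current is not None and key in wanted:
            current[key] = value.strip()
    return bookmarks


def to_pdfmarks(bookmarks):
    """Convert parsed bookmarks to pdfmark lines for Ghostscript."""
    lines = []
    for index, bookmark in enumerate(bookmarks):
        level = int(bookmark["BookmarkLevel"])
        # Count the direct children: following entries one level deeper,
        # up to the next entry at the same or a higher level.
        children = 0
        for later in bookmarks[index + 1:]:
            later_level = int(later["BookmarkLevel"])
            if later_level <= level:
                break
            if later_level == level + 1:
                children += 1
        count = "/Count %d " % children if children else ""
        lines.append(
            "[%s/Page %d /View [/XYZ null null null] /Title (%s) /OUT pdfmark"
            % (
                count,
                int(bookmark["BookmarkPageNumber"]),
                escape_ps_string(bookmark["BookmarkTitle"]),
            )
        )
    return "\n".join(lines) + "\n"
```

When merging several files you would still have to add the page offset of each file to its BookmarkPageNumber values before converting.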

Author by

Jason

Updated on May 29, 2021

Comments

  • Jason
    Jason almost 3 years

    Using pdftk to merge multiple PDFs is working well. However, is there an easy way to make a bookmark for each PDF merged?

    I don't see anything in the pdftk docs regarding this, so I don't think it's possible with pdftk.

    All of our merged files will be 1 page each, so I'm wondering if there's any other utility that can add bookmarks afterwards?

    Or is there another Linux-based PDF utility that allows merging while specifying a bookmark for each individual PDF?

  • Larry Cai
    Larry Cai over 12 years
    this is exactly what I want, and linuxjournal.com/content/tech-tip-extract-pages-pdf gives more information on how to extract (like split) some pages as well.
  • Kurt Pfeifle
    Kurt Pfeifle over 12 years
    @larrycal: ...and guess who wrote that linuxjournal.com TechTip (which they published under his clear name)? ;-)
  • Joel Peltonen
    Joel Peltonen almost 11 years
    Looks interesting, do you know what the license is?
  • blablatros
    blablatros almost 11 years
    @fossilet No problem. For 1 page PDFs — i.e. if OP needs only one bookmark per input page — the PDF Split&Merge mentioned in the linked post will suffice too.
  • Joel Peltonen
    Joel Peltonen over 10 years
    Good addition, so the procedure would be 1) Grab bookmarks from files to be merged 2) Merge files and bookmarks (shifting pages) separately 3) Set merged and shifted bookmarks to merged PDF?
  • steventaitinger
    steventaitinger over 10 years
    Currently I am printing a combination of word files and autocad files to pdf and while I have the files open I am building a list of bookmarks. Then I merge all the pdf files. Then I create the .info file for the merged pdf. Then I edit the .info based on my list of bookmarks I have made previously. Then I update the merged pdf based on the .info file.
  • Dor
    Dor about 8 years
    @KurtPfeifle Works wonderful, though images' quality has worsened. See the following thread for the solution: superuser.com/a/373740/48622
  • Kurt Pfeifle
    Kurt Pfeifle about 8 years
    @Dor: I know this answer you linked to ;-)
  • jerik
    jerik over 7 years
    A full example, how to use the update-info can be found here: stackoverflow.com/a/40222656/1933185
  • acker9
    acker9 about 5 years
    With sejda, use the -b one_entry_each_doc option to add a bookmark entry for each file added to the merged PDF. The -t file_names option may also be of interest to you if you wish to add a generated table of contents page at the beginning of the merged PDF.
  • whiskeychief
    whiskeychief over 3 years
    This is very nice. PDFSam (FOSS) also provides a similar output, I'm wondering if they share the same code. Both are good alternatives.
  • Nemo
    Nemo almost 3 years
    This requires existing bookmarks in the original files.