How to merge PDFs and create a bookmark for each input file in the output file? (Linux)
Solution 1
UPDATE: I wasn't satisfied with the result, so I wrote a version with a nice GUI:
https://github.com/Yanpas/PdfMerger
I learned Python and wrote (modified) this program in one hour:
#! /usr/bin/env python
# Original author Nicholas Kim, modified by Yan Pashkovsky
# New license - GPL v3
import sys
import time
from PyPDF2 import utils, PdfFileReader, PdfFileWriter

def get_cmdline_arguments():
    """Retrieve command line arguments."""
    from optparse import OptionParser
    usage_string = "%prog [-o output_name] file1, file2 [, ...]"
    parser = OptionParser(usage_string)
    parser.add_option(
        "-o", "--output",
        dest="output_filename",
        default=time.strftime("output_%Y%m%d_%H%M%S"),
        help="specify output filename (exclude .pdf extension); default is current date/time stamp"
    )
    options, args = parser.parse_args()
    if len(args) < 2:
        parser.print_help()
        sys.exit(1)
    return options, args

def main():
    options, filenames = get_cmdline_arguments()
    output_pdf_name = options.output_filename + ".pdf"
    files_to_merge = []
    # get PDF files
    for f in filenames:
        try:
            next_pdf_file = PdfFileReader(open(f, "rb"))
        except(utils.PdfReadError):
            print >>sys.stderr, "%s is not a valid PDF file." % f
            sys.exit(1)
        except(IOError):
            print >>sys.stderr, "%s could not be found." % f
            sys.exit(1)
        else:
            files_to_merge.append(next_pdf_file)
    # merge page by page
    output_pdf_stream = PdfFileWriter()
    j=0
    k=0
    for f in files_to_merge:
        for i in range(f.numPages):
            output_pdf_stream.addPage(f.getPage(i))
            if i==0:
                output_pdf_stream.addBookmark(str(filenames[k]),j)
            j = j + 1
        k += 1
    # create output pdf file
    try:
        output_pdf_file = open(output_pdf_name, "wb")
        output_pdf_stream.write(output_pdf_file)
    finally:
        output_pdf_file.close()
    print "%s successfully created." % output_pdf_name

if __name__ == "__main__":
    main()
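This program requires PyPDF2, which you can install with sudo pip install pypdf2 (install pip first if you don't have it :) ). The script uses Python 2 print syntax, so run it with a Python 2 interpreter.
Save it as pdfmerger.py, make it executable, then open a terminal and enter ./pdfmerger.py *.pdf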
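The bookkeeping behind the counters j and k in the merge loop comes down to a running sum of page counts. Here is a minimal standalone sketch of that idea, assuming the page counts are already known. The function bookmark_offsets and the sample file names are hypothetical and not part of the script above.

```python
def bookmark_offsets(files):
    """Return (title, zero-based first page) for each (filename, page_count) pair."""
    offsets = []
    first_page = 0  # index in the merged output where the next file starts
    for name, page_count in files:
        # Like the merge loop, only files with at least one page get a bookmark.
        if page_count > 0:
            offsets.append((name, first_page))
        first_page += page_count
    return offsets


if __name__ == "__main__":
    # Hypothetical inputs: three PDFs with 3, 5 and 2 pages.
    for title, page in bookmark_offsets([("a.pdf", 3), ("b.pdf", 5), ("c.pdf", 2)]):
        print(title, page)
```

Each returned pair corresponds to one bookmark: the title is the file name and the number is the page index the bookmark should point to.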
Solution 2
This Bash script gives each PDF in a directory a single bookmark that points to its first page and is titled with the PDF's filename, and then concatenates them all. It can handle non-ASCII filenames.
#!/usr/bin/bash
cattedPDFname="${1:?Concatenated PDF filename}"
# make each PDF contain a single bookmark to first page
tempPDF=`mktemp`
for i in *.pdf
do
    bookmarkTitle=`basename "$i" .pdf`
    bookmarkInfo="BookmarkBegin\nBookmarkTitle: $bookmarkTitle\nBookmarkLevel: 1\nBookmarkPageNumber: 1"
    # quote the variables so titles with spaces or glob characters survive intact
    pdftk "$i" update_info_utf8 <(echo -en "$bookmarkInfo") output "$tempPDF" verbose
    mv "$tempPDF" "$i"
done
# concatenate the PDFs
pdftk *.pdf cat output "$cattedPDFname" verbose
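One thing to watch with *.pdf: the shell expands the glob in lexicographic order, so a file named Lecture_10.pdf is merged before Lecture_2.pdf. Below is a minimal sketch of a natural sort that could be used to build the file list in the intended order. The function natural_key and the Lecture_ names are illustrative assumptions, not part of the script above.

```python
import re


def natural_key(filename):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    chunks = re.split(r"(\d+)", filename)
    # With a capturing group, re.split alternates text, digits, text, ...
    # so odd positions always hold the digit runs.
    return [int(chunk) if index % 2 else chunk.lower()
            for index, chunk in enumerate(chunks)]


if __name__ == "__main__":
    names = ["Lecture_10.pdf", "Lecture_1.pdf", "Lecture_2.pdf"]
    print(sorted(names, key=natural_key))
```

The sorted names could then be passed to pdftk explicitly in place of *.pdf.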
Solution 3
Adapting a good answer [1] from tex.stackexchange.com, you can create an itemized list, similar to a table of contents, with references to the files that you include below it. LaTeX takes care of updating the page numbers.
A few more words about LaTeX
-
A line such as \modifiedincludepdf{-}{doc01}{MyDoc1.pdf} includes the PDF file MyDoc1.pdf, located in the same directory as the LaTeX file, under the reference name "doc01".
A command such as \pageref{doc02.3} creates a link showing the number of the third page of the document whose reference key is "doc02". LaTeX keeps it up to date.
A block
\begin{itemize}
\end{itemize}
creates a bulleted list.
The LaTeX file
Below is the modified template, which works with pdflatex:
\documentclass{article}
\usepackage{hyperref}
\usepackage{pdfpages}
\usepackage[russian,english]{babel}
\newcounter{includepdfpage}
\newcounter{currentpagecounter}
\newcommand{\addlabelstoallincludedpages}[1]{
\refstepcounter{includepdfpage}
\stepcounter{currentpagecounter}
\label{#1.\thecurrentpagecounter}}
\newcommand{\modifiedincludepdf}[3]{
\setcounter{currentpagecounter}{0}
\includepdf[pages=#1,pagecommand=\addlabelstoallincludedpages{#2}]{#3}}
\begin{document}
You can refer to the beginning or to a specific page: \\
see page \pageref{doc01.1} till \pageref{doc02.3}.\\
\begin{itemize}
\item Contribution from groupmate 1, page \pageref{doc01.1}
\item Contribution from groupmate 2, page \pageref{doc02.1}
\end{itemize}
\modifiedincludepdf{-}{doc01}{MyDoc1.pdf}
\modifiedincludepdf{-}{doc02}{MyDoc2.pdf}
\end{document}
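With many input files, the \item and \modifiedincludepdf lines can be generated rather than typed by hand. Here is a rough sketch in Python, assuming the preamble of the template is used unchanged and that the file names contain no characters special to LaTeX (such as underscores). The function latex_body and the docNN key scheme are hypothetical.

```python
def latex_body(pdf_names):
    """Build the itemize list and the include lines for the document body."""
    items = []
    includes = []
    for number, name in enumerate(pdf_names, start=1):
        key = "doc%02d" % number  # reference key, e.g. doc01
        items.append(r"\item %s, page \pageref{%s.1}" % (name, key))
        includes.append(r"\modifiedincludepdf{-}{%s}{%s}" % (key, name))
    lines = [r"\begin{itemize}"] + items + [r"\end{itemize}"] + includes
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical input files, in the order they should appear.
    print(latex_body(["MyDoc1.pdf", "MyDoc2.pdf"]))
```

The printed lines are meant to be pasted between \begin{document} and \end{document}.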
Note
To simply merge or split PDF documents or pages, you can use tools such as pdftk and take inspiration from other questions [3] about it.
yanpas
Updated on September 18, 2022
Comments
-
yanpas over 1 year
I'm using Linux and I would like software (or a script or method) that merges several PDFs into a single output PDF containing bookmarks. The bookmarks are named after the PDF files that were merged and point to the pages where those files begin.
Adobe Acrobat offers something similar, but it is non-free and Windows-only.
-
Hastur over 8 years
In okular you can put bookmarks in each part of a PDF and they will be shown in a column of bookmarks, regardless of whether the file is open or not. Then you click and... It's not what you are searching for but it could work. To physically merge several PDFs into one you can use LaTeX... BTW your question will probably be closed because software suggestions are off topic. It would be different if you were trying to write a script that finds all the PDFs with their locations, splits basename and dirname, and puts everything in a tex container to be compiled into your file, and you got stuck somewhere. ;)
-
NZD over 8 years
Have a look at unix.stackexchange.com/q/17065/121614
-
yanpas over 8 years
@Hastur well, a gs script would be OK for this purpose ) I do not have the source files, only PDFs, so I do not understand how LaTeX can help
-
Hastur over 8 years
@yanpas: I didn't quite understand: do you want to create, let's say, a book that includes a bunch of PDF files, with an index at the beginning (or at the end) whose hyperlinks point to the page where each article starts in the book, or do you want to create an index with links that point to the files on the HDD? I suppose the first. Can you confirm it?
-
yanpas over 8 years
@Hastur the answer is closer to the first. My groupmates and I are preparing about 100 exam questions; each of us does his own part in the editor he prefers and sends me the result as a PDF. Then I merge all the PDFs into output.pdf. For easier navigation I would like output.pdf to have a bookmark list (when I click an entry in this list, I am taken to the section of the document that holds the corresponding answers). Something like i.imgur.com/hQQwp6i.png
-
Hastur over 8 years
@yanpas Feel free to add the packages you need and modify it for your purpose :) I tested it and it works on my system. Let me know.
-
Xen2050 over 8 years
Why not have everyone use the same file format, one that's better suited to editing and cut & paste? Like ODF (LibreOffice), Word, etc.? Or, if each person can't be bothered to use the same program, then you open each file in its own format and cut & paste into your favourite one?
-
yanpas over 8 years
@Xen2050 I've described only one case; sometimes I have nothing but PDFs from the internet and I still need structure in the final PDF
-
-
Nathan over 3 years
Thanks for this! It's a very useful little script.
-
Ur Ya'ar about 3 years
My files are named "Lecture_#.pdf" where # is a number, and it does what you intend, but the order is not right: instead of going 1,2,3,... it goes 10,11,12....1,20,21,.... Can this be fixed?
-
Ur Ya'ar about 3 years
This is exactly what I'm looking for! Could you add more detailed installation instructions?
-
Ur Ya'ar about 3 years
This also happens when I just want to merge in pdftk, so to get the correct order I use Lecture_{1..27}.pdf instead of Lecture_*.pdf. But that only works because I know the exact names and number of files...
-
James Wright over 2 years
I updated the python script to be 3.X compatible and put it in the following gist.