Split PDF document from command line in Linux?
Solution 1
I find pdfseparate very convenient for splitting a range of pages into individual files. This command extracts pages 1-5 of input.pdf into files named output-page1.pdf, output-page2.pdf, ...
pdfseparate -f 1 -l 5 input.pdf output-page%d.pdf
If you want to recombine them into page ranges, for example pages 1-3 in one document and pages 4-5 in another, you can use the companion program, pdfunite, as follows:
pdfunite output-page1.pdf output-page2.pdf output-page3.pdf final-pages1-3.pdf
pdfunite output-page4.pdf output-page5.pdf final-pages4-5.pdf
I believe these tools are part of Poppler (packaged as poppler-utils on Debian-based systems) and may already be installed on your system.
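For longer ranges, listing every per-page file by hand gets tedious. The following is a minimal sketch of how the pdfunite step could be wrapped, assuming the output-page%d.pdf naming from the pdfseparate command above. The function names page_files and unite_range are hypothetical helpers, not part of Poppler.

```shell
# Hypothetical helpers (not part of Poppler); POSIX sh compatible.

# Print the per-page file names for pages $1..$2, one per line.
# Assumes the "output-page%d.pdf" pattern used with pdfseparate above.
page_files() {
    n=$1
    while [ "$n" -le "$2" ]; do
        printf 'output-page%d.pdf\n' "$n"
        n=$((n + 1))
    done
}

# Merge pages $1..$2 into final-pages$1-$2.pdf with pdfunite.
unite_range() {
    # Word splitting is intentional: the generated names contain no spaces.
    pdfunite $(page_files "$1" "$2") "final-pages$1-$2.pdf"
}
```

With the per-page files in the current directory, `unite_range 1 3` and `unite_range 4 5` would run the same two pdfunite commands shown above.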
Solution 2
pdftk 2.02 worked for me on Debian, and I think it should work for you too.
pdftk input.pdf cat 2-4 output out1.pdf
For the general case, where you have to split a single PDF into multiple files, I could not find a way to do it with pdftk alone, so I'm using a Bash script.
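The script itself is not shown in the answer, so the following is only a guess at the idea, not the answerer's script: call pdftk once per page range, using the same cat ... output form as the command above. The function name split_ranges and the output naming scheme (input name plus the range) are assumptions made for illustration.

```shell
# Hypothetical helper; POSIX sh compatible.
# Split one PDF into several files, one per page range.
# Usage: split_ranges input.pdf 1-3 4-5 9
split_ranges() {
    input=$1
    shift
    for range in "$@"; do
        # Assumed naming: range "2-4" of input.pdf goes to input-2-4.pdf
        pdftk "$input" cat "$range" output "${input%.pdf}-$range.pdf" || return 1
    done
}
```

For example, `split_ranges input.pdf 1-3 4-5` would write input-1-3.pdf and input-4-5.pdf, stopping at the first range pdftk rejects.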
Solution 3
I'll post this as an answer so as not to clog the question. There is a related question on unix.se whose accepted answer uses a Python script with PyPDF. That answer splits one page into two, however, so the script would need to be modified to handle page ranges, as asked in the OP.
EDIT: I just found Stapler, a Python utility for manipulating PDF documents based on pypdf (from the Arch Linux Forums, under Community Contributions). It is apparently "A small utility making use of the pypdf library to provide a (somewhat) lighter alternative to pdftk" (note, however, that the mailing list mentions some problems with it).
There is also PDF-Shuffler, likewise based on pypdf, but it is GUI-only and has no command-line mode.
Solution 4
You can use the pdfjam tool with the syntax
pdfjam <input-file> <page-ranges> -o <output-file>
and an example of page ranges would be
3,67-70,80
Source: https://tex.stackexchange.com/questions/79623 by Vincent Nivoliers
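To make the page-range syntax from Solution 4 concrete, here is a small sketch that expands a spec such as 3,67-70,80 into the individual page numbers it selects. The function expand_ranges is a hypothetical helper written for illustration and is not part of pdfjam. It handles only single pages and closed ranges, and pdfjam may accept other forms that this sketch ignores.

```shell
# Hypothetical helper (not part of pdfjam); POSIX sh compatible.
# Expand a spec such as "3,67-70,80" into one page number per line.
# Only single pages (N) and closed ranges (N-M) are handled.
expand_ranges() {
    old_ifs=$IFS
    IFS=,
    # Unquoted on purpose: split the spec on commas.
    set -- $1
    IFS=$old_ifs
    for part in "$@"; do
        case $part in
            *-*) first=${part%-*}; last=${part#*-} ;;
            *)   first=$part; last=$part ;;
        esac
        n=$first
        while [ "$n" -le "$last" ]; do
            echo "$n"
            n=$((n + 1))
        done
    done
}
```

Running `expand_ranges 3,67-70,80` prints 3, 67, 68, 69, 70 and 80, one per line, which are the six pages the example spec would select.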
tetram
Updated on September 18, 2022
Comments
- tetram, over 1 year ago:
I would like to extract page ranges from a PDF document into a new PDF document using the command line in Linux. Note that:
- Pdftk - The PDF Toolkit fails for me with:
$ pdftk input.pdf cat 1 verbose output output.pdf
Error: Failed to open PDF file: input.pdf
Errors encountered. No output created.
Done. Input errors, so no output created.
Turns out that "You (should) know that Pdftk is nothing more than a very old version of iText.... The keywords in the above statement are "VERY OLD"." (from pdftk can't open pdf file)
- Multivalent also fails:
$ java -classpath /path/to/Multivalent20091027.jar tool.pdf.Split -page 1 input.pdf
Exception in thread "main" java.lang.NoClassDefFoundError: tool/pdf/Split
Caused by: java.lang.ClassNotFoundException: tool.pdf.Split
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: tool.pdf.Split. Program will exit.
It turns out this software is a bit tricky: although it is hosted on SourceForge, and one page says that "Practical Thought generously provides these tools for free use on the command line", another page says: "The browser is open source. The document tools are a free bonus and not open source." That finally explains the comment from "conversion - Gluing (Imposition) PDF documents - Stack Overflow":
All releases of Multivalent linked from the official sourceforge site are missing the tools package.
(edit: there seems to be an old Multivalent version with the tools included, see the SO link; but as it looks somewhat like abandonware, I'd rather not use it)
- Finally, I'd like to avoid tools that are essentially front-ends for LaTeX, like PDFjam.
So, are there any options for such a pdf-splitting command line tool under Linux?
- Matthias Braun, over 2 years ago: Qpdf can split PDFs. For example, to split a PDF into groups of two pages, do:
qpdf --split-pages=2 in.pdf out-%d.pdf
See this answer for more. To extract a range of pages, 2 to 5 in this example:
qpdf --empty --pages in.pdf 2-5 -- out.pdf
See also this.
- Ok Letsdothis, over 3 years ago: Great solution, but each of the split files can sometimes be as large as the whole original file. A fix is described in this comment. In my experience, PDF files created with pdflatex that pull in many images via the \includepdf{} command cause this problem. The solution in the linked comment works great.
- bballdave025, almost 3 years ago: Quote from the comment linked by @Ok_Letsdothis: << [F]irst, "optimiz[e]" the PDF with Ghostscript:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=doc-compressed.pdf doc.pdf
after which pdfseparate lead [sic] to the expected size reduction. >>
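The two steps from the quoted comment could be chained as in the sketch below, which reuses the Ghostscript options exactly as quoted. The function name optimize_then_split and its argument order are assumptions for illustration. Whether the size reduction actually occurs depends on the input PDF.

```shell
# Hypothetical helper; POSIX sh compatible.
# Rewrite a PDF with Ghostscript, then split the rewritten copy into pages.
# $1 = input PDF, $2 = optimized copy to create, $3 = pdfseparate pattern
optimize_then_split() {
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH \
       -sOutputFile="$2" "$1" || return 1
    # The pattern must contain %d, e.g. page-%d.pdf
    pdfseparate "$2" "$3"
}
```

For example, `optimize_then_split doc.pdf doc-compressed.pdf page-%d.pdf` would split the optimized copy rather than the original, and skips the split if Ghostscript fails.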