Split PDF document from command line in Linux?

Solution 1

I find pdfseparate very convenient for splitting a page range into individual pages. This command extracts pages 1-5 of input.pdf into files named output-page1.pdf, output-page2.pdf, ...

pdfseparate -f 1 -l 5 input.pdf output-page%d.pdf

If you want to recombine them into page ranges, for example pages 1-3 in one document and pages 4-5 in another, you can use the companion program, pdfunite, as follows:

pdfunite output-page1.pdf output-page2.pdf output-page3.pdf final-pages1-3.pdf
pdfunite output-page4.pdf output-page5.pdf final-pages4-5.pdf

I believe these tools are part of poppler and may already be installed on your system.
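The two steps above can be combined into one helper. This is only a sketch: the function name extract_range and the temporary page-N.pdf file names are my own assumptions, not part of poppler. It relies only on the pdfseparate and pdfunite invocations shown above.

```shell
# extract_range FIRST LAST INPUT OUTPUT
# Hypothetical helper: copies pages FIRST..LAST of INPUT into the single
# file OUTPUT. The body runs in a subshell, so its variables stay local.
extract_range() (
    first=$1 last=$2 input=$3 output=$4
    tmpdir=$(mktemp -d) || exit 1

    # pdfseparate writes one file per page; %d becomes the page number.
    pdfseparate -f "$first" -l "$last" "$input" "$tmpdir/page-%d.pdf" || {
        rm -rf "$tmpdir"
        exit 1
    }

    # Build the argument list in numeric order; a shell glob would sort
    # page-10.pdf before page-2.pdf.
    set --
    n=$first
    while [ "$n" -le "$last" ]; do
        set -- "$@" "$tmpdir/page-$n.pdf"
        n=$((n + 1))
    done

    # pdfunite takes the input files first and the output file last.
    pdfunite "$@" "$output"
    status=$?
    rm -rf "$tmpdir"
    exit "$status"
)
```

Usage would look like: extract_range 1 3 input.pdf final-pages1-3.pdf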

Solution 2

pdftk 2.02 worked for me on Debian, and I think it should work for you too.

pdftk input.pdf cat 2-4 output out1.pdf

For the general case, where you have to split a single PDF into multiple files, I could not find a way with pdftk, so I'm using a Bash script.
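The script itself is not reproduced in this answer. As a minimal sketch of one possible approach, assuming it simply calls pdftk once per page range: the function name split_ranges and the output naming scheme are my own illustrative choices.

```shell
# split_ranges INPUT RANGE...
# Hypothetical helper: writes each page range of INPUT to its own file,
# named <input-basename>_<range>.pdf in the current directory.
split_ranges() (
    input=$1
    shift
    base=$(basename "$input" .pdf)
    for range in "$@"; do
        # Same pdftk call as above, repeated once per requested range.
        pdftk "$input" cat "$range" output "${base}_${range}.pdf" || exit 1
    done
)
```

For example, split_ranges input.pdf 1-3 4-5 6-end would write input_1-3.pdf, input_4-5.pdf and input_6-end.pdf (pdftk accepts the keyword end for the last page).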

Solution 3

I'll put this as an answer, so as not to clog the question: here is a related link on unix.se:

... and the accepted answer uses a Python script with PyPDF (that answer splits one page into two, so the script needs to be modified to handle page ranges, as asked in the OP).

EDIT: I just found this: "Stapler - A python utility for manipulating PDF docs based on pypdf" (Arch Linux Forums, Community Contributions, page 3), which is apparently "A small utility making use of the pypdf library to provide a (somewhat) lighter alternative to pdftk" (note, however, that the mailing list mentions some problems with it).

Solution 4

You can use the pdfjam tool with the syntax

pdfjam <input-file> <page-ranges> -o <output-file>

and an example of page ranges would be

3,67-70,80

Source: https://tex.stackexchange.com/questions/79623 by Vincent Nivoliers
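Filled in with the example page ranges, the syntax above could be wrapped like this. The function name jam_pages is hypothetical; the call inside it is just the template from this answer with its three placeholders substituted.

```shell
# jam_pages INPUT RANGES OUTPUT
# Hypothetical wrapper around the pdfjam syntax quoted above.
jam_pages() {
    # Quoting keeps a range list such as 3,67-70,80 as one argument.
    pdfjam "$1" "$2" -o "$3"
}
```

For example: jam_pages input.pdf '3,67-70,80' selection.pdf. Since pdfjam is a front-end to LaTeX, it needs a working TeX installation.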

Author by

tetram

Updated on September 18, 2022

Comments

  • tetram
    tetram over 1 year

    I would like to extract page ranges from a PDF document into a new PDF document using the command line in Linux. Note that:

    $ pdftk input.pdf cat 1 verbose output output.pdf
    Error: Failed to open PDF file: 
       input.pdf
    Errors encountered.  No output created.
    Done.  Input errors, so no output created.
    

    Turns out that "You (should) know that Pdftk is nothing more than a very old version of iText.... The keywords in the above statement are "VERY OLD"." (from pdftk can't open pdf file)

     

    $ java -classpath /path/to/Multivalent20091027.jar tool.pdf.Split -page 1 input.pdf
    Exception in thread "main" java.lang.NoClassDefFoundError: tool/pdf/Split
    Caused by: java.lang.ClassNotFoundException: tool.pdf.Split
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    Could not find the main class: tool.pdf.Split.  Program will exit.
    

    Turns out this is a bit of a tricky piece of software: even though it's on SourceForge and says here that "Practical Thought generously provides these tools for free use on the command line", it then says here: "The browser is open source. The document tools are a free bonus and not open source." That finally clarifies the comment from "conversion - Gluing (Imposition) PDF documents - Stack Overflow":

    All releases of Multivalent linked from the official sourceforge site are missing the tools package.

    (edit: there seems to be an old Multivalent version with the tools included, see the SO link; but as it looks somewhat like abandonware, I'd rather not use it)

     

    • Finally, I'd like to avoid tools that are essentially front-ends for LaTeX, like PDFjam.

     

    So, are there any options for such a pdf-splitting command line tool under Linux?

    • Matthias Braun
      Matthias Braun over 2 years
      Qpdf can split PDFs. For example, to split a PDF into groups of two pages, do: qpdf --split-pages=2 in.pdf out-%d.pdf, see this answer for more. To extract a range of pages, 2 to 5 in this example: qpdf --empty --pages in.pdf 2-5 -- out.pdf, see also this.
  • Ok Letsdothis
    Ok Letsdothis over 3 years
    Great solution, but the size of each split file can sometimes be identical to that of the whole file. A solution to this is found in this comment. In my experience, PDF files created with pdflatex that contain many images included with the \includepdf{} command have caused this problem. The solution in the linked comment works great.
  • bballdave025
    bballdave025 almost 3 years
    Quote from the comment linked by @Ok_Letsdothis. << [F]irst, "optimiz[e]" the PDF with Ghostscript: gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=doc-compressed.pdf doc.pdf, after which pdfseparate lead [sic] to the expected size reduction. >>
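The commands quoted in the comments above (qpdf for splitting, Ghostscript before pdfseparate) can be wrapped as small shell functions. This is only a sketch: the function names qpdf_range, qpdf_chunks and optimize_pdf are my own; the options are exactly the ones quoted in the comments.

```shell
# qpdf_range INPUT FIRST LAST OUTPUT
# Extracts pages FIRST..LAST of INPUT into OUTPUT.
qpdf_range() {
    qpdf --empty --pages "$1" "$2-$3" -- "$4"
}

# qpdf_chunks INPUT N PREFIX
# Splits INPUT into files of N pages each; qpdf replaces %d in the name.
qpdf_chunks() {
    qpdf --split-pages="$2" "$1" "$3-%d.pdf"
}

# optimize_pdf INPUT OUTPUT
# Rewrites INPUT with Ghostscript, as suggested before running pdfseparate
# on files whose split pages come out as large as the whole document.
optimize_pdf() {
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH \
        -sOutputFile="$2" "$1"
}
```

For example: qpdf_range in.pdf 2 5 out.pdf, or optimize_pdf doc.pdf doc-compressed.pdf followed by pdfseparate on doc-compressed.pdf.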