Splitting a PDF with Ghostscript


Solution 1

What you see is "normal" behaviour: the current version of Ghostscript's pdfwrite output device does not support this feature. This is also (admittedly, somewhat vaguely) documented in Use.htm:

"Note, however that the one page per file feature may not be supported by all devices...."

I seem to remember that one of the Ghostscript developers mentioned on IRC that they may add this feature to pdfwrite in some future release, but it seems to require a major code rewrite, which is why they haven't done it yet...


Update: As Gordon's comment already hinted, as of version 9.06 (released on July 31st, 2012), Ghostscript supports the command line quoted in the question for pdfwrite as well. (Gordon must have discovered the unofficial support for this in 9.05, or he compiled his own executable from pre-release sources that were not yet tagged as 9.06.)
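
For anyone driving this from a script, the one-file-per-page call can be sketched in Python as below. This is only a sketch, assuming Ghostscript 9.06 or later is on the PATH as gs; the function names and the default output pattern are illustrative and not part of Ghostscript.

```python
import subprocess


def one_page_per_file_cmd(input_pdf, pattern="outname.%d.pdf", gs="gs"):
    """Build the argv for the command quoted in the question.

    Ghostscript replaces %d in the output name with the page number,
    and -o implies -dBATCH -dNOPAUSE.
    """
    if "%" not in pattern:
        # Without a placeholder every page ends up in the same file.
        raise ValueError("pattern needs a %d placeholder")
    return [gs, "-sDEVICE=pdfwrite", "-dSAFER", "-o", pattern, input_pdf]


def split_every_page(input_pdf, pattern="outname.%d.pdf"):
    # Hypothetical helper: runs gs and raises CalledProcessError on failure.
    subprocess.run(one_page_per_file_cmd(input_pdf, pattern), check=True)
```

Calling split_every_page("input.pdf") would then be expected to write outname.1.pdf, outname.2.pdf and so on into the current directory.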

Solution 2

I found this script written by Mr. Weimer super useful:

#!/bin/sh
#
# pdfsplit [input.pdf] [first_page] [last_page] [output.pdf] 
#
# Example: pdfsplit big_file.pdf 10 20 pages_ten_to_twenty.pdf
#
# written by: Westley Weimer, Wed Mar 19 17:58:09 EDT 2008
#
# The trick: ghostscript (gs) will do PDF splitting for you, it's just not
# obvious and the required defines are not listed in the manual page. 

if [ $# -lt 4 ] 
then
        echo "Usage: pdfsplit input.pdf first_page last_page output.pdf"
        exit 1
fi
gs -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$4" -dFirstPage=$2 -dLastPage=$3 -sDEVICE=pdfwrite "$1"

Source: http://www.cs.virginia.edu/~weimer/pdfsplit/pdfsplit

Save it as pdfsplit.sh and watch the magic happen.

PDFSAM could also do the job. It is available on Windows and Mac.
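
If a shell is not available, the same page-range extraction can be sketched in Python. This is a rough equivalent of the script above, not Mr. Weimer's code: the function name page_range_cmd and the range check are my own additions, and gs is assumed to be on the PATH.

```python
import subprocess


def page_range_cmd(input_pdf, first_page, last_page, output_pdf, gs="gs"):
    """Return the gs argv that copies pages first_page..last_page (1-based, inclusive)."""
    first, last = int(first_page), int(last_page)
    if first < 1 or last < first:
        raise ValueError(f"invalid page range {first}-{last}")
    return [
        gs, "-dNOPAUSE", "-dQUIET", "-dBATCH",
        f"-sOutputFile={output_pdf}",
        f"-dFirstPage={first}", f"-dLastPage={last}",
        "-sDEVICE=pdfwrite", input_pdf,
    ]


def pdfsplit(input_pdf, first_page, last_page, output_pdf):
    # Passing a list (no shell) keeps filenames with spaces intact.
    subprocess.run(page_range_cmd(input_pdf, first_page, last_page, output_pdf),
                   check=True)
```

For example, pdfsplit("big_file.pdf", 10, 20, "pages_ten_to_twenty.pdf") corresponds to the usage line in the script header.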

Solution 3

Here is a script for Windows command prompt (working also with drag and drop) assuming you have Ghostscript installed:

@echo off
chcp 65001
setlocal enabledelayedexpansion

rem Customize or remove this line if you already have Ghostscript folders in your system PATH
set path=C:\Program Files\gs\gs9.22\lib;C:\Program Files\gs\gs9.22\bin;%path%

:start

echo Splitting "%~n1%~x1" into standalone single pages...
cd /d "%~dp1"
rem getting number of pages of PDF with GhostScript
for /f "usebackq delims=" %%a in (`gswin64c -q -dNODISPLAY -c "(%~n1%~x1) (r) file runpdfbegin pdfpagecount = quit"`) do set "numpages=%%a"

for /L %%n in (1,1,%numpages%) do (
echo Extracting page %%n of %numpages%...
set "x=00%%n"
set "x=!x:~-3!"
gswin64c.exe -dNumRenderingThreads=2 -dBATCH -dNOPAUSE -dQUIET -dFirstPage=%%n -dLastPage=%%n -sDEVICE=pdfwrite -sOutputFile="%~d1%~p1%~n1-!x!.pdf" "%~1"
)

shift
if NOT x%1==x goto start

pause

Name this script something like split PDF.bat and put it on your desktop. Drag and drop one (or even more) multipage PDF on it and it will create one standalone PDF file for each page of your PDF, appending the suffix -001, -002 and so on to the name to distinguish the pages.

You might need to customize (with relevant Ghostscript version) or remove the set path=... line if you already have Ghostscript folders in your system PATH environment variable.

It works for me under Windows 10 with Ghostscript 9.22. See comments to make it work with Ghostscript 9.50+.

Enjoy.
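
The page-count step and the zero-padded names can also be sketched in Python, which makes the Ghostscript 9.50+ issue easier to handle. This is a sketch under assumptions: the helper names are mine, -dNOSAFER is added because the comments report it is needed on 9.50+, and on Windows you would pass gs="gswin64c". The filename is escaped because backslashes and parentheses are special inside a PostScript string.

```python
import subprocess


def ps_string(text):
    """Quote text as a PostScript string literal, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def page_count_cmd(pdf_path, gs="gs", nosafer=True):
    """Build the argv that prints the page count of pdf_path.

    nosafer=True adds -dNOSAFER, which the comments report is needed on
    Ghostscript 9.50+ so the PostScript snippet may open the file.
    """
    argv = [gs, "-q", "-dNODISPLAY"]
    if nosafer:
        argv.append("-dNOSAFER")
    program = f"{ps_string(pdf_path)} (r) file runpdfbegin pdfpagecount = quit"
    return argv + ["-c", program]


def page_count(pdf_path, gs="gs"):
    # Runs gs and parses the single number it prints.
    out = subprocess.run(page_count_cmd(pdf_path, gs), check=True,
                         capture_output=True, text=True).stdout
    return int(out.strip())


def page_output_name(stem, page, width=3):
    # Mirrors the batch file's -001, -002, ... suffixes.
    return f"{stem}-{page:0{width}d}.pdf"
```

I have not checked this against every Ghostscript release, so treat the -dNOSAFER default as something to verify on your own installation.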

Solution 4

#!/bin/bash
# Split a PDF into one file per page; $1 is the input filename.

# Ask Ghostscript for the page count.
# On Ghostscript 9.50+ this may also need -dNOSAFER (see comments).
ournum=$(gs -q -dNODISPLAY -c "($1) (r) file runpdfbegin pdfpagecount = quit" 2>/dev/null)
echo "Processing $ournum pages"
# Strip a trailing .pdf once, outside the loop.
newname="${1%.pdf}"
counter=1
while [ "$counter" -le "$ournum" ]; do
    reallynewname="$newname-$counter.pdf"
    # Make the individual PDF page.
    gs -dBATCH -dNOPAUSE -sOutputFile="$reallynewname" -dFirstPage="$counter" -dLastPage="$counter" -sDEVICE=pdfwrite "$1" >/dev/null 2>&1
    counter=$((counter+1))
done

Solution 5

Here's a simple Python script which does it:

#!/usr/bin/python3

import os

number_of_pages = 68
input_pdf = "abstracts_rev09.pdf"

for i in range(1, number_of_pages +1):
    os.system("gs -q -dBATCH -dNOPAUSE -sOutputFile=page{page:04d}.pdf"
              " -dFirstPage={page} -dLastPage={page}"
              " -sDEVICE=pdfwrite {input_pdf}"
              .format(page=i, input_pdf=input_pdf))
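
The script above goes through a shell, so an input name containing spaces would break it. A variant that passes an argument list to subprocess instead might look like the following; it is a sketch only, with per_page_cmds and split being names I made up, and the page count still supplied by the caller.

```python
import subprocess


def per_page_cmds(input_pdf, number_of_pages, gs="gs"):
    """Yield (output_name, argv) for each page, mirroring the script above."""
    for page in range(1, number_of_pages + 1):
        output = f"page{page:04d}.pdf"
        yield output, [
            gs, "-q", "-dBATCH", "-dNOPAUSE",
            f"-sOutputFile={output}",
            f"-dFirstPage={page}", f"-dLastPage={page}",
            "-sDEVICE=pdfwrite", input_pdf,
        ]


def split(input_pdf, number_of_pages):
    for output, argv in per_page_cmds(input_pdf, number_of_pages):
        # A list argv bypasses the shell, so spaces in input_pdf are safe.
        subprocess.run(argv, check=True)
```

With the values from the script above this would be called as split("abstracts_rev09.pdf", 68).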
Author by

zseder

Updated on December 22, 2021

Comments

  • zseder over 2 years

    I am trying to split a multipage PDF with Ghostscript, and I found the same solution on several sites and even on ghostscript.com, namely:

    gs -sDEVICE=pdfwrite -dSAFER -o outname.%d.pdf input.pdf
    

    But it doesn't seem to work for me: it produces a single file containing all pages, named outname.1.pdf.

    When I add the start and end pages it works fine, but I want it to work without knowing those parameters.

    In the gs-devel archive, I found a solution for this: http://ghostscript.com/pipermail/gs-devel/2009-April/008310.html -- but I would rather do it without pdf_info.

    When I use a different device, for example pswrite, with the same parameters, it works correctly, producing as many PS files as my input.pdf has pages.

    Is this normal when using pdfwrite? Am I doing something wrong?

  • zseder about 12 years
    Yeah, I read this line, but by "normal behaviour" I meant to ask: "is pdfwrite one of those devices that may not support this feature?" Your recollection of the IRC discussion is good enough for me. Thank you.
  • Gordon almost 12 years
    For people finding this answer in searches: As of 9.05, one-page-per-file works for me with the OP's command.
  • Kurt Pfeifle over 11 years
    @Gordon: Support for the -o out_%d.pdf syntax (to split a multipage PDF into individual files per page) became official in 9.06. I already hinted at this in other answers (e.g. Split multi page PDF file into single pages). I forgot to update this answer. Thanks for the hint.
  • Wok over 11 years
    Amazing. I don't have pdftk and psselect would lose some pdf quality, but not this.
  • Gus Neves over 5 years
    +1 for getting the page count with GS, good job! If anyone wants to get the page count on linux/macOS, use gs -q -dNODISPLAY -c "(../escaped\ file \name.pdf) (r) file runpdfbegin pdfpagecount = quit"
  • tstone-1 over 4 years
    Very helpful. It works with GS 9.22 but is somehow incompatible with (at least) 9.50 and 9.52. Does anybody know how to fix this?
  • mmj over 4 years
    @user18258 I don't know how to fix this, but in any case I found it more convenient to use another command-line tool to split PDF files on Windows, Sejda console. Here is a drag-and-drop batch: codepile.net/pile/6lWv3wzY
  • tstone-1 almost 4 years
    @mmj Thanks for the code based on Sejda! I'm using Ghostscript for a lot of 'shell:sendto' tasks and would still be interested in a 9.52-compatible solution, although I understand that you won't provide it. I found a small bug in your GS-based code above (which I'm still using with GS version 9.27!): I think that gswin64c.exe ... "%1" should be gswin64c.exe ... %1, or else there will be trouble when the path contains spaces.
  • Otto G almost 4 years
    Thank you for the pointer to the -dFirstPage=… and -dLastPage=… parameters!
  • mmj over 3 years
    @tstone-1 It seems that for Ghostscript 9.50+ you have to add the -dNOSAFER option (together with -dNODISPLAY). See: stackoverflow.com/q/40156190