How to crop a multi-page (image/scanned) pdf file (which won't crop with pdfcrop)?

Solution 1

Full credit is due to AlexG, who posted a solution to this problem in passing here; for completeness' sake, and so that it doesn't get lost (!), I quote it below.

Relevant to the above question is the trimming option (-t) described in the script's usage message:

Usage examples:

#default operation
pdfcrop.sh orig.pdf cropped.pdf
pdfcrop.sh -m 10 orig.pdf cropped.pdf
pdfcrop.sh -hires orig.pdf cropped.pdf

#trimming pages
pdfcrop.sh -t "10 20 30 40" orig.pdf trimmed.pdf
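
The -t amounts are given in bp (big points, 72 per inch), whereas the question asks for lengths in cm. As a minimal sketch, the helpers below convert cm to bp and assemble the four-number argument. The names cm_to_bp and trim_arg are hypothetical and not part of pdfcrop.sh, and the left/top/right/bottom order is assumed from the script's usage text.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of pdfcrop.sh.

# cm_to_bp CM: convert centimetres to big points (1 in = 2.54 cm = 72 bp).
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
cm_to_bp() {
  LC_ALL=C awk -v cm="$1" 'BEGIN { printf "%.2f", cm * 72 / 2.54 }'
}

# trim_arg LEFT TOP RIGHT BOTTOM (all in cm): print the four values in bp,
# separated by spaces, in the order the usage text gives for -t.
trim_arg() {
  printf '%s %s %s %s\n' \
    "$(cm_to_bp "$1")" "$(cm_to_bp "$2")" "$(cm_to_bp "$3")" "$(cm_to_bp "$4")"
}

# Example call (assumes pdfcrop.sh is on the PATH):
#   pdfcrop.sh -t "$(trim_arg 1 2 1 2)" orig.pdf trimmed.pdf
```

Rounding to two decimals is an arbitrary choice in this sketch; the script itself only adds and subtracts the numbers, so fractional bp values should pass through unchanged.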

Content of pdfcrop.sh:

#!/bin/bash

function usage () {
  echo "Usage: $(basename "$0") [Options] <input.pdf> [<output.pdf>]"
  echo
  echo " * Removes white margins from each page in the file. (Default operation)"
  echo " * Trims page edges by given amounts. (Alternative operation)"
  echo
  echo "If only <input.pdf> is given, it is overwritten with the cropped output."
  echo
  echo "Options:"
  echo
  echo " -m \"<left> [<top> [<right> <bottom>]]\""
  echo "    adds extra margins in default operation mode. Unit is bp. A single number"
  echo "    is used for all margins, two numbers \"<left> <top>\" are applied to the"
  echo "    right and bottom margins alike."
  echo
  echo " -t \"<left> [<top> [<right> <bottom>]]\""
  echo "    trims outer page edges by the given amounts. Unit is bp. A single number"
  echo "    is used for all trims, two numbers \"<left> <top>\" are applied to the"
  echo "    right and bottom trims alike."
  echo
  echo " -hires"
  echo "    %%HiResBoundingBox is used in default operation mode."
  echo
  echo " -help"
  echo "    prints this message."
}

c=0
mar=(0 0 0 0); tri=(0 0 0 0)
bbtype=BoundingBox

while getopts m:t:h: opt
do
  case $opt
  in
    m)
    eval mar=($OPTARG)
    [[ -z "${mar[1]}" ]] && mar[1]=${mar[0]}
    [[ -z "${mar[2]}" || -z "${mar[3]}" ]] && mar[2]=${mar[0]} && mar[3]=${mar[1]}
    c=0
    ;;
    t)
    eval tri=($OPTARG)
    [[ -z "${tri[1]}" ]] && tri[1]=${tri[0]}
    [[ -z "${tri[2]}" || -z "${tri[3]}" ]] && tri[2]=${tri[0]} && tri[3]=${tri[1]}
    c=1
    ;;
    h)
    # getopts parses "-hires" and "-help" as option h with argument "ires" or "elp"
    if [[ "$OPTARG" == "ires" ]]
    then
      bbtype=HiResBoundingBox
    else
      usage 1>&2; exit 0
    fi
    ;;
    \?)
    usage 1>&2; exit 1
    ;;
  esac
done
shift $((OPTIND-1))

[[ -z "$1" ]] && echo "$(basename "$0"): missing filename" 1>&2 && usage 1>&2 && exit 1
input=$1;output=$1;shift;
[[ -n "$1" ]] && output=$1 && shift;

(
    [[ "$c" -eq 0 ]] && gs -dNOPAUSE -q -dBATCH -sDEVICE=bbox "$input" 2>&1 | grep "%%$bbtype"
    pdftk "$input" output - uncompress
) | perl -w -n -s -e '
  BEGIN {@m=split /\s+/, $mar; @t=split /\s+/, $tri;}
  if (/BoundingBox:\s+([\d\.\s]+\d)/) { push @bbox, $1; next;}
  elsif (/\/MediaBox\s+\[([\d\.\s]+\d)\]/) { @mb=split /\s+/, $1; next; }
  elsif (/pdftk_PageNum\s+(\d+)/) {
    $p=$1-1;
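    if($c){
      # MediaBox is [llx lly urx ury] with the origin at the bottom left, so the
      # bottom trim ($t[3]) raises lly and the top trim ($t[1]) lowers ury.
      $mb[0]+=$t[0];$mb[1]+=$t[3];$mb[2]-=$t[2];$mb[3]-=$t[1];
      print "/MediaBox [", join(" ", @mb), "]\n";
    } else {
      @bb=split /\s+/, $bbox[$p];
      # shift the bounding box by the MediaBox origin
      $bb[0]+=$mb[0];$bb[1]+=$mb[1];$bb[2]+=$mb[0];$bb[3]+=$mb[1];
      # bottom margin ($m[3]) goes below lly, top margin ($m[1]) above ury
      $bb[0]-=$m[0];$bb[1]-=$m[3];$bb[2]+=$m[2];$bb[3]+=$m[1];
      print "/MediaBox [", join(" ", @bb), "]\n";
    }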
  }
  print;
' -- -mar="${mar[*]}" -tri="${tri[*]}" -c=$c | pdftk - output "$output" compress
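
The way -m and -t fill in missing values is easy to miss in the array tests near the top of the script. The following is a minimal standalone sketch of that rule. The name expand_sides is hypothetical and used only for illustration; the function mirrors the script's behaviour as I read it, including the fact that a third value given without a fourth is discarded.

```shell
#!/bin/sh
# expand_sides "N [N [N N]]": print the four values that pdfcrop.sh would end
# up with for a -m or -t argument. Hypothetical helper, not part of the script.
# The body runs in a subshell (parentheses) so that "set --" and the variables
# do not leak into the caller.
expand_sides() (
  set -f        # disable globbing while the argument is word-split
  set -- $1     # split on whitespace into $1..$4
  left=${1:-0}  # falling back to 0 for an empty argument is this sketch's choice
  top=${2:-$left}
  if [ -z "${3:-}" ] || [ -z "${4:-}" ]; then
    # fewer than four values: right and bottom copy left and top
    right=$left
    bottom=$top
  else
    right=$3
    bottom=$4
  fi
  printf '%s %s %s %s\n' "$left" "$top" "$right" "$bottom"
)
```

For example, expand_sides "10" prints 10 10 10 10, expand_sides "10 20" prints 10 20 10 20, and expand_sides "10 20 30" also prints 10 20 10 20.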

Solution 2

You could try briss. It's pretty simple, but does the job. It's a GUI app though.

Download the zip file, extract it to a folder of your choice, and start it from that folder:

java -jar briss-0.9.jar

To install it permanently and system-wide, so that you can start it from anywhere by typing just briss, unpack the download in /usr/local/lib/ and create an executable file /usr/local/bin/briss that contains:

#!/bin/sh
# forward any command-line arguments to briss
exec java -jar /usr/local/lib/briss-0.9/briss-0.9.jar "$@"

Solution 3

In my experience, Krop is the best and easiest option, and it has a wonderful GUI.

Download deb from the author: http://arminstraub.com/computer/krop

Review: http://www.hecticgeek.com/2013/08/crop-pdf-ubuntu-13-04-krop/

Edit: I have been using krop since 13.10 and have noticed that recent versions support opening a pdf with krop via right-click. I also switched to the snap version once it became available; it supports right-click as well (confirmed on 18.10 through 20.04). The GUI of the snap version is not as colorful, but the functionality is the same:

sudo snap install krop

Author: nutty about natty

Updated on September 18, 2022

Comments

  • nutty about natty over 1 year

    Usually, I'm pretty happy using pdfcrop, even though the cropped output usually takes up significantly more disk space. (Comparable code that addresses and resolves this issue does exist.) However, when it comes to cropping a scanned (image) pdf file, my impression is that pdfcrop simply fails. I imagine that ImageMagick is capable of doing the trick, possibly by (also) making use of pdftk.

    I'm looking for an efficient one-liner (a multi-line script would also be OK...) to crop such a pdf file at the top, bottom, left and right by x cm each (or, better yet, by a, b, c and d cm individually), going all the way from input.pdf to output.pdf.

    ps: the solution needn't involve ImageMagick; I'm happy as long as it works (cleanly, reliably and efficiently)... ;)

  • MrMartin over 7 years
    This method fails on certain files, see this bug
  • MrMartin over 7 years
    When it fails, this can be resolved by first printing the pdf to file, using a document viewer like Evince
  • ryanjdillon over 7 years
    I really like this one. GUI is nice for lots of irregular crops. Allows cropping different selections from the same page into a multi-page pdf. Great! Thanks!