How to convert a color pdf to black-white?
Solution 1
The gs example
The gs command you're running above has a trailing $1, which is typically meant for passing command-line arguments into a script. I'm not sure what you actually tried, but I'm guessing that you put that command into a script, script.sh:
#!/bin/bash
gs -sOutputFile=output.pdf \
-q -dNOPAUSE -dBATCH -dSAFER \
-sDEVICE=pdfwrite \
-dCompatibilityLevel=1.3 \
-dPDFSETTINGS=/screen \
-dEmbedAllFonts=true \
-dSubsetFonts=true \
-sColorConversionStrategy=/Mono \
-sColorConversionStrategyForImages=/Mono \
-sProcessColorModel=/DeviceGray \
$1
Then you ran it and got this error:
./script.sh: 19: ./script.sh: output.pdf: not found
I'm not sure how you set up this script, but it needs to be executable:
$ chmod +x script.sh
Something definitely doesn't seem right with that script though. When I tried it I got this error instead:
Unrecoverable error: rangecheck in .putdeviceprops
An alternative
Instead of that script, I'd use this one from the SU question.
#!/bin/bash
gs \
-sOutputFile=output.pdf \
-sDEVICE=pdfwrite \
-sColorConversionStrategy=Gray \
-dProcessColorModel=/DeviceGray \
-dCompatibilityLevel=1.4 \
-dNOPAUSE \
-dBATCH \
"$1"
Then run it like this:
$ ./script.bash LeaseContract.pdf
GPL Ghostscript 8.71 (2010-02-10)
Copyright (C) 2010 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 2.
Page 1
Page 2
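A minimal sketch of the same conversion wrapped in a function, assuming you want it to stop early when the input file is missing or gs is not installed. The function name to_gray and the optional second argument for the output name are hypothetical additions; the gs options are the ones from the script above.

```shell
#!/bin/bash
# Hypothetical wrapper around the gs call shown above.
# Usage: to_gray input.pdf [output.pdf]
to_gray() {
    local input=$1
    local output=${2:-output.pdf}   # same default name as the script above

    # Fail early with a readable message instead of a gs error.
    if [ ! -f "$input" ]; then
        echo "to_gray: no such file: $input" >&2
        return 1
    fi
    if ! command -v gs >/dev/null 2>&1; then
        echo "to_gray: gs (Ghostscript) not found in PATH" >&2
        return 1
    fi

    # Quoting "$input" and "$output" keeps file names with spaces intact.
    gs \
        -sOutputFile="$output" \
        -sDEVICE=pdfwrite \
        -sColorConversionStrategy=Gray \
        -dProcessColorModel=/DeviceGray \
        -dCompatibilityLevel=1.4 \
        -dNOPAUSE \
        -dBATCH \
        "$input"
}
```

After sourcing the file, to_gray LeaseContract.pdf would write output.pdf, and to_gray LeaseContract.pdf gray.pdf would write gray.pdf.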
Solution 2
I found a script here that can do this. It requires gs, which you seem to have, but also pdftk. You have not mentioned your distribution, but on Debian-based systems you should be able to install it with
sudo apt-get install pdftk
You can find RPMs for it here.
Once you have installed pdftk, save the script as graypdf.sh and run it like so:
./graypdf.sh input.pdf
It will create a file called input-gray.pdf. I am including the whole script here to avoid link rot:
# convert pdf to grayscale, preserving metadata
# "AFAIK graphicx has no feature for manipulating colorspaces. " http://groups.google.com/group/latexusersgroup/browse_thread/thread/5ebbc3ff9978af05
# "> Is there an easy (or just standard) way with pdflatex to do a > conversion from color to grayscale when a PDF file is generated? No." ... "If you want to convert a multipage document then you better have pdftops from the xpdf suite installed because Ghostscript's pdf to ps doesn't produce nice Postscript." http://osdir.com/ml/tex.pdftex/2008-05/msg00006.html
# "Converting a color EPS to grayscale" - http://en.wikibooks.org/wiki/LaTeX/Importing_Graphics
# "\usepackage[monochrome]{color} .. I don't know of a neat automatic conversion to monochrome (there might be such a thing) although there was something in Tugboat a while back about mapping colors on the fly. I would probably make monochrome versions of the pictures, and name them consistently. Then conditionally load each one" http://newsgroups.derkeiler.com/Archive/Comp/comp.text.tex/2005-08/msg01864.html
# "Here comes optional.sty. By adding \usepackage{optional} ... \opt{color}{\includegraphics[width=0.4\textwidth]{intro/benzoCompounds_color}} \opt{grayscale}{\includegraphics[width=0.4\textwidth]{intro/benzoCompounds}} " - http://chem-bla-ics.blogspot.com/2008/01/my-phd-thesis-in-color-and-grayscale.html
# with gs:
# http://handyfloss.net/2008.09/making-a-pdf-grayscale-with-ghostscript/
# note - this strips metadata! so:
# http://etutorials.org/Linux+systems/pdf+hacks/Chapter+5.+Manipulating+PDF+Files/Hack+64+Get+and+Set+PDF+Metadata/
COLORFILENAME=$1
OVERWRITE=$2
FNAME=${COLORFILENAME%.pdf}
# NOTE: pdftk does not work with logical page numbers / pagination;
# gs kills it as well;
# so check for existence of 'pdfmarks' file in calling dir;
# if there, use it to correct gs logical pagination
# for example, see
# http://askubuntu.com/questions/32048/renumber-pages-of-a-pdf/65894#65894
PDFMARKS=
if [ -e pdfmarks ] ; then
PDFMARKS="pdfmarks"
echo "$PDFMARKS exists, using..."
# convert to gray pdf - this strips metadata!
gs -sOutputFile="$FNAME-gs-gray.pdf" -sDEVICE=pdfwrite \
-sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray \
-dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH "$COLORFILENAME" "$PDFMARKS"
else # not really needed ?!
gs -sOutputFile="$FNAME-gs-gray.pdf" -sDEVICE=pdfwrite \
-sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray \
-dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH "$COLORFILENAME"
fi
# dump metadata from original color pdf
## pdftk $COLORFILENAME dump_data output $FNAME.data.txt
# also: pdfinfo -meta $COLORFILENAME
# grep to avoid BookmarkTitle/Level/PageNumber:
pdftk "$COLORFILENAME" dump_data output | grep 'Info\|Pdf' > "$FNAME.data.txt"
# "pdftk can take a plain-text file of these same key/value pairs and update a PDF's Info dictionary to match. Currently, it does not update the PDF's XMP stream."
pdftk "$FNAME-gs-gray.pdf" update_info "$FNAME.data.txt" output "$FNAME-gray.pdf"
# (http://wiki.creativecommons.org/XMP_Implementations : Exempi ... allows reading/writing XMP metadata for various file formats, including PDF ... )
# clean up
rm "$FNAME-gs-gray.pdf"
rm "$FNAME.data.txt"
if [ "$OVERWRITE" == "y" ] ; then
echo "Overwriting $COLORFILENAME..."
mv "$FNAME-gray.pdf" "$COLORFILENAME"
fi
# BUT NOTE:
# Mixing TEX & PostScript : The GEX Model - http://www.tug.org/TUGboat/Articles/tb21-3/tb68kost.pdf
# VTEX is a (commercial) extended version of TEX, sold by MicroPress, Inc. Free versions of VTEX have recently been made available, that work under OS/2 and Linux. This paper describes GEX, a fast fully-integrated PostScript interpreter which functions as part of the VTEX code-generator. Unless specified otherwise, this article describes the functionality in the freeware version of the VTEX compiler, as available on CTAN sites in systems/vtex.
# GEX is a graphics counterpart to TEX. .. Since GEX may exercise subtle influence on TEX (load fonts, or change TEX registers), GEX is optional in VTEX implementations: the default operation of the program is with GEX off; it is enabled by a command-line switch.
# \includegraphics[width=1.3in, colorspace=grayscale 256]{macaw.jpg}
# http://mail.tug.org/texlive/Contents/live/texmf-dist/doc/generic/FAQ-en/html/FAQ-TeXsystems.html
# A free version of the commercial VTeX extended TeX system is available for use under Linux, which among other things specialises in direct production of PDF from (La)TeX input. Sadly, it's no longer supported, and the ready-built images are made for use with a rather ancient Linux kernel.
# NOTE: another way to capture metadata; if converting via ghostscript:
# http://compgroups.net/comp.text.pdf/How-to-specify-metadata-using-Ghostscript
# first:
# grep -a 'Keywo' orig.pdf
# /Author(xxx)/Title(ttt)/Subject()/Creator(LaTeX)/Producer(pdfTeX-1.40.12)/Keywords(kkkk)
# then - copy this data in a file prologue.ini:
#/pdfmark where {pop} {userdict /pdfmark /cleartomark load put} ifelse
#[/Author(xxx)
#/Title(ttt)
#/Subject()
#/Creator(LaTeX with hyperref package + gs w/ prologue)
#/Producer(pdfTeX-1.40.12)
#/Keywords(kkkk)
#/DOCINFO pdfmark
#
# finally, call gs on the orig file,
# asking to process pdfmarks in prologue.ini:
# gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
# -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -dDOPDFMARKS \
# -sOutputFile=out.pdf in.pdf prologue.ini
# then the metadata will be in output too (which is stripped otherwise;
# note bookmarks are preserved, however).
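The metadata step in the script above pipes pdftk's dump_data output through grep so that only the Info and Pdf lines reach update_info. A minimal sketch of that filter on its own; the sample text is made up for illustration (it only imitates the key names the script's comments mention) and keep_info_lines is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: keep only lines that mention Info or Pdf,
# which is what grep 'Info\|Pdf' does in the script above.
# -E is used here so the alternation does not depend on GNU-style \|.
keep_info_lines() {
    grep -E 'Info|Pdf'
}

# Made-up sample, NOT output from a real PDF.
sample='InfoKey: Title
InfoValue: Lease Contract
PdfID0: 0123abcd
NumberOfPages: 2
BookmarkTitle: Chapter 1
BookmarkLevel: 1
BookmarkPageNumber: 1'

# Only the InfoKey, InfoValue and PdfID0 lines are printed.
printf '%s\n' "$sample" | keep_info_lines
```

The Bookmark and NumberOfPages lines are dropped because they contain neither "Info" nor "Pdf".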
Solution 3
I also had some scanned color pdfs and grayscale pdfs that I wanted to convert to bw. I tried using gs with the code listed here, and the image quality is good, with the pdf text still there. However, that gs code only converts to grayscale (as asked in the question), and the file size is still large. convert yields very poor results when used directly.
I wanted bw pdfs with good image quality and small file size. I would have tried terdon's solution, but I could not get pdftk on CentOS 7 using yum (at the time of writing).
My solution uses gs to extract grayscale bmp files from the pdf, convert to threshold those bmps to bw and save them as tiff files, and then img2pdf to compress the tiff images and merge them all into one pdf.
I tried going directly to tiff from the pdf, but the quality is not the same, so I save each page to bmp. For a one-page pdf file, convert does a great job from bmp to pdf. Example:
gs -sDEVICE=bmpgray -dNOPAUSE -dBATCH -r300x300 \
-sOutputFile=./pdf_image.bmp ./input.pdf
convert ./pdf_image.bmp -threshold 40% -compress zip ./bw_out.pdf
For multiple pages, gs can merge multiple pdf files into one, but img2pdf yields a smaller file size than gs. The tiff files must be uncompressed as input to img2pdf. Keep in mind that for large numbers of pages, the intermediate bmp and tiff files tend to be large. pdftk or joinpdf would be better if they can merge compressed pdf files from convert.
I imagine there is a more elegant solution. However, my method produces results with very good image quality and much smaller file size. To get text back in the bw pdf, run OCR again.
My shell script uses gs, convert, and img2pdf. Change the parameters (number of pages, scan dpi, threshold %, etc.) listed at the beginning as needed, and run chmod +x ./pdf2bw.sh. Here is the full script (pdf2bw.sh):
#!/bin/bash
num_pages=12
dpi_res=300
input_pdf_name=color_or_grayscale.pdf
bw_threshold=40%
output_pdf_name=out_bw.pdf
#-------------------------------------------------------------------------
gs -sDEVICE=bmpgray -dNOPAUSE -dBATCH -q -r$dpi_res \
-sOutputFile=./%d.bmp ./$input_pdf_name
#-------------------------------------------------------------------------
for file_num in `seq 1 $num_pages`
do
convert ./$file_num.bmp -threshold $bw_threshold \
./$file_num.tif
done
#-------------------------------------------------------------------------
input_files=""
for file_num in `seq 1 $num_pages`
do
input_files+="./$file_num.tif "
done
img2pdf -o ./$output_pdf_name --dpi $dpi_res $input_files
#-------------------------------------------------------------------------
# clean up bmp and tif files used in conversion
for file_num in `seq 1 $num_pages`
do
rm ./$file_num.bmp
rm ./$file_num.tif
done
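The script above needs num_pages set by hand. A minimal sketch of one way to derive it instead, assuming the gs step has already written 1.bmp, 2.bmp, ... into a directory that holds no other .bmp files. count_pages is a hypothetical helper, not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: count the page bitmaps in a directory.
# Usage: count_pages DIR
count_pages() {
    local dir=$1
    local count=0
    local f
    for f in "$dir"/*.bmp; do
        # When nothing matches, the glob stays literal; -e filters that out.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

In pdf2bw.sh this could replace the fixed value with num_pages=$(count_pages .) placed right after the gs line.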
Solution 4
I get reliable results cleaning up scanned PDFs to good contrast with this script:
#!/bin/bash
#
# $ sudo apt install poppler-utils img2pdf pdftk imagemagick
#
# Output is still greyscale, but lots of scanner light tone fuzz removed.
#
pdfimages "$1" pages
ls ./pages*.ppm | xargs -L1 -I {} convert {} -quality 100 -density 400 \
-fill white -fuzz 80% -auto-level -depth 4 +opaque "#000000" {}.jpg
ls -1 ./pages*jpg | xargs -L1 -I {} img2pdf {} -o {}.pdf
pdftk pages*.pdf cat output "${1/.pdf/}_bw.pdf"
rm pages*
Solution 5
RHEL6 and RHEL5, which both baseline Ghostscript on 8.70, couldn't use the forms of the command given above. Assuming a script or a function that expects the PDF file as its first argument, "$1", the following should be more portable:
gs \
-sOutputFile="grey_$1" \
-sDEVICE=pdfwrite \
-sColorConversionStrategy=Mono \
-sColorConversionStrategyForImages=/Mono \
-dProcessColorModel=/DeviceGray \
-dCompatibilityLevel=1.3 \
-dNOPAUSE -dBATCH \
"$1"
The output file will be prefixed with "grey_".
RHEL6 and 5 can use CompatibilityLevel=1.4, which is much quicker, but I was aiming for portability.
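One caveat with -sOutputFile="grey_$1": if "$1" contains a directory (for example scans/report.pdf), the prefix lands in front of the directory name rather than the file name. A minimal sketch of a workaround, using a hypothetical helper grey_name that prefixes only the file name:

```shell
#!/bin/bash
# Hypothetical helper: put "grey_" in front of the file name only,
# keeping any leading directory part unchanged.
grey_name() {
    local input=$1
    local dir base
    dir=$(dirname "$input")
    base=$(basename "$input")
    printf '%s/grey_%s\n' "$dir" "$base"
}

grey_name scans/report.pdf   # prints scans/grey_report.pdf
grey_name report.pdf         # prints ./grey_report.pdf
```

The gs call above could then use -sOutputFile="$(grey_name "$1")".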
BowPark
Updated on September 18, 2022
Comments
-
BowPark over 1 year
I'd like to convert a pdf with some coloured text and images into another pdf that is only black and white, in order to reduce its file size. Moreover, I would like to keep the text as text, without turning the page elements into pictures. I tried the following command:
convert -density 150 -threshold 50% input.pdf output.pdf
found in another question, but it does what I don't want: the text in the output is turned into a poor-quality image and is no longer selectable. I tried with Ghostscript:
gs -sOutputFile=output.pdf \
-q -dNOPAUSE -dBATCH -dSAFER \
-sDEVICE=pdfwrite \
-dCompatibilityLevel=1.3 \
-dPDFSETTINGS=/screen \
-dEmbedAllFonts=true \
-dSubsetFonts=true \
-sColorConversionStrategy=/Mono \
-sColorConversionStrategyForImages=/Mono \
-sProcessColorModel=/DeviceGray \
$1
but it gives me the following error message:
./script.sh: 19: ./script.sh: output.pdf: not found
Is there any other way to create the file?
-
Admin over 10 years
This looks so good superuser.com/questions/200378/…
-
Admin over 10 years
Is that the entire script you ran? It doesn't look like it, could you post the whole script?
-
Sora. over 6 years
You're right, there's something wrong with the script: "something" in this case would be sProcessColorModel, which should be dProcessColorModel instead.
-
Igor about 5 years
-
Rich about 5 years
Thanks, @Igor -- I have no idea where I got that snippet from! I know for a fact that I tested it and it worked at the time. (And that, folks, is why you should always provide references for your code.)
-
Igor about 5 years
That "fake parameter" seems to be incredibly popular around the web. GS ignores unknown switches (which is sad), so it works anyway.
-
Admin almost 2 years
Suggestion for deriving the output file name automatically (input.pdf => input_bw.pdf): -sOutputFile=${1%%.*}_bw.pdf
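A short sketch of how that expansion behaves, compared with stripping only a trailing .pdf. ${1%%.*} removes everything from the first dot onwards, so names containing extra dots are shortened more than you might expect. The two helper names are hypothetical.

```shell
#!/bin/bash
# ${1%%.*} removes the LONGEST suffix matching ".*",
# i.e. everything from the first dot in the name.
name_first_dot() {
    printf '%s_bw.pdf\n' "${1%%.*}"
}

# ${1%.pdf} removes only a trailing ".pdf".
name_pdf_ext() {
    printf '%s_bw.pdf\n' "${1%.pdf}"
}

name_first_dot input.pdf        # prints input_bw.pdf
name_first_dot my.report.pdf    # prints my_bw.pdf
name_pdf_ext   my.report.pdf    # prints my.report_bw.pdf
```

For plain names such as input.pdf both forms give the same result; they differ only when the name or path contains another dot.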