How to convert a PDF to grayscale from the command line without rasterizing it?

Solution 1

gs \
   -sDEVICE=pdfwrite \
   -sProcessColorModel=DeviceGray \
   -sColorConversionStrategy=Gray \
   -dOverrideICC \
   -o out.pdf \
   -f page-27.pdf

This command converts your file to grayscale (GS 9.10).
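
As one of the comments below suggests, the command is easier to reuse once the file names become parameters. A minimal sketch of such a wrapper, keeping exactly the flags above; the function name to_gray and the GS override variable are illustrative assumptions, not part of Ghostscript:

```shell
# Convert IN.pdf to grayscale OUT.pdf with the flags from Solution 1.
# GS is a hypothetical override for the gs binary (defaults to "gs").
to_gray() {
    if [ "$#" -ne 2 ]; then
        echo "usage: to_gray IN.pdf OUT.pdf" >&2
        return 2
    fi
    "${GS:-gs}" \
        -sDEVICE=pdfwrite \
        -sProcessColorModel=DeviceGray \
        -sColorConversionStrategy=Gray \
        -dOverrideICC \
        -o "$2" \
        -f "$1"
}
```

After sourcing the file, to_gray page-27.pdf out.pdf should run the same conversion as the command above.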

Solution 2

A bit late in the day, but the top answer doesn't work for me with a different file. The underlying problem appears to be old code in Ghostscript, for which there is a later version that is not enabled by default. More on that here: http://bugs.ghostscript.com/show_bug.cgi?id=694608

The page above also gives a command that works for me:

gs \
  -sDEVICE=pdfwrite \
  -dProcessColorModel=/DeviceGray \
  -dColorConversionStrategy=/Gray \
  -dPDFUseOldCMS=false \
  -o out.pdf \
  -f in.pdf

Solution 3

Use the most recent Ghostscript code (not yet released at the time of writing) and set ColorConversionStrategy=Gray.
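
For readers unsure how to pass the option: per KenS's comment below, string values go through -s, while -d only accepts numbers or PostScript names (hence the leading slash). A hedged sketch, assuming a Ghostscript build recent enough to contain the newer conversion code; gray_recent and the GS override are hypothetical names:

```shell
# Convert IN.pdf to grayscale OUT.pdf using only ColorConversionStrategy.
# GS is an illustrative override for the gs binary, not a Ghostscript feature.
gray_recent() {
    # -sColorConversionStrategy=Gray passes the value as a string;
    # the equivalent name form is -dColorConversionStrategy=/Gray.
    # Per the comments, -dColorConversionStrategy=Gray (no slash) is
    # reported as an unknown option.
    "${GS:-gs}" -sDEVICE=pdfwrite -sColorConversionStrategy=Gray -o "$2" "$1"
}
```

Called as gray_recent in.pdf out.pdf. Whether this alone is enough depends on the Ghostscript version installed.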

Solution 4

In Linux:

Install pdftk

apt-get install pdftk

Once pdftk is installed, save the following script as graypdf.sh:

#!/bin/bash
# convert pdf to grayscale, preserving metadata
# "AFAIK graphicx has no feature for manipulating colorspaces. " http://groups.google.com/group/latexusersgroup/browse_thread/thread/5ebbc3ff9978af05
# "> Is there an easy (or just standard) way with pdflatex to do a > conversion from color to grayscale when a PDF file is generated? No." ... "If you want to convert a multipage document then you better have pdftops from the xpdf suite installed because Ghostscript's pdf to ps doesn't produce nice Postscript." http://osdir.com/ml/tex.pdftex/2008-05/msg00006.html
# "Converting a color EPS to grayscale" - http://en.wikibooks.org/wiki/LaTeX/Importing_Graphics
# "\usepackage[monochrome]{color} .. I don't know of a neat automatic conversion to monochrome (there might be such a thing) although there was something in Tugboat a while back about mapping colors on the fly. I would probably make monochrome versions of the pictures, and name them consistently. Then conditionally load each one" http://newsgroups.derkeiler.com/Archive/Comp/comp.text.tex/2005-08/msg01864.html
# "Here comes optional.sty. By adding \usepackage{optional} ... \opt{color}{\includegraphics[width=0.4\textwidth]{intro/benzoCompounds_color}} \opt{grayscale}{\includegraphics[width=0.4\textwidth]{intro/benzoCompounds}} " - http://chem-bla-ics.blogspot.com/2008/01/my-phd-thesis-in-color-and-grayscale.html
# with gs:
# http://handyfloss.net/2008.09/making-a-pdf-grayscale-with-ghostscript/
# note - this strips metadata! so:
# http://etutorials.org/Linux+systems/pdf+hacks/Chapter+5.+Manipulating+PDF+Files/Hack+64+Get+and+Set+PDF+Metadata/
COLORFILENAME=$1
OVERWRITE=$2
FNAME=${COLORFILENAME%.pdf}
# NOTE: pdftk does not work with logical page numbers / pagination;
# gs kills it as well;
# so check for existence of 'pdfmarks' file in calling dir;
# if there, use it to correct gs logical pagination
# for example, see
# http://askubuntu.com/questions/32048/renumber-pages-of-a-pdf/65894#65894
PDFMARKS=
if [ -e pdfmarks ] ; then
  PDFMARKS="pdfmarks"
  echo "$PDFMARKS exists, using..."
  # convert to gray pdf - this strips metadata!
  gs -sOutputFile="$FNAME-gs-gray.pdf" -sDEVICE=pdfwrite \
    -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray \
    -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH "$COLORFILENAME" "$PDFMARKS"
else # not really needed ?!
  gs -sOutputFile="$FNAME-gs-gray.pdf" -sDEVICE=pdfwrite \
    -sColorConversionStrategy=Gray -dProcessColorModel=/DeviceGray \
    -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH "$COLORFILENAME"
fi
# dump metadata from original color pdf
## pdftk $COLORFILENAME dump_data output $FNAME.data.txt
# also: pdfinfo -meta $COLORFILENAME
# grep to avoid BookmarkTitle/Level/PageNumber:
pdftk "$COLORFILENAME" dump_data | grep 'Info\|Pdf' > "$FNAME.data.txt"
# "pdftk can take a plain-text file of these same key/value pairs and update a PDF's Info dictionary to match. Currently, it does not update the PDF's XMP stream."
pdftk "$FNAME-gs-gray.pdf" update_info "$FNAME.data.txt" output "$FNAME-gray.pdf"
# (http://wiki.creativecommons.org/XMP_Implementations : Exempi ... allows reading/writing XMP metadata for various file formats, including PDF ... )
# clean up
rm "$FNAME-gs-gray.pdf"
rm "$FNAME.data.txt"
if [ "$OVERWRITE" = "y" ] ; then
  echo "Overwriting $COLORFILENAME..."
  mv "$FNAME-gray.pdf" "$COLORFILENAME"
fi
# BUT NOTE:
# Mixing TEX & PostScript : The GEX Model - http://www.tug.org/TUGboat/Articles/tb21-3/tb68kost.pdf
# VTEX is a (commercial) extended version of TEX, sold by MicroPress, Inc. Free versions of VTEX have recently been made available, that work under OS/2 and Linux. This paper describes GEX, a fast fully-integrated PostScript interpreter which functions as part of the VTEX code-generator. Unless specified otherwise, this article describes the functionality in the freeware version of the VTEX compiler, as available on CTAN sites in systems/vtex.
# GEX is a graphics counterpart to TEX. .. Since GEX may exercise subtle influence on TEX (load fonts, or change TEX registers), GEX is optional in VTEX implementations: the default operation of the program is with GEX off; it is enabled by a command-line switch.
# \includegraphics[width=1.3in, colorspace=grayscale 256]{macaw.jpg}
# http://mail.tug.org/texlive/Contents/live/texmf-dist/doc/generic/FAQ-en/html/FAQ-TeXsystems.html
# A free version of the commercial VTeX extended TeX system is available for use under Linux, which among other things specialises in direct production of PDF from (La)TeX input. Sadly, it's no longer supported, and the ready-built images are made for use with a rather ancient Linux kernel.
# NOTE: another way to capture metadata; if converting via ghostscript:
# http://compgroups.net/comp.text.pdf/How-to-specify-metadata-using-Ghostscript
# first:
# grep -a 'Keywo' orig.pdf
# /Author(xxx)/Title(ttt)/Subject()/Creator(LaTeX)/Producer(pdfTeX-1.40.12)/Keywords(kkkk)
# then - copy this data in a file prologue.ini:
#/pdfmark where {pop} {userdict /pdfmark /cleartomark load put} ifelse
#[/Author(xxx)
#/Title(ttt)
#/Subject()
#/Creator(LaTeX with hyperref package + gs w/ prologue)
#/Producer(pdfTeX-1.40.12)
#/Keywords(kkkk)
#/DOCINFO pdfmark
#
# finally, call gs on the orig file,
# asking to process pdfmarks in prologue.ini:
# gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
# -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -dDOPDFMARKS \
# -sOutputFile=out.pdf in.pdf prologue.ini
# then the metadata will be in output too (which is stripped otherwise;
# note bookmarks are preserved, however). 

Give the file execution permissions:

chmod +x graypdf.sh

And execute it like this:

./graypdf.sh input.pdf

It will create a file input-gray.pdf in the same location as the original file. Pass y as a second argument (./graypdf.sh input.pdf y) to overwrite the original instead.

Solution 5

If you crack open the file, you'll find that most of the colors are determined through an ICC-based RGB color space (look for 8 0 R to find all the references to this colorspace). Perhaps gs is complaining about that?

Who knows.

The takeaway is that converting a page from one colorspace to another without affecting the content is non-trivial. You need to render the page, trap every change to the current color/colorspace, and substitute an equivalent in the target space. You also need to convert every image XObject that is in the wrong colorspace, which requires decoding the image data and re-encoding it in the target space. Form XObjects need the same treatment, and that task is similar to converting the parent page, since form XObjects (I think your doc has 4) also contain resources and a content stream of page-marking operators (which may include more XObjects).

It's certainly doable, but the process is nearly the same as rendering but with some fairly special-purpose code.
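
To make the "trap and substitute" step concrete, here is a toy sketch for the simplest case only. It assumes an already-decompressed content stream in which each DeviceRGB color operator (r g b rg for fill, r g b RG for stroke) sits on its own line, and rewrites it to the DeviceGray operator (g or G). The weights 0.3, 0.59 and 0.11 are the DeviceRGB-to-DeviceGray conversion given in the PDF specification; the function name rgb_ops_to_gray is hypothetical. It does not handle ICC-based spaces (as in the file discussed here), images, or form XObjects:

```shell
# Toy filter: rewrite "r g b rg" / "r g b RG" lines to "gray g" / "gray G".
# Every other line of the content stream is passed through unchanged.
rgb_ops_to_gray() {
    awk '
    NF == 4 && ($4 == "rg" || $4 == "RG") {
        gray = 0.3 * $1 + 0.59 * $2 + 0.11 * $3
        printf "%.3f %s\n", gray, ($4 == "rg" ? "g" : "G")
        next
    }
    { print }
    '
}
```

A real converter has to do this for every color-setting operator and color space, which is why the process ends up close to a full renderer.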

Author: Panda

Updated on July 09, 2022

Comments

  • Panda
    Panda almost 2 years

    I'm trying to convert to grayscale this PDF: https://dl.dropboxusercontent.com/u/10351891/page-27.pdf

    Ghostscript (v 9.10) with the pdfwrite device fails with an "Unable to convert color space to Gray, reverting strategy to LeaveColorUnchanged." message.

    I'm able to convert it through an intermediary ps file (using gs, pdftops (v 0.24.3) or pdf2ps) but this conversion rasterizes the whole PDF. I tried a lot of other things: normalizing the PDF using qpdf (v 5.0.1) or pdftk (v 1.44), transforming it to an SVG file and back to a PDF via Inkscape (v 0.48.4)... nothing seems to work.

    The only solution I found (which is not suitable for me in a production environment) is to use Preview on my Mac and apply a Quartz Gray Tone filter manually or with an Automator script.

    Has anyone found another working way to do it? Or is it possible to normalize the PDF or fix the issue to prevent the Ghostscript message "Unable to convert color space..." or to force the color space in another way?

    Thanks!

  • Gaurav
    Gaurav over 9 years
    It gives this output and the PDF is still colored. GPL Ghostscript 9.10: Unable to convert color space to Gray, reverting strategy to LeaveColorUnchanged.
  • Matt Bannert
    Matt Bannert about 9 years
    +1 worked out of the box on my OSX. adding $1 and $2 instead of out.pdf and page-27.pdf and turning this into a batch script is also helpful for more flexible every day use.
  • Ian Goodfellow
    Ian Goodfellow about 9 years
    This isn't real grayscale. If I print out the ink coverage it is using C, M, and Y to make gray.
  • Ian Goodfellow
    Ian Goodfellow about 9 years
    This appears to not be real grayscale. If I run inkcov on it, it is using C, M, and Y to make gray.
  • brownian
    brownian about 7 years
    It converts vectors to raster images.
  • mkl
    mkl almost 7 years
    Please don't use the same answer for two different questions. Instead answer one question and flag the second one as a duplicate.
  • Justin Eldracher
    Justin Eldracher over 6 years
    I had to convert the backslashes to ^ in order for it to work, but then it worked great!
  • Frederick Nord
    Frederick Nord over 3 years
    can you expand on your answer and show how to use it? I've tried gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dColorConversionStrategy=Gray -o out.pdf in.pdf but it complains about the unknown option.
  • KenS
    KenS over 3 years
    Use -s, not -d for ColorConversionStrategy. Or use -dColorConversionStrategy=/Gray. The -d switch only accepts numbers or (PostScript) names.
  • Robert Seifert
    Robert Seifert over 2 years
    +1 for the comment of @IanGoodfellow - this answer does not answer the question. The result may look gray, but it is not grayscale.
  • sekrett
    sekrett over 2 years
    How is it possible to verify that the grayscale is true? I tried creating a new document in Photoshop in grayscale mode and painted a black line with the brush. But when I color pick the edges of the line, I can see all CMYK components are filled. Should only K be filled?
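
One way to check, following the inkcov remarks above, is to look at the per-page ink coverage that Ghostscript's inkcov device reports and require C, M and Y to be zero. A minimal sketch, assuming the installed gs includes the inkcov device and prints one "C M Y K CMYK OK" line per page; the helper name only_black_ink is hypothetical:

```shell
# Reads inkcov-style lines on stdin, e.g.
#   " 0.00000  0.00000  0.00000  0.02230 CMYK OK"
# Succeeds only if at least one page line was seen and C, M and Y are
# zero on every page; fails otherwise.
only_black_ink() {
    awk '
    $5 == "CMYK" {
        pages++
        if ($1 + 0 != 0 || $2 + 0 != 0 || $3 + 0 != 0) bad = 1
    }
    END { if (bad || pages == 0) exit 1 }
    '
}

# Intended use (not run here):
#   gs -q -o - -sDEVICE=inkcov out.pdf | only_black_ink && echo "K only"
```

The check reflects how Ghostscript renders the file to CMYK, so treat it as an indication, not as proof of how a particular printer will separate the page.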