Ghostscript color detection


Solution 1

A (relatively new) Ghostscript device called inkcov (you need Ghostscript v9.05 or newer) can reliably detect whether a PDF page uses color or not.

It displays the ink coverage for the CMYK inks, separately for each single page (for RGB colors, it does a silent conversion to CMYK color space internally).

To investigate and demonstrate its functions, first generate an example PDF with the help of Ghostscript:

gs                                                                     \
  -o color-or-grayscale-test.pdf                                       \
  -sDEVICE=pdfwrite                                                    \
  -g5950x2105                                                          \
  -c "/F1 {10 80 moveto /Helvetica findfont 64 scalefont setfont} def" \
  -c "F1                         (100% 'pure' black)    show showpage" \
  -c "F1 .5 .5 .5   setrgbcolor  ( 50% 'rich' rgbgray)  show showpage" \
  -c "F1 .5 .5 .5 0 setcmykcolor ( 50% 'rich' cmykgray) show showpage" \
  -c "F1 .5         setgray      ( 50% 'pure' gray)     show showpage"

Although none of the pages appears to the human eye to use any color, pages 2 and 3 do in fact mix their apparent gray values from color. None of these colors is directly visible (unless your monitor is grossly misadjusted).

Look at the resulting PDF pages (converted to PNG for easier display via the web):

[Image: 4 PDF pages without directly visible color]

In the prepress industry, 'rich' blacks or shades of gray are frequently used. The term 'rich' black or gray expresses the fact that these shades are not made from purely black toner or ink, but have color components mixed in to make them appear more brilliant and more saturated.

Now check each page's ink coverage:

gs  -o - -sDEVICE=inkcov color-or-grayscale-test.pdf
 [...]
 Page 1
  0.00000  0.00000  0.00000  0.05040 CMYK OK
 Page 2
  0.05401  0.05401  0.05401  0.05401 CMYK OK
 Page 3
  0.05799  0.05799  0.05799  0.00000 CMYK OK
 Page 4
  0.00000  0.00000  0.00000  0.04541 CMYK OK

(A value of 1.00000 maps to 100% ink coverage for the respective color channel. So 0.05040 in the first line of the result means 5.04 % of the page area is covered by black ink.) Hence the result given by Ghostscript's inkcov is exactly the expected one:

  • pages 1 + 4 don't use any of C (cyan), M (magenta), Y (yellow) colors, but only K (black).
  • pages 2 + 3 do use ink of C (cyan), M (magenta), Y (yellow) colors; page 2 additionally uses K (black), while page 3 uses no K at all.
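
If you need a per-page verdict rather than the raw numbers, the rule used above (any non-zero C, M or Y value means the page needs color ink) can be sketched with a small awk filter. This is only a minimal sketch: the function name classify_inkcov is made up for illustration, and it assumes that inkcov prints exactly one "CMYK OK" line per page, in page order, as in the output shown above.

```shell
# Hypothetical helper (not part of Ghostscript): reads inkcov output on
# stdin and prints "<page number> color" or "<page number> gray".
# Assumption: one data line per page, in page order, with "CMYK" as field 5.
classify_inkcov() {
  awk '$5 == "CMYK" {
         page++
         # A page counts as color if any of the C, M, Y columns is non-zero.
         if ($1 + 0 > 0 || $2 + 0 > 0 || $3 + 0 > 0)
           printf "%d color\n", page
         else
           printf "%d gray\n", page
       }'
}

# Usage (assumes gs 9.05 or newer is on the PATH):
#   gs -q -o - -sDEVICE=inkcov color-or-grayscale-test.pdf | classify_inkcov
```

Note that a 'rich' gray page is reported as color here, because the test looks at the inks used, not at how the page appears on screen.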

Now let's convert all pages of the original PDF to use the DeviceGray colorspace:

gs                                \
  -o only-black-ink.pdf           \
  -sDEVICE=pdfwrite               \
  -dColorConversionStrategy=/Gray \
  -dProcessColorModel=/DeviceGray \
   color-or-grayscale-test.pdf

...and check for the ink coverage again:

gs -q  -o - -sDEVICE=inkcov only-black-ink.pdf | grep -v Page
  0.00000  0.00000  0.00000  0.05040 CMYK OK
  0.00000  0.00000  0.00000  0.05401 CMYK OK
  0.00000  0.00000  0.00000  0.05799 CMYK OK
  0.00000  0.00000  0.00000  0.04541 CMYK OK

Again, exactly the expected result for a successful color conversion!
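
In a script you may prefer an exit status over reading the numbers by eye. The following is a rough sketch under the same assumptions as before (one "CMYK OK" line per page); the function name only_black_ink is hypothetical and not something Ghostscript provides.

```shell
# Hypothetical helper (not part of Ghostscript): reads inkcov output on
# stdin and exits with status 0 only if at least one page was reported
# and every reported page has zero C, M and Y coverage.
only_black_ink() {
  awk '$5 == "CMYK" {
         pages++
         if ($1 + 0 > 0 || $2 + 0 > 0 || $3 + 0 > 0)
           bad = 1
       }
       END { exit ((pages > 0 && !bad) ? 0 : 1) }'
}

# Usage (assumes gs 9.05 or newer is on the PATH):
#   gs -q -o - -sDEVICE=inkcov only-black-ink.pdf | only_black_ink \
#     && echo "black ink only" || echo "color ink used (or no pages found)"
```

Treating empty input as a failure is a deliberate choice in this sketch, so that a failed gs run is not mistaken for a grayscale document.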

Solution 2

A new output device has rendered this answer outdated; see the accepted answer.


Until 2011, the file needed to be rasterised to see an output, since a PDF/PS file is only a description of what the page looks like, and that description gets rendered when the file is rasterised. Even if you could do this with Ghostscript, I am sure it would need to rasterise/interpret the file first and then look at the output. So if you already have the PNGs, you might as well check them yourself, which will be less CPU-intensive than processing the file again with GS.

Author: Matthew Lowe

Updated on June 04, 2022

Comments

  • Matthew Lowe, almost 2 years ago

    I can't seem to find out whether Ghostscript is able to simply detect whether a job is color or grayscale. I use Ghostscript to convert print jobs to PNG, but I also need information about the color of the job, so that I don't have to search through it pixel by pixel again.

  • Kurt Pfeifle, over 11 years ago
    ...and I'm pretty sure that your statement "you can't do this" was correct until about a year ago, but it is no longer :-) --So please delete (or edit) your answer, before it gets downvoted... :-)
  • Douglas Anderson, about 11 years ago
    This is also a quick way if you need to estimate toner coverage for a print file.
  • RedRoosterMobile, almost 8 years ago
    In case anybody needs to do this in Ruby, here's a gem, rubygems.org/gems/pdf_colored_pages, that outputs an array containing the page numbers (e.g. 1,3,4) or a range string like '1,3-4', by parsing Ghostscript's inkcov output.