Ghostscript PDF/A conversion fails validation

Solution 1

With the help of a Ghostscript developer in the bug report I filed, I was able to solve the /N problem. Lessons learned:

  • The Ghostscript documentation referenced in my question is out of date. The current documentation says that ProcessColorModel=DeviceRGB is okay.
  • ICC profiles describe a color space. Some valid color spaces are GRAY, RGB, and CMYK. You can check the color space of an ICC profile using the free ICC Profile Inspector.
  • In the section of the PDF file causing validation errors, /N represents the number of colorants.
  • The PDFA_def.ps file emits the /N value. The sample included with Ghostscript 9.07 only emits /N 1 (for ProcessColorModel=DeviceGray) or /N 4 (for any other ProcessColorModel).
  • My original test specified ProcessColorModel=DeviceCMYK, which produced /N 4, but used an ICC profile that describes an RGB color space. The validators correctly caught this discrepancy: I promised 4 colorants but described only 3.
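As an alternative to the ICC Profile Inspector, the color space can also be read straight from the profile header. The sketch below relies on the ICC specification's header layout: a 4-byte data color space signature at byte offset 16 and the file signature `acsp` at offset 36. It maps only the three color spaces discussed here to the /N value an embedding stream should declare. The function names are hypothetical, not part of any Ghostscript tool.

```python
# Sketch: report which /N value an ICC profile implies, based on its header.
# Per the ICC specification, the 128-byte header stores a 4-byte data color
# space signature at offset 16 and the file signature b"acsp" at offset 36.
# Only the three color spaces discussed here are mapped.

COMPONENTS = {b"GRAY": 1, b"RGB ": 3, b"CMYK": 4}


def icc_color_space(header: bytes) -> bytes:
    """Return the 4-byte color space signature from an ICC profile header."""
    if len(header) < 128 or header[36:40] != b"acsp":
        raise ValueError("not an ICC profile header")
    return header[16:20]


def expected_n(header: bytes) -> int:
    """Return the /N value an ICC stream embedding this profile should declare."""
    sig = icc_color_space(header)
    if sig not in COMPONENTS:
        raise ValueError(f"unsupported color space {sig!r}")
    return COMPONENTS[sig]


def expected_n_for_file(path: str) -> int:
    """Read only the header of an .icc file on disk and return the implied /N."""
    with open(path, "rb") as f:
        return expected_n(f.read(128))
```

For example, `expected_n_for_file(r"C:\Windows\System32\spool\drivers\color\CalibratedDisplayProfile-5.icc")` should return 3 for an RGB display profile, which is the value that has to match ProcessColorModel.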

Most ICC profiles that I found for displays and office printers describe an RGB color space. (CMYK seems more specific to high-end printing presses and certain kinds of paper.) For my purposes, RGB is preferable. The following batch file converts a PDF file (%1) to PDF/A-1b (%2) with an RGB color space:

gswin32c ^
   -dPDFA ^
   -dNOOUTERSAVE ^
   -sProcessColorModel=DeviceRGB ^
   -dUseCIEColor ^
   -sDEVICE=pdfwrite ^
   -o %2 ^
   -dPDFACompatibilityPolicy=1 ^
    "C:\Program Files (x86)\gs\gs9.07\mylib\PDFA_def.ps" ^
    %1
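The same conversion can be driven from a script. The following is a minimal sketch, assuming Python 3 and the same executable name and PDFA_def.ps path as the batch file; `build_gs_args` and `convert_to_pdfa` are hypothetical helper names. Passing the arguments as a list lets `subprocess` handle quoting of the path containing spaces.

```python
import subprocess
from typing import List

# Paths copied from the batch file; adjust them to your installation.
GS_EXE = "gswin32c"
PDFA_DEF = r"C:\Program Files (x86)\gs\gs9.07\mylib\PDFA_def.ps"


def build_gs_args(src: str, dst: str, gs_exe: str = GS_EXE,
                  pdfa_def: str = PDFA_DEF) -> List[str]:
    """Return the same Ghostscript arguments as the batch file, as a list."""
    return [
        gs_exe,
        "-dPDFA",
        "-dNOOUTERSAVE",
        "-sProcessColorModel=DeviceRGB",
        "-dUseCIEColor",
        "-sDEVICE=pdfwrite",
        "-o", dst,                      # %2 in the batch file
        "-dPDFACompatibilityPolicy=1",
        pdfa_def,                       # PDFA_def.ps, which embeds the ICC profile
        src,                            # %1 in the batch file
    ]


def convert_to_pdfa(src: str, dst: str) -> None:
    """Run Ghostscript; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_gs_args(src, dst), check=True)
```

Keeping the argument list in one function makes it easy to inspect or log the exact command before running it.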

In PDFA_def.ps, specify an ICC profile that describes an RGB color space, and change the section for defining an ICC profile as follows:

% Define an ICC profile :

[/_objdef {icc_PDFA} /type /stream /OBJ pdfmark
[{icc_PDFA} <</N systemdict /ProcessColorModel get /DeviceGray eq {1} {systemdict /ProcessColorModel get /DeviceRGB eq {3} {4} ifelse} ifelse >> /PUT pdfmark
[{icc_PDFA} ICCProfile (r) file /PUT pdfmark

The long line uses a nested ifelse that emits /N 1 for DeviceGray, /N 3 for DeviceRGB, and /N 4 for any other ProcessColorModel. The resulting file should pass validation at pdf-tools.com.
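To make the decision table explicit, here is a Python sketch that mirrors the nested ifelse and adds a consistency check against the profile's color space. It is an illustration only, not code Ghostscript runs; `pdfa_def_n` and `check_setup` are hypothetical names. The second check reproduces the original failure: DeviceCMYK combined with an RGB profile.

```python
# Python mirror of the PostScript nested ifelse in the modified PDFA_def.ps.

def pdfa_def_n(process_color_model: str) -> int:
    """DeviceGray -> 1, DeviceRGB -> 3, anything else -> 4."""
    if process_color_model == "DeviceGray":
        return 1
    if process_color_model == "DeviceRGB":
        return 3
    return 4


# ICC color space each /N value must correspond to.
N_TO_ICC_SPACE = {1: "GRAY", 3: "RGB", 4: "CMYK"}


def check_setup(process_color_model: str, icc_space: str) -> None:
    """Raise ValueError if the emitted /N disagrees with the ICC profile."""
    n = pdfa_def_n(process_color_model)
    if N_TO_ICC_SPACE[n] != icc_space:
        raise ValueError(
            f"ProcessColorModel={process_color_model} emits /N {n}, "
            f"but the ICC profile describes {icc_space}"
        )
```

With this table, `check_setup("DeviceRGB", "RGB")` passes, while `check_setup("DeviceCMYK", "RGB")` raises, which matches the "key N is 4 but must be 3" error from the validator.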

Update: I've created a somewhat more capable batch program and published it in a blog post: Batch Convert PDF to PDF/A.

Solution 2

I would suggest first re-testing your problem on the latest version of Ghostscript, 9.07, in case it has already been fixed.

If this doesn't help, it will take a real PDF guru to answer this question. I suspect the problem comes from a conflict between the content of the .ps file and the parameters of the gswin32c command.

However, since the problematic file is generated by Ghostscript, you can post your question on the Ghostscript Bugzilla page (registration required), where the developers will answer it. If it is a bug in Ghostscript, it will most probably be fixed in the next version.

In addition to the problem description as in your post, you should attach an example input .ps file and the resulting .pdf file. Try to minimize their sizes.

In the past I have reported several suspected Ghostscript bugs there and always received good answers, and the real bugs I found were all fixed.

Author: Mark Berry

Updated on September 18, 2022

Comments

  • Mark Berry (almost 2 years ago)

    I'm developing a "paperless" workflow and plan to save all files in PDF/A-1b format.

    I'm trying to develop a simple batch file for converting PDF files that I create or receive to PDF/A-1b. Starting from a related answer, I have the following batch file:

    gswin32c ^
       -dPDFA ^
       -dNOOUTERSAVE ^
       -sProcessColorModel=DeviceCMYK ^
       -dUseCIEColor ^
       -sDEVICE=pdfwrite ^
       -o %2 ^
       -dPDFACompatibilityPolicy=1 ^
        "C:\Program Files (x86)\gs\gs9.07\mylib\PDFA_def.ps" ^
        %1
    

    In PDFA_def.ps, I've tried a few different ICC profiles, including one I found on my system:

    C:/Windows/System32/spool/drivers/color/CalibratedDisplayProfile-5.icc
    

    and sRGB_IEC61966-2-1_no_black_scaling.icc from color.org.

    My test input file is a 1-page email printed from Microsoft Outlook 2010 using CutePDF 2.8 (which uses Ghostscript 8.15).

    After converting with my batch file and Ghostscript 9.07, Adobe Reader thinks the output is PDF/A, but PDF/A-1b validation by pdf-tools.com fails with the message "The value of the key N is 4 but must be 3."

    I have traced this back to the following construct in the PDF output file:

    <</Filter/FlateDecode
    /N 4/Length 2595>>stream
    

    If I change /N 4 to /N 3, the "value of key N" message goes away. /N apparently represents the number of objects in the stream that follows this header. I don't know how to read the encoded stream, so I don't understand what it contains or why pdf-tools thinks it must contain only 3 objects.

    A PDF/A printed using Bullzip, which also uses Ghostscript, also fails validation with the "key N is 4 but must be 3" message.

    Does this have something to do with the color space? I'm out of my depth there. I think I'd be happy with a "plain" sRGB space. The Ghostscript docs say the PDF/A encoding must be CMYK, while Adobe implies that either RGB or CMYK works for PDF/A. So I'm unclear about how to find an appropriate .icc profile.

    Or maybe the validator is wrong and everything is fine?
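The encoded stream in the question can be inspected with a short script. Only the stream body is compressed: /FlateDecode is zlib compression, and the dictionary in front of it is plain text. The sketch below is a rough scanner under stated assumptions (stream dictionaries without nested `<< >>`, a single FlateDecode filter, no encryption); `icc_streams` is a hypothetical name. For each stream that declares /N, it decompresses the body and returns the color space signature at byte offset 16 of the ICC header, so a /N 4 stream holding an RGB profile shows up directly.

```python
import re
import zlib
from typing import Iterator, Tuple

# A stream dictionary without nested << >> followed by the "stream" keyword,
# e.g. Ghostscript's "<</Filter/FlateDecode\n/N 4/Length 2595>>stream".
STREAM_DICT = re.compile(rb"<<([^<>]*)>>\s*stream\r?\n")
N_ENTRY = re.compile(rb"/N\s*(\d+)")


def icc_streams(pdf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (/N value, ICC color space signature) for Flate streams with /N."""
    for m in STREAM_DICT.finditer(pdf):
        d = m.group(1)
        n = N_ENTRY.search(d)
        # Object streams also use /N (an object count), so skip them.
        if n is None or b"/FlateDecode" not in d or b"/ObjStm" in d:
            continue
        # decompressobj stops at the end of the zlib data and keeps the
        # trailing bytes in unused_data, so /Length is not needed here.
        try:
            body = zlib.decompressobj().decompress(pdf[m.end():])
        except zlib.error:
            continue
        if len(body) >= 20:
            yield int(n.group(1)), body[16:20]
```

Usage might look like `list(icc_streams(open("out.pdf", "rb").read()))`; a result such as `[(4, b"RGB ")]` would indicate exactly the mismatch the validator reports.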

  • Mark Berry (about 11 years ago)
    Thanks for the reply. I may have confused things by mentioning that the original was created by CutePDF 2.8, which uses Ghostscript 8.15. However, the actual conversion that I'm trying to debug was done with Ghostscript 9.07. I'm somewhat surprised that PDF/A conversion hasn't already been solved and verified, but I haven't found a thorough tutorial. I may try the Ghostscript forum.
  • harrymc (about 11 years ago)
    Just remember that this is not a user-oriented forum but a bug report website, answered directly and only by the developers.
  • Mark Berry (about 11 years ago)
    I decided, before submitting a bug, to download a trial of Adobe Acrobat XI. There, my test documents pass "pre-flight" PDF/A-1b validation. Does Ghostscript (and do others) tend to lean toward the Adobe interpretation of the standard? If so, there's probably no point in pursuing this as a bug.
  • harrymc (about 11 years ago)
    According to what you say, products based on Ghostscript find this /N to be non-conforming. Acrobat's not detecting it can simply be a bug in the Acrobat checker. In my own reading of the PDF specs, I have not found /N in the list of legal parameters for FlateDecode, so I have no idea why it is being emitted at all (you can try to delete it and see if the pdf still works). You don't have to worry about presenting the Ghostscript developers with a false bug - I have already done so more than once and was always treated with courtesy.
  • harrymc (about 11 years ago)
    I have searched the PDF reference for all occurrences of "N" and haven't found one that seems to apply to your case. So all that's left is to wait for the official Ghostscript answer. Please post it here.
  • Mark Berry (about 11 years ago)
    I see several instances of "/N 3" and "/N 4" in the PDF 1.7 reference. It apparently refers to the number of objects in a stream. Several of the examples refer to ICC color profiles. Since CMYK would include 4 colors and RGB only 3, my suspicion is some confusion between the two.
  • Mark Berry (about 11 years ago)
    If I delete /N, the PDF opens in Adobe Reader but just displays a blank page.
  • harrymc (about 11 years ago)
    So it is necessary, probably because the stream describes an embedded image. I have seen that, in the bug you opened, the Ghostscript developer answered that /N=4 signifies a CMYK color profile while /N=3 specifies RGB, and that this conforms to the specs. He is most probably right in saying that not accepting it is a bug. This is not the first time I have discovered that the Ghostscript team knows the specs better than other implementers.
  • harrymc (about 11 years ago)
    ... In that case, converting your images from CMYK to RGB may fix the problem.
  • Mark Berry (about 11 years ago)
    My simplest test file contains no images, but by trial and error, I learned that the ICC color space must still correspond to the value of /N. The PDFA_def.ps sample included with GS 9.07 does not handle DeviceRGB correctly, i.e. it never emits /N 3. I've posted an answer with a suggested enhancement to PDFA_def.ps.
  • Mark Berry (about 11 years ago)
    Harry, I appreciate your being a discussion partner on this question and pointing me to the Ghostscript bug site. I've marked your answer as helpful and created my own answer with the technical specifics I eventually worked out. I'm a bit confused about the bounty system. I was going to ask what you considered fair in this situation, but after reading the FAQ, it seems you may automatically get half the points (which is fine) and I'll lose the other half? The FAQ doesn't seem to cover the situation where the bounty drew a helpful answer but the accepted answer is by the asker.
  • harrymc (about 11 years ago)
    The half-reputation rule applies only when there is no solution, so you would have lost the entire bounty.