Convert PDF to PDF/A using the iText library


First this: iText doesn't convert ordinary PDF documents to PDF/A documents. We have customers who use iText to do this, but their code is much more elaborate than yours.

The reason why iText doesn't convert ordinary PDF documents to PDF/A should be evident: an ordinary PDF might lack features that PDF/A requires. For instance, its fonts might not be embedded. In that case, someone needs to provide the appropriate font program. iText doesn't ship with any font programs, hence the software using iText has to provide them.
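To make "fonts that aren't embedded" concrete: a font program is embedded through a FontFile, FontFile2 or FontFile3 entry in the font's descriptor dictionary. The sketch below is not iText code. It models each font descriptor as the set of its key names (an assumption made to keep the example free of dependencies; in practice you would read those keys from the PDF, for instance with iText's PdfReader) and reports which fonts lack an embedded program. It ignores Type 3 fonts and does not descend into the descendant fonts of composite (Type 0) fonts.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class FontEmbeddingCheck {

    // Font descriptor keys that hold an embedded font program:
    // FontFile (Type 1), FontFile2 (TrueType), FontFile3 (e.g. CFF-based fonts).
    private static final Set<String> FONT_FILE_KEYS =
            new HashSet<>(Arrays.asList("FontFile", "FontFile2", "FontFile3"));

    /**
     * Returns true if a font descriptor, modeled here as the set of its key names,
     * references an embedded font program. A null descriptor stands for a font
     * dictionary without /FontDescriptor (e.g. a standard 14 font used by name).
     */
    public static boolean isEmbedded(Set<String> fontDescriptorKeys) {
        if (fontDescriptorKeys == null) {
            return false;
        }
        for (String key : FONT_FILE_KEYS) {
            if (fontDescriptorKeys.contains(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical font resources of one page: resource name -> descriptor keys.
        Map<String, Set<String>> fonts = new LinkedHashMap<>();
        fonts.put("F1", new HashSet<>(Arrays.asList("Type", "FontName", "Flags", "FontFile2")));
        fonts.put("F2", new HashSet<>(Arrays.asList("Type", "FontName", "Flags")));
        fonts.put("F3", null);
        for (Map.Entry<String, Set<String>> font : fonts.entrySet()) {
            System.out.println(font.getKey() + " embedded: " + isEmbedded(font.getValue()));
        }
    }
}
```

In this sample only F1 is embedded. Every font reported as not embedded is one for which your software would have to supply the font program.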

In your code, you just copy content streams without checking for any issues that make the end result non-compliant with PDF/A. You should be very careful with the resulting PDFs. They will show the blue bar saying that the file claims to be PDF/A, but that doesn't mean that the file will validate as PDF/A when you pass it through a validator.

Now for your problem. You want to convert an ordinary PDF to PDF/A-1. PDF/A-1 is based on PDF 1.4, which dates from 2001. This means that you can't use any of the features that were introduced after 2001. PDF 1.4 also had implementation limits, one of which applies to real numbers: the absolute value of a real number couldn't exceed 32,767. Later versions of PDF lifted this limitation.

My guess is that the problem you describe is caused by your attempt to create a PDF 1.4 file that contains real numbers outside that range, which is exactly what the exception "Real number is out of range" reports. There could be two reasons:

  1. Your original PDF is PDF 1.5 or later and already contains such values,
  2. Your manipulations of the PDF introduce values outside the allowed range.
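The exception in the question reads "Real number is out of range". Under the PDF 1.4 implementation limits, which PDF/A-1 inherits, the absolute value of a real number may not exceed 32,767. The sketch below approximates that rule so you can scan a fragment of uncompressed PDF syntax for offending values; it is an illustration, not iText's PdfA1Checker. The stack trace (PRStream.toPdf → PdfDictionary.toPdf → PdfArray.toPdf) suggests, though doesn't prove, that the value sits in an array inside a stream dictionary, such as the /BBox of a form XObject; the sample fragment in main is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class Pdf14RealRangeCheck {

    // Largest magnitude of a real number under the PDF 1.4 implementation limits.
    static final double MAX_PDF14_REAL = 32767.0;

    /**
     * Returns true if the token looks like a PDF real (it contains a '.')
     * and its absolute value exceeds the PDF 1.4 limit.
     * Integers are skipped: PDF 1.4 limits them separately (to 2^31 - 1).
     */
    public static boolean isRealOutOfRange(String token) {
        if (token.indexOf('.') < 0) {
            return false;
        }
        try {
            return Math.abs(Double.parseDouble(token)) > MAX_PDF14_REAL;
        } catch (NumberFormatException e) {
            return false; // e.g. a name such as /Foo.Bar is not a number
        }
    }

    /**
     * Splits a fragment of uncompressed PDF syntax on whitespace and array
     * brackets and returns every real number that is out of range.
     * This is a rough tokenizer: it does not understand strings or comments.
     */
    public static List<String> findOutOfRangeReals(String pdfSyntax) {
        List<String> offenders = new ArrayList<>();
        for (String token : pdfSyntax.split("[\\s\\[\\]]+")) {
            if (!token.isEmpty() && isRealOutOfRange(token)) {
                offenders.add(token);
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        // Hypothetical stream dictionary fragment, shaped like a form XObject's /BBox.
        String fragment = "/Subtype /Form /BBox [0 0 612.0 40000.5]";
        System.out.println(findOutOfRangeReals(fragment)); // prints [40000.5]
    }
}
```

Scanning a real file this way requires extracting the relevant dictionaries or decompressed streams first, which is not shown here.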

This could be fixed by generating PDF/A-2 instead of PDF/A-1, but I'm pretty sure that you'll soon hit other limitations (e.g. missing fonts and other issues caused by creating a file that claims to be PDF/A but isn't). PdfAWriter will throw exceptions when you try to do things that are blatantly wrong, but there's no guarantee that more subtle PDF/A requirements aren't being missed.
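Reason 1 in the list above (an input file that is PDF 1.5 or later) can be checked cheaply before deciding between PDF/A-1 and PDF/A-2. The sketch below reads the version declared in the %PDF-x.y header. It only looks at the header: since PDF 1.4, a /Version entry in the document catalog can declare a later version than the header does, so treat a "1.4 or earlier" result as a first indication, not a guarantee.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfHeaderVersion {

    // The header has the form "%PDF-1.4"; we look for it near the start of the file.
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d)\\.(\\d)");

    /** Returns {major, minor} from the header, or null if none is found in the first 1024 bytes. */
    public static int[] headerVersion(byte[] pdfBytes) {
        int len = Math.min(pdfBytes.length, 1024);
        // ISO-8859-1 maps each byte to exactly one char, so binary bytes decode safely.
        String start = new String(pdfBytes, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(start);
        if (!m.find()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    /** True if the header declares a version later than PDF 1.4. */
    public static boolean isLaterThanPdf14(byte[] pdfBytes) {
        int[] v = headerVersion(pdfBytes);
        return v != null && (v[0] > 1 || (v[0] == 1 && v[1] > 4));
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : "%PDF-1.6\n".getBytes(StandardCharsets.ISO_8859_1); // sample header when no file is given
        int[] v = headerVersion(bytes);
        System.out.println(v == null ? "no PDF header found"
                : "PDF " + v[0] + "." + v[1] + ", later than 1.4: " + isLaterThanPdf14(bytes));
    }
}
```

Pass a file path as the first argument; without one, it checks the sample header and prints "PDF 1.6, later than 1.4: true".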

Author: zhivko

Updated on June 04, 2022

Comments

  • zhivko

    I want to export a document with PdfAConformanceLevel.PDF_A_1B conformance, but when I call document.close(), I get the error below, and the resulting PDF is not usable.

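    I use the following iText versions:

        <dependency>
            <groupId>com.itextpdf</groupId>
            <artifactId>itextpdf</artifactId>
            <version>5.5.9</version>
        </dependency>
        <dependency>
            <groupId>com.itextpdf</groupId>
            <artifactId>itext-pdfa</artifactId>
            <version>5.5.9</version>
        </dependency>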

    Stack trace:

    com.itextpdf.text.pdf.PdfAConformanceException: Real number is out of range.
    at com.itextpdf.text.pdf.internal.PdfA1Checker.checkPdfObject(PdfA1Checker.java:259)
    at com.itextpdf.text.pdf.internal.PdfAChecker.checkPdfAConformance(PdfAChecker.java:208)
    at com.itextpdf.text.pdf.internal.PdfAConformanceImp.checkPdfIsoConformance(PdfAConformanceImp.java:71)
    at com.itextpdf.text.pdf.PdfWriter.checkPdfIsoConformance(PdfWriter.java:3480)
    at com.itextpdf.text.pdf.PdfWriter.checkPdfIsoConformance(PdfWriter.java:3476)
    at com.itextpdf.text.pdf.PdfObject.toPdf(PdfObject.java:174)
    at com.itextpdf.text.pdf.PdfArray.toPdf(PdfArray.java:175)
    at com.itextpdf.text.pdf.PdfDictionary.toPdf(PdfDictionary.java:149)
    at com.itextpdf.text.pdf.PdfStream.superToPdf(PdfStream.java:278)
    at com.itextpdf.text.pdf.PRStream.toPdf(PRStream.java:239)
    at com.itextpdf.text.pdf.PdfIndirectObject.writeTo(PdfIndirectObject.java:158)
    at com.itextpdf.text.pdf.PdfWriter$PdfBody.write(PdfWriter.java:420)
    at com.itextpdf.text.pdf.PdfWriter$PdfBody.add(PdfWriter.java:398)
    at com.itextpdf.text.pdf.PdfWriter$PdfBody.add(PdfWriter.java:377)
    at com.itextpdf.text.pdf.PdfWriter.addToBody(PdfWriter.java:872)
    at com.itextpdf.text.pdf.PdfReaderInstance.writeAllVisited(PdfReaderInstance.java:161)
    at com.itextpdf.text.pdf.PdfReaderInstance.writeAllPages(PdfReaderInstance.java:177)
    at com.itextpdf.text.pdf.PdfWriter.addSharedObjectsToBody(PdfWriter.java:1380)
    at com.itextpdf.text.pdf.PdfWriter.close(PdfWriter.java:1264)
    at com.itextpdf.text.pdf.PdfAWriter.close(PdfAWriter.java:337)
    at com.itextpdf.text.pdf.PdfDocument.close(PdfDocument.java:889)
    at com.itextpdf.text.Document.close(Document.java:416)
    at si.telekom.erender.ERenderImpl.mergeContentOfItems(ERenderImpl.java:2911)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.sun.xml.ws.api.server.MethodUtil.invoke(MethodUtil.java:83)
    at com.sun.xml.ws.api.server.InstanceResolver$1.invoke(InstanceResolver.java:250)
    at com.sun.xml.ws.server.InvokerTube$2.invoke(InvokerTube.java:149)
    at com.sun.xml.ws.server.sei.SEIInvokerTube.processRequest(SEIInvokerTube.java:88)
    at com.sun.xml.ws.api.pipe.Fiber.__doRun(Fiber.java:1136)
    at com.sun.xml.ws.api.pipe.Fiber._doRun(Fiber.java:1050)
    at com.sun.xml.ws.api.pipe.Fiber.doRun(Fiber.java:1019)
    at com.sun.xml.ws.api.pipe.Fiber.runSync(Fiber.java:877)
    at com.sun.xml.ws.server.WSEndpointImpl$2.process(WSEndpointImpl.java:419)
    at com.sun.xml.ws.transport.http.HttpAdapter$HttpToolkit.handle(HttpAdapter.java:868)
    at com.sun.xml.ws.transport.http.HttpAdapter.handle(HttpAdapter.java:422)
    at com.sun.xml.ws.transport.http.servlet.ServletAdapter.invokeAsync(ServletAdapter.java:225)
    at com.sun.xml.ws.transport.http.servlet.WSServletDelegate.doGet(WSServletDelegate.java:161)
    at com.sun.xml.ws.transport.http.servlet.WSServletDelegate.doPost(WSServletDelegate.java:197)
    at com.sun.xml.ws.transport.http.servlet.WSServlet.doPost(WSServlet.java:81)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:647)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1041)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:603)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
    

    I produce the PDF with the following code:

    public byte[] mergeContentOfItems(List<MergeItem> items) throws ErenderException {
        MessageContext mc = wsCtx.getMessageContext();
        HttpServletRequest req = (HttpServletRequest) mc.get(MessageContext.SERVLET_REQUEST);
        getLogger().info("Webservice method 'mergeContentOfItems' called from IP:" + req.getRemoteAddr());
        if (items.size() < 1) {
            String errDescription = "No barcodes specified!";
            throw new ErenderException(errDescription, new ErenderExceptionBean("201", errDescription),
                    new Throwable(errDescription));
        }
    
        com.itextpdf.text.Document document = new com.itextpdf.text.Document();
        ByteArrayOutputStream baOs = new ByteArrayOutputStream();
    
        PdfWriter writer = null;
        List<PdfReader> readers = new ArrayList<PdfReader>();
        int totalPages = 0;
    
        try {
            // Create a writer for the outputstream
            writer = PdfAWriter.getInstance(document, baOs, PdfAConformanceLevel.PDF_A_1B);
            writer.setPdfVersion(PdfWriter.PDF_VERSION_1_4);
            writer.createXmpMetadata();
    
            //writer = PdfWriter.getInstance(document, baOs);
    
            document.open();
    
            ICC_Profile icc = ICC_Profile
                    .getInstance(Thread.currentThread().getContextClassLoader().getResourceAsStream("srgb.profile"));
            writer.setOutputIntents("Custom", "", "http://www.color.org", "sRGB IEC61966-2.1", icc);
            PdfContentByte cb = writer.getDirectContent(); // Direct content of the writer; imported pages are drawn on it
    
            for (int i = 0; i < items.size(); i++) {
                String pdfFileName = null;
                File urlTempFile = null;
                if (items.get(i).getBarcode() != null) {
                    Template tmpl = TemplatesSynchronizer.getTemplateByBarcode(items.get(i).getBarcode());
                    String fileName = tmpl.getName();
                    pdfFileName = fileName.substring(0, fileName.indexOf(".")) + ".pdf";
                    getLogger().info("\tworking on:" + items.get(i) + " fileName:" + pdfFileName);
                    if (!new File(pdfFileName).exists()) {
                        String msg = String.format("File %s does not exist", pdfFileName);
                        throw new ErenderException("Error", new ErenderExceptionBean("109", msg, new Exception(msg)));
                    }
    
                } else if (items.get(i).getUrl() != null) {
                    urlTempFile = File.createTempFile("myTemp", ".pdf");
                    FileUtils.copyURLToFile(new URL(items.get(i).getUrl()), urlTempFile);
                }
    
                if (pdfFileName != null || urlTempFile != null) {
                    PdfReader pdfReader = null;
                    if (pdfFileName != null)
                        pdfReader = new PdfReader(pdfFileName);
                    else if (urlTempFile != null)
                        pdfReader = new PdfReader(urlTempFile.getAbsolutePath());
    
                    if (pdfReader != null) {
                        // Track the reader; the imported pages are copied from it when the document is closed.
                        readers.add(pdfReader);
                        totalPages += pdfReader.getNumberOfPages();
    
                        for (int pageNumber = 1; pageNumber <= pdfReader.getNumberOfPages(); pageNumber++) {
                            // setPageSize() only affects the next page, so call it before newPage().
                            document.setPageSize(pdfReader.getPageSizeWithRotation(pageNumber));
                            document.newPage();
                            PdfImportedPage page = writer.getImportedPage(pdfReader, pageNumber);
                            cb.addTemplate(page, 0, 0);
                        }
                    }
                    if (urlTempFile != null)
                        urlTempFile.delete();
                }
            }
    
        } catch (Throwable ex) {
            StringWriter errorStringWriter = new StringWriter();
            PrintWriter pw = new PrintWriter(errorStringWriter);
            ex.printStackTrace(pw);
            Logger.getLogger(this.getClass()).error(errorStringWriter.getBuffer().toString());
            throw new ErenderException("Error", new ErenderExceptionBean("109", "Error in merge method.", ex), ex);
    
        } finally {
            // document.close() also closes the PdfWriter and its output stream;
            // a ByteArrayOutputStream can still be read with toByteArray() afterwards.
            if (document.isOpen()) {
                try {
                    document.close();
                } catch (Exception ex) {
                    StringWriter errorStringWriter = new StringWriter();
                    ex.printStackTrace(new PrintWriter(errorStringWriter));
                    getLogger().error("Unable to close document.\n" + errorStringWriter);
                }
            }
            // The readers are needed until document.close() has copied the imported pages.
            for (PdfReader reader : readers) {
                reader.close();
            }
        }
        getLogger().info("Webservice method 'mergeContentOfItems' called from IP:" + req.getRemoteAddr() + " ended. "
                + totalPages + " pages merged.");
        return baOs.toByteArray();
    
    }
    

    Since I don't get this error with other files, it seems to be specific to the input file. Here is a file that reproduces the error: http://filebin.ca/2hR2xO1SNlzh/09062009073008005.pdf

  • S_S
    So what would be the solution if there are existing PDF files and there is a requirement that only PDF/A-2 files should be provided by the application?