PDFBox: How to "flatten" a PDF-form?

Solution 1

With PDFBox 2 it's now possible to "flatten" a PDF-form easily by calling the flatten method on a PDAcroForm object. See Javadoc: PDAcroForm.flatten().

Simplified code with an example call of this method:

//Load the document
PDDocument pDDocument = PDDocument.load(new File("E:\\Form-Test.pdf"));    
PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();

//Fill the document
...

//Flatten the document
pDAcroForm.flatten();

//Save the document
pDDocument.save("E:\\Form-Test-Result.pdf");
pDDocument.close();

Note: dynamic XFA forms cannot be flattened.
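Since getAcroForm() returns null for a PDF without a form, and dynamic XFA forms cannot be flattened, it may be worth guarding the call. The following is only a sketch, assuming PDFBox 2.0.x on the classpath; the class name FlattenGuard and the method flattenIfPossible are made up for illustration and are not part of PDFBox.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;

// Hypothetical helper, not part of PDFBox. Assumes the PDFBox 2.0.x API.
class FlattenGuard {

    /**
     * Flattens the AcroForm of the PDF at "in" and saves the result to "out".
     * Returns false, and writes nothing, when the document has no AcroForm
     * or when the form is a dynamic XFA form.
     */
    static boolean flattenIfPossible(File in, File out) throws IOException {
        // try-with-resources closes the document even if flatten() or save() throws
        try (PDDocument doc = PDDocument.load(in)) {
            PDAcroForm form = doc.getDocumentCatalog().getAcroForm();
            if (form == null || form.xfaIsDynamic()) {
                return false;
            }
            form.flatten();
            doc.save(out);
            return true;
        }
    }
}
```

A call could then look like FlattenGuard.flattenIfPossible(new File("E:\\Form-Test.pdf"), new File("E:\\Form-Test-Result.pdf")), with the form filled in before flatten() as in the code above.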

For migration from PDFBox 1.* to 2.0, take a look at the official migration guide.

Solution 2

This works for sure - I ran into this problem and debugged all night, but finally figured out how to do it :)

This assumes that you can edit the PDF in some way or otherwise have some control over it.

First, edit the forms using Acrobat Pro. Make them hidden and read-only.

Then you need to use two libraries: PDFBox and PDFClown.

PDFBox removes the thing that tells Adobe Reader that it's a form; PDFClown removes the actual field. PDFClown must run first, then PDFBox; the other way around doesn't work.

Single field example code:

// PDF Clown code
File file = new File("Some file path"); 
Document document = file.getDocument();
Form form = document.getForm();
Fields fields = form.getFields();
Field field = fields.get("some_field_name");

PageStamper stamper = new PageStamper(); 
FieldWidgets widgets = field.getWidgets();
Widget widget = widgets.get(0); // Generally is 0.. experiment to figure out
stamper.setPage(widget.getPage());

// Write text using text form field position as pivot.
PrimitiveComposer composer = stamper.getForeground();
Font font = Font.get(document, "some_path");
composer.setFont(font, 10); 
double xCoordinate = widget.getBox().getX();
double yCoordinate = widget.getBox().getY(); 
composer.showText("text i want to display", new Point2D.Double(xCoordinate, yCoordinate)); 

// Actually delete the form field!
field.delete();
stamper.flush(); 

// Create new buffer to output to... 
Buffer buffer = new Buffer();
file.save(buffer, SerializationModeEnum.Standard); 
byte[] bytes = buffer.toByteArray(); 

// PDFBox code
InputStream pdfInput = new ByteArrayInputStream(bytes);
PDDocument pdfDocument = PDDocument.load(pdfInput);

// Tell Adobe we don't have forms anymore.
PDDocumentCatalog pdCatalog = pdfDocument.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
cosFields.clear();

// Phew. Finally.
pdfDocument.save("Some file path");

Probably some typos here and there, but this should be enough to get the gist :)

Solution 3

setReadOnly worked for me, as shown below:

@SuppressWarnings("unchecked")
List<PDField> fields = acroForm.getFields();
for (PDField field : fields) {
    if (field.getFullyQualifiedName().equals("formfield1")) {
        field.setReadOnly(true);
    }
}

Solution 4

After reading the PDF reference, I discovered that you can quite easily make AcroForm fields read-only by adding the "Ff" (field flags) key with value 1. This is what the documentation states about the flag:

If set, the user may not change the value of the field. Any associated widget annotations will not interact with the user; that is, they will not respond to mouse clicks or change their appearance in response to mouse motions. This flag is useful for fields whose values are computed or imported from a database.

So the code could look like this (using the PDFBox library):

 public static void makeAllWidgetsReadOnly(PDDocument pdDoc) throws IOException {

    PDDocumentCatalog catalog = pdDoc.getDocumentCatalog();

    PDAcroForm form = catalog.getAcroForm();

    List<PDField> acroFormFields = form.getFields();

    System.out.println(String.format("found %d AcroForm fields", acroFormFields.size()));

    for(PDField field: acroFormFields) {
        makeAcroFieldReadOnly(field);
    }
}

private static void makeAcroFieldReadOnly(PDField field) {

    field.getDictionary().setInt("Ff",1);

}
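One caveat: "Ff" is a bit field, and ReadOnly is only its lowest bit (bit position 1, numeric value 1). Writing the plain value 1 therefore also clears any other flag the field already had, for example Multiline (bit position 13, value 4096) on a text field. A minimal sketch of setting just the ReadOnly bit, using only plain Java; the class and method names are invented for illustration:

```java
// Hypothetical helper for working with the /Ff (field flags) integer.
class FieldFlags {

    // Bit position 1 of /Ff is the ReadOnly flag, i.e. numeric value 1.
    static final int READ_ONLY = 1;

    /** Returns the given flags with the ReadOnly bit set and all other bits unchanged. */
    static int withReadOnly(int currentFlags) {
        return currentFlags | READ_ONLY;
    }

    /** Reports whether the ReadOnly bit is set in the given flags. */
    static boolean isReadOnly(int flags) {
        return (flags & READ_ONLY) != 0;
    }
}
```

With the dictionary access used above, this could probably be applied as field.getDictionary().setInt("Ff", FieldFlags.withReadOnly(field.getDictionary().getInt("Ff", 0))), assuming COSDictionary.getInt(String, int) is available in your PDFBox version.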

Solution 5

Solution to flattening acroform AND retaining the form field values using pdfBox:

see solution at https://mail-archives.apache.org/mod_mbox/pdfbox-users/201604.mbox/%[email protected]%3E

The solution that worked for me with pdfbox 2.0.1:

File myFile = new File("myFile.pdf");
PDDocument pdDoc = PDDocument.load(myFile);
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();

// set the NeedAppearances flag to false
pdAcroForm.setNeedAppearances(false);


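// "fieldName" is a placeholder; use the fully qualified name of a field in your form
PDField field = pdAcroForm.getField("fieldName");
field.setValue("new-value");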

pdAcroForm.flatten();
pdDoc.save("myFlattenedFile.pdf");
pdDoc.close();

I didn't need to do the 2 extra steps in the above solution link:

// correct the missing page link for the annotations
// Add the missing resources to the form

I created my pdf form in OpenOffice 4.1.1 and exported to pdf. The 2 items selected in the OpenOffice export dialogue were:

selected "create Pdf Form"
Submit format of "PDF" - I found this gave smaller pdf file size than selecting "FDF" but still operated as a pdf form.

Using PdfBox I populated the form fields and created a flattened pdf file that removed the form fields but retained the form field values.

Author by

Lukas

Updated on July 09, 2022

Comments

Lukas almost 2 years

How do I "flatten" a PDF-form (remove the form-field but keep the text of the field) with PDFBox?

Same question was answered here:

A quick way to do this is to remove the fields from the AcroForm.

For this you just need to get the document catalog, then the AcroForm, and then remove all fields from this AcroForm.

The graphical representation is linked with the annotation and stays in the document.

So I wrote this code:

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

public class PdfBoxTest {
    public void test() throws Exception {
        PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
        PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
        PDAcroForm acroForm = pdCatalog.getAcroForm();

        if (acroForm == null) {
            System.out.println("No form-field --> stop");
            return;
        }

        @SuppressWarnings("unchecked")
        List<PDField> fields = acroForm.getFields();

        // set the text in the form-field <-- does work
        for (PDField field : fields) {
            if (field.getFullyQualifiedName().equals("formfield1")) {
                field.setValue("Test-String");
            }
        }

        // remove form-field but keep text ???
        // acroForm.getFields().clear();         <-- does not work
        // acroForm.setFields(null);             <-- does not work
        // acroForm.setFields(new ArrayList());  <-- does not work
        // ???

        pdDoc.save("E:\\Form-Test-Result.pdf");
        pdDoc.close();
    }
}

Lukas over 11 years

The code did not work for me. After I execute this code, PDFBox no longer recognises the form field, but Acrobat Reader still shows the form fields. (Maybe some other parts have to be removed from the COSDictionary, I don't know.) I posted the answer anyway because it might help to find the correct answer.
Admin over 10 years

And in case you're wondering about licenses: PDFClown is LGPLv3, so if you're developing server-side stuff, it should most likely be alright (not legal advice). PDFBox is under the Apache License 2.0, which is free to use.
MaxArt almost 10 years

It doesn't work because the annotation widgets can live independently from a form field. When you remove the fields, you don't remove the widgets, which stay there even though they don't belong to any form field. To be effective, you have to remove the widgets' annotations from each page; to be efficient, you should remove the widget objects from the document too (that would mean removing parts that aren't referenced anymore).
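The widget removal described in this comment could be sketched roughly as follows, assuming the PDFBox 2.0.x API; WidgetRemover and removeWidgets are hypothetical names, not PDFBox classes. Keep in mind that dropping a widget also drops its visible appearance, so on its own this deletes the field's text from the page rather than flattening it. The text survives only if it has been drawn into the page content first, which is what PDAcroForm.flatten() takes care of.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;

// Hypothetical helper, not part of PDFBox. Assumes the PDFBox 2.0.x API.
class WidgetRemover {

    /**
     * Removes every widget annotation from every page and keeps all other
     * annotations (links, comments, ...). Returns the number of widgets removed.
     */
    static int removeWidgets(PDDocument doc) throws IOException {
        int removed = 0;
        for (PDPage page : doc.getPages()) {
            List<PDAnnotation> kept = new ArrayList<>();
            for (PDAnnotation annotation : page.getAnnotations()) {
                if (annotation instanceof PDAnnotationWidget) {
                    removed++;
                } else {
                    kept.add(annotation);
                }
            }
            // Write the filtered list back to the page's annotation array
            page.setAnnotations(kept);
        }
        return removed;
    }
}
```

This only detaches the widgets from the pages; the unreferenced objects may still be present in the saved file, as the comment points out.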
Charles Caldwell almost 10 years

Whereas you shouldn't post only links, I do think the code at that gist is too long to post here. Maybe just paste in the important part and provide the link as the complete code.
mkl almost 10 years

This will work fairly often but not always: in documents with the NeedAppearances flag set, some fields may not have appearances yet, or may even have appearances that do not represent the current value.
Jake W over 9 years

Thanks so much for the sample code. It worked for me!
Qedrix about 9 years

I used this code and for some reason my checkboxes turn grey in color. Any clue why?
Stefano Chizzolini about 9 years

@JakeW I'm glad to announce you that PDF Clown natively supports form flattening since version 0.2.0 (and 0.1.2.1). More info here: PDF Clown 0.2.0 — Enhanced content handling
Roger over 8 years

@Qedrix I updated the code to fix the grey background draw on checkboxes. The first byteArray in the tmpfield instanceof PDChoiceButton was responsible.
StanE almost 7 years

This is very helpful information. Thank you for finding the actual name of the flag. I have written my own small PDF parser in PHP and was searching for that flag name. :-)
jamlhet almost 6 years

Nice -> pDAcroForm.flatten(); Using org.apache.pdfbox 2.0.4
ospf almost 6 years

Works like a charm.
Daniel Facciabene almost 4 years

This worked for me. It also resolved a problem I was having when digitally signing certain PDF forms. (I did all the steps in the linked solution; I'm guessing those extra steps corrected the malformed internal PDF structure.)