PDFBox: How to "flatten" a PDF-form?

Solution 1

With PDFBox 2 it's now possible to "flatten" a PDF-form easily by calling the flatten method on a PDAcroForm object. See Javadoc: PDAcroForm.flatten().

Simplified code with an example call of this method:

//Load the document
PDDocument pDDocument = PDDocument.load(new File("E:\\Form-Test.pdf"));    
PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();

//Fill the document
...

//Flatten the document
pDAcroForm.flatten();

//Save the document
pDDocument.save("E:\\Form-Test-Result.pdf");
pDDocument.close();

Note: dynamic XFA forms cannot be flattened.

For migration from PDFBox 1.* to 2.0, take a look at the official migration guide.

Solution 2

This works for sure. I ran into this problem, debugged all night, and finally figured out how to do it :)

This assumes that you can edit the PDF in some way, or otherwise have some control over it.

First, edit the forms using Acrobat Pro. Make them hidden and read-only.

Then you need to use two libraries: PDFBox and PDFClown.

PDFBox removes the entry that tells Adobe Reader the document contains a form; PDFClown removes the actual field. Run PDFClown first and PDFBox second; the other way around doesn't work.

Single field example code:

// PDF Clown code (File here is org.pdfclown.files.File, not java.io.File)
File file = new File("Some file path");
Document document = file.getDocument();
Form form = document.getForm();
Fields fields = form.getFields();
Field field = fields.get("some_field_name");

PageStamper stamper = new PageStamper(); 
FieldWidgets widgets = field.getWidgets();
Widget widget = widgets.get(0); // Generally is 0.. experiment to figure out
stamper.setPage(widget.getPage());

// Write text using text form field position as pivot.
PrimitiveComposer composer = stamper.getForeground();
Font font = Font.get(document, "some_path");
composer.setFont(font, 10); 
double xCoordinate = widget.getBox().getX();
double yCoordinate = widget.getBox().getY(); 
composer.showText("text i want to display", new Point2D.Double(xCoordinate, yCoordinate)); 

// Actually delete the form field!
field.delete();
stamper.flush(); 

// Create new buffer to output to... 
Buffer buffer = new Buffer();
file.save(buffer, SerializationModeEnum.Standard); 
byte[] bytes = buffer.toByteArray(); 

// PDFBox code
InputStream pdfInput = new ByteArrayInputStream(bytes);
PDDocument pdfDocument = PDDocument.load(pdfInput);

// Tell Adobe we don't have forms anymore.
PDDocumentCatalog pdCatalog = pdfDocument.getDocumentCatalog();
PDAcroForm acroForm = pdCatalog.getAcroForm();
COSDictionary acroFormDict = acroForm.getDictionary();
COSArray cosFields = (COSArray) acroFormDict.getDictionaryObject("Fields");
cosFields.clear();

// Phew. Finally.
pdfDocument.save("Some file path");

Probably some typos here and there, but this should be enough to get the gist :)

Solution 3

setReadOnly worked for me, as shown below:

    @SuppressWarnings("unchecked")
    List<PDField> fields = acroForm.getFields();
    for (PDField field : fields) {
        if (field.getFullyQualifiedName().equals("formfield1")) {
            field.setReadOnly(true);
        }
    }

Solution 4

After reading the PDF reference, I discovered that you can quite easily make AcroForm fields read-only by setting the "Ff" key (field flags) to 1, which sets the ReadOnly flag. This is what the documentation says about it:

If set, the user may not change the value of the field. Any associated widget annotations will not interact with the user; that is, they will not respond to mouse clicks or change their appearance in response to mouse motions. This flag is useful for fields whose values are computed or imported from a database.

So the code could look like this (using the PDFBox library):

 public static void makeAllWidgetsReadOnly(PDDocument pdDoc) throws IOException {

    PDDocumentCatalog catalog = pdDoc.getDocumentCatalog();

    PDAcroForm form = catalog.getAcroForm();

    List<PDField> acroFormFields = form.getFields();

    System.out.println(String.format("found %d acroForm fields", acroFormFields.size()));

    for(PDField field: acroFormFields) {
        makeAcroFieldReadOnly(field);
    }
}

private static void makeAcroFieldReadOnly(PDField field) {

    // Ff = 1 sets the ReadOnly flag; note that this overwrites any other flags already stored in Ff
    field.getDictionary().setInt("Ff",1);

}
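
The code above writes the value 1 into "Ff", which replaces whatever flags the field already had (a multiline text field would lose its Multiline flag, for example). Below is a minimal sketch of setting only the ReadOnly bit with a bitwise OR. `FieldFlags` is a hypothetical helper written for this illustration, not a PDFBox class; the only facts taken from the PDF specification are that ReadOnly is bit position 1 (value 1) and Multiline is bit position 13 (value 4096).

```java
/** Sketch of PDF field-flag (/Ff) handling. FieldFlags is a hypothetical helper, not a PDFBox class. */
final class FieldFlags {
    /** Bit position 1 of /Ff (value 1) is the ReadOnly flag. */
    static final int READ_ONLY = 1;

    private FieldFlags() {}

    /** Returns the flags with ReadOnly set and every other bit left untouched. */
    static int withReadOnly(int flags) {
        return flags | READ_ONLY;
    }

    /** Returns true when the ReadOnly bit is set. */
    static boolean isReadOnly(int flags) {
        return (flags & READ_ONLY) != 0;
    }

    public static void main(String[] args) {
        int multiline = 1 << 12;                 // bit position 13: Multiline flag of a text field
        int updated = withReadOnly(multiline);
        System.out.println(updated);             // 4097: Multiline is kept, ReadOnly is added
        System.out.println(isReadOnly(updated)); // true
    }
}
```

With the PDFBox version used in this answer, this would presumably mean reading the current value first (for example with `getInt("Ff", 0)` on the field dictionary) and writing back `FieldFlags.withReadOnly(...)` of it instead of a constant 1.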

Solution 5

Solution to flattening acroform AND retaining the form field values using pdfBox:

The solution that worked for me with pdfbox 2.0.1:

File myFile = new File("myFile.pdf");
PDDocument pdDoc = PDDocument.load(myFile);
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();

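// set the NeedAppearances flag to false
pdAcroForm.setNeedAppearances(false);

// fill a field before flattening ("formfield1" is a placeholder field name)
PDField field = pdAcroForm.getField("formfield1");
field.setValue("new-value");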

pdAcroForm.flatten();
pdDoc.save("myFlattenedFile.pdf");
pdDoc.close();

I didn't need the two extra steps from the linked solution:

// correct the missing page link for the annotations
// Add the missing resources to the form

I created my PDF form in OpenOffice 4.1.1 and exported it to PDF. The two options I selected in the OpenOffice export dialog were:

  1. "Create PDF form"
  2. Submit format of "PDF". I found this gave a smaller PDF file than selecting "FDF", while the result still worked as a PDF form.

Using PdfBox I populated the form fields and created a flattened pdf file that removed the form fields but retained the form field values.

Author: Lukas

Updated on July 09, 2022

Comments

  • Lukas
    Lukas almost 2 years

    How do I "flatten" a PDF-form (remove the form-field but keep the text of the field) with PDFBox?

    Same question was answered here:

    A quick way to do this is to remove the fields from the AcroForm.

    For this you just need to get the document catalog, then the AcroForm, and then remove all fields from this AcroForm.

    The graphical representation is linked with the annotation and stays in the document.

    So I wrote this code:

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;
    
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
    import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
    import org.apache.pdfbox.pdmodel.interactive.form.PDField;
    
    public class PdfBoxTest {
        public void test() throws Exception {
            PDDocument pdDoc = PDDocument.load(new File("E:\\Form-Test.pdf"));
            PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
            PDAcroForm acroForm = pdCatalog.getAcroForm();
    
            if (acroForm == null) {
                System.out.println("No form-field --> stop");
                return;
            }
    
            @SuppressWarnings("unchecked")
            List<PDField> fields = acroForm.getFields();
    
            // set the text in the form-field <-- does work
            for (PDField field : fields) {
                if (field.getFullyQualifiedName().equals("formfield1")) {
                    field.setValue("Test-String");
                }
            }
    
            // remove form-field but keep text ???
            // acroForm.getFields().clear();         <-- does not work
            // acroForm.setFields(null);             <-- does not work
            // acroForm.setFields(new ArrayList());  <-- does not work
            // ???
    
            pdDoc.save("E:\\Form-Test-Result.pdf");
            pdDoc.close();
        }
    }
    
  • Lukas
    Lukas over 11 years
    The code did not work for me. After I execute this code, PDFBox does not recognise the form-field anymore, but Adobe Reader still shows the form-fields. (Maybe some other parts have to be removed from the COSDictionary, I don't know.) I posted the answer anyway because it might help to find the correct answer.
  • Admin
    Admin over 10 years
    And in case you're wondering about licenses, PDFClown is LGPLv3, so if you're developing server-side stuff, it should most likely be alright (not legal advice). And PDFBox is Apache 2 or something, which equals free.
  • MaxArt
    MaxArt almost 10 years
    It doesn't work because the annotation widgets can live independently from a form field. When you remove the fields, you don't remove the widgets, which stay there even though they don't belong to any form field. To be effective, you have to remove the widgets' annotations from each page; to be efficient, you should remove the widget objects from the document too (that would mean removing parts that aren't referenced anymore).
  • Charles Caldwell
    Charles Caldwell almost 10 years
    Whereas you shouldn't post only links, I do think the code at that gist is too long to post here. Maybe just paste in the important part and provide the link as the complete code.
  • mkl
    mkl almost 10 years
    This will work fairly often but not always: in documents with the NeedAppearances flag set, some fields may not have appearances yet, or may even have appearances which do not represent the current value.
  • Jake W
    Jake W over 9 years
    Thanks so much for the sample code. It worked for me!
  • Qedrix
    Qedrix about 9 years
    I used this code and for some reason my checkboxes turn grey in color. Any clue why?
  • Stefano Chizzolini
    Stefano Chizzolini about 9 years
    @JakeW I'm glad to announce that PDF Clown natively supports form flattening since version 0.2.0 (and 0.1.2.1). More info here: "PDF Clown 0.2.0: Enhanced content handling"
  • Roger
    Roger over 8 years
    @Qedrix I updated the code to fix the grey background drawn on checkboxes. The first byteArray in the tmpfield instanceof PDChoiceButton branch was responsible.
  • StanE
    StanE almost 7 years
    This is very helpful information. Thank you for finding the actual name of the flag. I have written my own small PDF parser in PHP and was searching for that flag name. :-)
  • jamlhet
    jamlhet almost 6 years
    Nice -> pDAcroForm.flatten(); Using org.apache.pdfbox 2.0.4
  • ospf
    ospf almost 6 years
    Works like a charm.
  • Daniel Facciabene
    Daniel Facciabene almost 4 years
    This worked for me. It also resolved a problem I was having when digitally signing certain PDF forms (I did all the steps in the linked solutions; I'm guessing those extra steps corrected the malformed internal structure of the PDF).
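
MaxArt's comment above explains why clearing the AcroForm field list in the question did not help: the widget annotations sit on the pages, independently of the field list. The following is a toy model of that structure in plain Java, meant only to make the relationship concrete. Every class in it (`FlattenModel`, `Widget`, `Page`, `Form`) is hypothetical and none of them is a PDFBox type; the `flatten` method is a simplified sketch of the idea, not how any library implements it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model only: these classes are hypothetical and are not PDFBox types. */
final class FlattenModel {
    /** A widget annotation: the visible box on a page that shows a field's value. */
    static final class Widget {
        final String text;
        Widget(String text) { this.text = text; }
    }

    /** A page holds static content plus its own list of annotations. */
    static final class Page {
        final List<String> content = new ArrayList<>();
        final List<Widget> annots = new ArrayList<>();
    }

    /** The form only lists field names; the widgets live on the pages. */
    static final class Form {
        final List<String> fields = new ArrayList<>();
    }

    /** Flattening in this model: copy each widget's text into the page content, then drop widgets and fields. */
    static void flatten(Form form, List<Page> pages) {
        for (Page page : pages) {
            for (Widget widget : page.annots) {
                page.content.add(widget.text);
            }
            page.annots.clear();
        }
        form.fields.clear();
    }

    public static void main(String[] args) {
        Form form = new Form();
        form.fields.add("formfield1");
        Page page = new Page();
        page.annots.add(new Widget("Test-String"));

        form.fields.clear();                    // what the question tried
        System.out.println(page.annots.size()); // 1: the widget is still on the page

        flatten(form, Collections.singletonList(page));
        System.out.println(page.annots.size()); // 0
        System.out.println(page.content);       // [Test-String]
    }
}
```

In this model, clearing the field list alone leaves the widget on the page, which matches what Lukas observed in Adobe Reader; the value only becomes ordinary page content once the widget itself is converted and removed.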