How to get the content of PDF form text fields using PDFBox?


Solution 1

This is how you read the key/value pairs of an AcroForm. This particular program prints them to the console:

package pdf_form_filler;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.*;
import java.io.File;
import java.util.*;

public class pdf_form_filler {

    // Prints "name = value" for every top-level field of the document's AcroForm.
    public static void listFields(PDDocument doc) throws Exception {
        PDDocumentCatalog catalog = doc.getDocumentCatalog();
        PDAcroForm form = catalog.getAcroForm();
        if (form == null) {
            // getAcroForm() returns null when the PDF has no interactive form
            System.out.println("No AcroForm found.");
            return;
        }
        // PDFieldTreeNode belongs to an older PDFBox API (see Solution 2)
        List<PDFieldTreeNode> fields = form.getFields();

        for (PDFieldTreeNode field : fields) {
            Object value = field.getValue();
            String name = field.getFullyQualifiedName();
            System.out.println(name + " = " + value);
        }
    }

    public static void main(String[] args) throws Exception {
        File file = new File("test.pdf");
        // try-with-resources closes the document once the fields are listed
        try (PDDocument doc = PDDocument.load(file)) {
            listFields(doc);
        }
    }

}

Solution 2

PDFieldTreeNode doesn't seem to be supported anymore in newer PDFBox versions. Use PDField instead.
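As a rough sketch of what that could look like, the following assumes PDFBox 2.0.x on the classpath. The class name `FormFieldReader`, the method `readFields`, and the file name `test.pdf` are placeholders invented for illustration. It uses `PDAcroForm.getFieldTree()` so that nested fields are visited as well, and `PDField.getValueAsString()` to read each value as text.

```java
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField;

// Hypothetical helper class; assumes the PDFBox 2.0.x API.
public class FormFieldReader {

    /** Collects "fully qualified name -> value" for every terminal field, nested ones included. */
    public static Map<String, String> readFields(PDDocument doc) {
        Map<String, String> values = new LinkedHashMap<>();
        PDAcroForm form = doc.getDocumentCatalog().getAcroForm();
        if (form == null) {
            return values; // the PDF has no interactive form
        }
        // getFieldTree() walks the whole field hierarchy, not only the root fields
        for (PDField field : form.getFieldTree()) {
            // Only terminal fields carry a value; non-terminal fields just group children
            if (field instanceof PDTerminalField) {
                values.put(field.getFullyQualifiedName(), field.getValueAsString());
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // "test.pdf" is a placeholder path
        try (PDDocument doc = PDDocument.load(new File("test.pdf"))) {
            readFields(doc).forEach((name, value) -> System.out.println(name + " = " + value));
        }
    }
}
```

On PDFBox 3.x the loading call differs (`Loader.loadPDF(File)` replaces `PDDocument.load(File)`), so the `main` method would need adjusting there.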

Author: VolleyJosh

Updated on June 16, 2022

Comments

  • VolleyJosh, almost 2 years ago

    I'm using the following code to extract the text of a PDF file with org.apache.pdfbox:

    File f = new File(fileName);
    if (!f.isFile()) {
        System.out.println("File " + fileName + " does not exist.");
        return null;
    }

    try {
        parser = new PDFParser(new FileInputStream(f));
    } catch (Exception e) {
        System.out.println("Unable to open PDF Parser.");
        return null;
    }
    try {
        parser.parse();
        cosDoc = parser.getDocument();
        pdfStripper = new PDFTextStripper();
        pdDoc = new PDDocument(cosDoc);
        parsedText = pdfStripper.getText(pdDoc);
    } catch (Exception e) {
        e.printStackTrace();
    }
    

    It works great for the PDFs I've used it on so far. Now I have a PDF form that has editable text fields in it. My code does not return the text inside the fields. I would like to get that text. Is there a way to get it using PDFBox?