Apache POI or docx4j for dealing with docx documents

31,178

Solution 1

Disclosure: I lead the docx4j project

Although docx4j can also handle pptx and xlsx, it is mostly used for docx manipulation. By way of illustration, at the time of writing there are nearly 1,000 topics in the docx4j forum, while the pptx forum has only 10% of that volume.

Whatever you want to do with the docx document, docx4j ought to be able to help you. There's a single page overview of a generic workflow.

For many common requirements, docx4j provides a higher-level API. These include:

  • Create/open/save docx (of course)

  • Report/document generation, using a variety of approaches: (i) Variable substitution, (ii) XML data binding (particularly strong), and (iii) Mailmerge

  • Export as HTML, XHTML

  • Export as PDF (with font support)

For anything else, you can manipulate the JAXB representation of the docx to your heart's content. JAXB is a Java community standard, included in Java 6, and with a strong alternative implementation in EclipseLink's MOXy. (POI uses XMLBeans instead of JAXB.)

There's a web app to help you explore a docx, and generate Java code to create corresponding Java objects.

Of course, if there is some specific task you have in mind, it may be that docx4j or POI has a particular strength there.

Both docx4j and POI are ASL v2 licensed.

docx4j is actively maintained; its source code is on GitHub.

In addition, commercial support is available for docx4j if you want it, as are several commercial extensions, e.g. MergeDocx.

docx4j does rely on POI as a library for its implementation of the OLE 2 Compound Document format, which we're grateful for.

Solution 2

I think Apache POI's main focus is on spreadsheets, though it has features for reading Word documents, and it uses XMLBeans to do so. docx4j mainly deals with docx documents and uses JAXB. Since JAXB converts XML into Java objects, I think docx4j would be preferable for your case.

Solution 3

If you are dealing with docx documents, docx4j is more convenient than Apache POI. You can use the following links to learn the basics of docx4j. There is also a helpful docx4j forum.

1. http://blog.iprofs.nl/2012/09/06/creating-word-documents-with-docx4j/
2. http://www.smartjava.org/content/create-complex-word-docx-documents-programatically-docx4j?

Solution 4

I tried Apache POI, but the problem is that when printing anything from a docx file (for example, all "Heading1" elements), the output contains a lot of bad data and stray whitespace. I tried docx4j as well, and it avoids this bad data.
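To make the "print all Heading1 elements" task concrete, here is a minimal sketch that uses only the JDK, not docx4j or POI. It relies on the fact that a docx is a zip archive whose main text lives in `word/document.xml`. The class name `HeadingExtractor`, the method `paragraphsWithStyle`, and the in-memory `sampleDocx` helper are all hypothetical names made up for this illustration, and the sample archive contains only the one part needed here, so it is not a file Word would open. Note that Word often splits a paragraph's text across several runs, so the text nodes have to be joined per paragraph.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class for illustration; not part of docx4j or POI.
class HeadingExtractor {

    // Main WordprocessingML namespace used in word/document.xml.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of every paragraph whose paragraph style id equals styleId.
    // Simplification: nested paragraphs (e.g. inside text boxes) are not treated specially.
    static List<String> paragraphsWithStyle(InputStream docx, String styleId) throws Exception {
        Document doc = null;
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                    factory.setNamespaceAware(true);
                    // Read the entry fully first so the parser cannot close the zip stream.
                    byte[] xml = zip.readAllBytes();
                    doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
                    break;
                }
            }
        }
        List<String> result = new ArrayList<>();
        if (doc == null) {
            return result; // no main document part found
        }
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);
            NodeList styles = p.getElementsByTagNameNS(W_NS, "pStyle");
            if (styles.getLength() == 0) {
                continue; // paragraph has no explicit style
            }
            String val = ((Element) styles.item(0)).getAttributeNS(W_NS, "val");
            if (!styleId.equals(val)) {
                continue;
            }
            // Join the text of all runs in this paragraph.
            StringBuilder text = new StringBuilder();
            NodeList texts = p.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                text.append(texts.item(j).getTextContent());
            }
            result.add(text.toString());
        }
        return result;
    }

    // Builds a tiny zip holding only word/document.xml (not a complete, valid docx).
    static byte[] sampleDocx() throws IOException {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
            + "<w:p><w:pPr><w:pStyle w:val=\"Heading1\"/></w:pPr>"
            + "<w:r><w:t>Intro</w:t></w:r><w:r><w:t>duction</w:t></w:r></w:p>"
            + "<w:p><w:r><w:t>Body text</w:t></w:r></w:p>"
            + "<w:p><w:pPr><w:pStyle w:val=\"Heading1\"/></w:pPr>"
            + "<w:r><w:t>Summary</w:t></w:r></w:p>"
            + "</w:body></w:document>";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        List<String> headings =
            paragraphsWithStyle(new ByteArrayInputStream(sampleDocx()), "Heading1");
        System.out.println(headings); // prints [Introduction, Summary]
    }
}
```

With a library such as docx4j you would normally work with typed objects for paragraphs and runs instead of raw DOM nodes, but the underlying structure being walked is the same.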

Author: becks

Updated on May 09, 2020

Comments

  • becks
    becks almost 4 years

    Which do you think is better for reading a docx document as Java objects, and why?

    In other words, which library supports the most Word tags?

  • becks
    becks about 11 years
    All processing comes from the XML parsing, right? Can I, for example, use a Word document interactively through docx4j? For instance, can I search for certain text and select the result exactly as a search box does?
  • JasonPlutext
    JasonPlutext about 11 years
    The XML is unmarshalled into JAXB objects; processing is then generally done at that level. docx4j is a library. To use it interactively, you'd have to make an interactive application. docx4all is an example of an interactive application (a wordprocessor) based on docx4j. With docx4j, you can search for text, and do stuff with the results.
  • Stephane Grenier
    Stephane Grenier over 10 years
    Does docx4j have support for tables within docx files? I just tried for example to create a purchase order docx file and convert it to pdf and the table was really badly formatted. I used the sample webapp on the docx4j website at: webapp.docx4java.org/OnlineDemo/docx_to_pdf_fop.html
  • JasonPlutext
    JasonPlutext over 10 years
    See my answer to your question at stackoverflow.com/questions/20437235/…
  • WiredCoder
    WiredCoder almost 8 years
    Can it split Word documents? That is not possible with the POI API.
  • JasonPlutext
    JasonPlutext almost 8 years
    You can split/merge with the Enterprise edition, or do a crude split with docx4j (resulting in a bigger file, since unused images etc would still be in the zip)
  • Ced
    Ced over 7 years
    This is fantastic. I'm mainly interested in using bookmarks across multiple files. Unfortunately those are doc files, but I can convert them with Word beforehand. The formatting is very important to me, however, so can you confirm it will keep the formatting? I need to sleep now
  • Daniel
    Daniel over 6 years
    Could you add the link to the API documentation in your answer? I saw there are 3 examples in GitHub, but I can't find documentation besides the "Getting Started"