Java library for reading Word documents


Solution 1

Apache POI HWPF for .doc and XWPF for .docx files
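
If you do not want to trust the file extension, one way to decide between the two is to look at the first bytes of the file: legacy .doc files are stored in an OLE2 compound container, while .docx files are ZIP archives. The following is only a rough sketch using the standard library; the class name `WordFormatSniffer` and its `Container` enum are made up for illustration and are not part of POI. Note that the signature only identifies the container (other Office formats share the same containers), so it is a hint, not proof that the file is a Word document.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of Apache POI.
public class WordFormatSniffer {

    enum Container { OLE2, ZIP, UNKNOWN }

    // OLE2 compound file signature (container used by legacy .doc).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature (container used by .docx).
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // Reads up to 8 bytes from the stream and classifies the container.
    // The stream is consumed, so reopen the file before parsing it.
    static Container sniff(InputStream in) throws IOException {
        byte[] head = new byte[OLE2_MAGIC.length];
        int read = in.readNBytes(head, 0, head.length); // requires Java 9+
        if (startsWith(head, read, OLE2_MAGIC)) return Container.OLE2; // try HWPF
        if (startsWith(head, read, ZIP_MAGIC)) return Container.ZIP;   // try XWPF
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int length, byte[] prefix) {
        return length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the file to check.
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(sniff(in));
        }
    }
}
```

A result of OLE2 suggests handing the file to HWPF, and ZIP suggests XWPF; anything else is probably not a Word document at all.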

Solution 2

There is an Apache project that does this: http://poi.apache.org/

Solution 3

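import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.xmlbeans.XmlException;

public class XParseTest
{
    public static void main(String[] args) throws XmlException, OpenXML4JException, IOException
    {
        File file = new File("e:\\testing\\new.docx");
        // try-with-resources closes the extractor and the input stream when done
        try (FileInputStream fs = new FileInputStream(file);
             XWPFWordExtractor xw = new XWPFWordExtractor(OPCPackage.open(fs)))
        {
            System.out.println(xw.getText()); // plain text of the document
        }
    }
}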

This uses Apache POI's XWPFWordExtractor to read a .docx file and print its plain text.
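
Since the question also asks about images and style information, here is a rough sketch of how that might look with POI's XWPF user model for .docx files. It assumes the POI OOXML jars are on the classpath; the class name `DocxInspector` and its `inspect` method are invented for this example. It only walks body paragraphs, so text in tables, headers, and footers is not covered.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;
import org.apache.poi.xwpf.usermodel.XWPFRun;

// Hypothetical example class, not part of Apache POI.
public class DocxInspector {

    // Returns one line per text run (with basic style info)
    // and one line per embedded picture.
    static List<String> inspect(InputStream in) throws IOException {
        List<String> report = new ArrayList<>();
        try (XWPFDocument doc = new XWPFDocument(in)) {
            // Body paragraphs only; tables and headers are skipped here.
            for (XWPFParagraph paragraph : doc.getParagraphs()) {
                for (XWPFRun run : paragraph.getRuns()) {
                    report.add("run text=" + run.getText(0)
                            + " style=" + paragraph.getStyle()   // style ID, may be null
                            + " bold=" + run.isBold()
                            + " italic=" + run.isItalic()
                            + " font=" + run.getFontFamily());   // may be null if inherited
                }
            }
            for (XWPFPictureData picture : doc.getAllPictures()) {
                report.add("picture name=" + picture.getFileName()
                        + " bytes=" + picture.getData().length);
            }
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the .docx file to inspect.
        try (InputStream in = new FileInputStream(args[0])) {
            inspect(in).forEach(System.out::println);
        }
    }
}
```

The raw image bytes from `getData()` can be written to disk as they are. Style values that are inherited from the paragraph or document defaults tend to come back as null or false on the run itself, so treat the output as a starting point.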

Author by

Tony the Pony

Updated on June 04, 2022

Comments

  • Tony the Pony
    Tony the Pony almost 2 years

    Is there an open-source Java library for reading Word documents (both .docx and the older .doc format)?

    Read-only access is sufficient; I do not need to modify the Word documents using Java. However, I would like to have access to images and style information.

    EDIT

    I've checked out Apache POI, but it doesn't look like it is being actively maintained. See http://poi.apache.org/hwpf/index.html:

    At the moment we unfortunately do not have someone taking care for HWPF and fostering its development.

  • Tony the Pony
    Tony the Pony over 12 years
    Thanks, I checked POI first, but it doesn't look like it is being actively maintained...