Java library for reading Word documents
Solution 1
Apache POI: use HWPF for .doc files and XWPF for .docx files.
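If the file extension cannot be trusted, the two formats can be told apart by their first bytes: a binary .doc is an OLE2 compound file, while a .docx is a ZIP package. The following is a minimal sketch using only the JDK. The class and method names (`WordFormatSniffer`, `detect`) are hypothetical and not part of POI, and the signatures identify only the container type, so other Office formats (.xls, .xlsx and so on) would match as well.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper: guesses which POI component fits a file from its leading bytes. */
final class WordFormatSniffer {
    // OLE2 compound-file signature, used by binary .doc files
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature ("PK\003\004"), used by .docx packages
    private static final byte[] ZIP = { 0x50, 0x4B, 0x03, 0x04 };

    /** Returns "HWPF", "XWPF", or "UNKNOWN" for the given leading bytes. */
    static String detect(byte[] header) {
        if (startsWith(header, OLE2)) return "HWPF";
        if (startsWith(header, ZIP)) return "XWPF";
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        // Reads the first 8 bytes of the file named on the command line (Java 11+)
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(detect(in.readNBytes(8)));
        }
    }
}
```

The result only suggests which POI module to try; actually opening the document still requires the corresponding POI classes.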
Solution 2
There is an Apache project that does this: http://poi.apache.org/
Solution 3
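import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.xmlbeans.XmlException;

public class XParseTest
{
    public static void main(String[] args) throws XmlException, OpenXML4JException, IOException
    {
        File file = new File("e:\\testing\\new.docx");
        // try-with-resources closes the extractor and the input stream
        try (FileInputStream fs = new FileInputStream(file);
             XWPFWordExtractor xw = new XWPFWordExtractor(OPCPackage.open(fs)))
        {
            // Print the plain text of the document
            System.out.println(xw.getText());
        }
    }
}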
This parses a .docx file and prints its plain text.
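The extractor above returns text only. Because a .docx file is a ZIP package, the embedded images can at least be listed with the JDK alone. This is a rough sketch, not a POI API: the class and method names (`DocxMediaLister`, `listMedia`) are hypothetical, and it assumes images are stored under `word/media/`, which is where Word normally puts them but which the package format does not strictly guarantee.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: lists embedded media entries of a .docx package. */
final class DocxMediaLister {
    /** Returns the names of all file entries stored under word/media/ (assumed location). */
    static List<String> listMedia(InputStream docx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory() && entry.getName().startsWith("word/media/")) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Prints one entry name per line for the .docx given on the command line
        try (InputStream in = new FileInputStream(args[0])) {
            listMedia(in).forEach(System.out::println);
        }
    }
}
```

This does not apply to the binary .doc format, and it gives no style information; for those, the HWPF and XWPF document models in POI are the place to look.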
Author by
Tony the Pony
Updated on June 04, 2022
Comments
-
Tony the Pony almost 2 years
Is there an open-source Java library for reading Word documents (both .docx and the older .doc format)?
Read-only access is sufficient; I do not need to modify the Word documents using Java. However, I would like to have access to images and style information.
EDIT
I've checked out Apache POI, but it doesn't look like it is being actively maintained. See http://poi.apache.org/hwpf/index.html:
At the moment we unfortunately do not have someone taking care for HWPF and fostering its development.
-
Tony the Pony over 12 years
Thanks, I checked POI first, but it doesn't look like it is being actively maintained...