How to extract docx (Word 2007 and above) using Apache POI
Solution 1
You need to add the dom4j library to your classpath or your project libraries.
Solution 2
It looks like you don't have all of the dependencies on your classpath.
If you look at http://poi.apache.org/overview.html you'll see that dom4j is a required library when working with the OOXML files. From the exception you got, it seems that you don't have it... If you look in the POI binary download, you should find it in the ooxml-libs subdirectory.
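To confirm which jars are missing before running the indexer, a quick classpath check can help. The following is only a minimal sketch: the class name ClasspathCheck and the method findMissing are made up for illustration, and the two class names being probed are taken from the stack trace and the code in the question. It uses nothing beyond the standard Class.forName call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of POI): reports which classes cannot be loaded.
final class ClasspathCheck {

    /** Returns the subset of the given class names that cannot be loaded. */
    public static List<String> findMissing(String... classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only locate the class, do not run its static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not on the classpath, or one of its own dependencies is missing
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = findMissing(
                "org.dom4j.DocumentException",                  // from the stack trace
                "org.apache.poi.xwpf.usermodel.XWPFDocument");  // from the question's code
        if (missing.isEmpty()) {
            System.out.println("All required classes found.");
        } else {
            System.out.println("Missing from classpath: " + missing);
        }
    }
}
```

Run it with the same classpath as the failing program. If org.dom4j.DocumentException is reported as missing, adding the dom4j jar from the ooxml-libs subdirectory should be the fix.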
Admin
Updated on June 04, 2022
Admin almost 2 years
Hi, I'm using Apache POI 3.6 and have already written some code:
XWPFDocument doc = new XWPFDocument(new FileInputStream(file));
wordxExtractor = new XWPFWordExtractor(doc);
text = wordxExtractor.getText();
System.out.println("adding docx " + file);
d.add(new Field("content", text, Field.Store.NO, Field.Index.ANALYZED));
Unfortunately, it throws this error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/dom4j/DocumentException
    at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:149)
    at org.apache.poi.openxml4j.opc.OPCPackage.<init>(OPCPackage.java:136)
    at org.apache.poi.openxml4j.opc.Package.<init>(Package.java:54)
    at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:98)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:199)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:178)
    at org.apache.poi.util.PackageHelper.open(PackageHelper.java:53)
    at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:98)
    at org.apache.lucene.demo.Indexer.indexDocs(Indexer.java:153)
    at org.apache.lucene.demo.Indexer.main(Indexer.java:88)
It seems that it used the constructor
XWPFWordExtractor(OPCPackage container)
rather than this one:
XWPFWordExtractor(XWPFDocument document)
Any idea why? Or how can I extract the text of a .docx file into a String?