How to extract text from .docx (Word 2007 and above) files using Apache POI


Solution 1

You need to add the dom4j library to your classpath or to your project libraries.

Solution 2

It looks like you don't have all of the dependencies on your classpath.

If you look at http://poi.apache.org/overview.html, you'll see that dom4j is a required library when working with OOXML files. The exception you got (java.lang.NoClassDefFoundError: org/dom4j/DocumentException) suggests that you don't have it. If you look in the POI binary download, you should find it in the ooxml-libs subdirectory.
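One quick way to confirm what the exception is saying is to try loading the missing class yourself, using the same classpath as the indexer. The sketch below is a hypothetical helper (ClasspathCheck and isOnClasspath are illustrative names, not part of POI or the JDK) built on Class.forName with initialization turned off. It checks org.dom4j.DocumentException, the class named in the stack trace, and POI's XWPFDocument.

```java
// Hypothetical classpath diagnostic (not part of POI): reports whether a class
// can be located by the current class loader, without initializing it.
public class ClasspathCheck {

    public static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate and load the class; no static initializers run
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing, or found but unusable because one of its own dependencies is missing
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.dom4j.DocumentException",                // class named in the NoClassDefFoundError
            "org.apache.poi.xwpf.usermodel.XWPFDocument"  // POI class used in the question
        };
        for (String name : required) {
            System.out.println(name + ": " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with exactly the same classpath as the indexer; a MISSING line for org.dom4j.DocumentException means the dom4j jar still has to be added.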

Author: Admin

Updated on June 04, 2022

Comments

  • Admin

    Hi, I'm using Apache POI 3.6 and have already written some code:

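    import java.io.FileInputStream;
    import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.lucene.document.Field;

    // Open the .docx (OOXML) package and extract its plain text
    XWPFDocument doc = new XWPFDocument(new FileInputStream(file));
    XWPFWordExtractor wordxExtractor = new XWPFWordExtractor(doc);
    String text = wordxExtractor.getText();

    System.out.println("adding docx " + file);
    // d is the Lucene Document being built for this file
    d.add(new Field("content", text, Field.Store.NO, Field.Index.ANALYZED));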
    

    Unfortunately, it throws this error:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/dom4j/DocumentException
    at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:149)
    at org.apache.poi.openxml4j.opc.OPCPackage.<init>(OPCPackage.java:136)
    at org.apache.poi.openxml4j.opc.Package.<init>(Package.java:54)
    at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:98)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:199)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:178)
    at org.apache.poi.util.PackageHelper.open(PackageHelper.java:53)
    at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:98)
    at org.apache.lucene.demo.Indexer.indexDocs(Indexer.java:153)
    at org.apache.lucene.demo.Indexer.main(Indexer.java:88)
    

    It seems that it used the constructor

    XWPFWordExtractor(OPCPackage container)

    rather than this one:

    XWPFWordExtractor(XWPFDocument document)

    Does anyone know why? Or does anyone have an idea how I can extract the text of a .docx file into a String?
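If adding POI's OOXML dependencies is not an option, the text of a simple .docx can also be pulled out with only the JDK, because a .docx file is a ZIP package whose main body is normally the XML part word/document.xml, with visible text in w:t elements grouped into w:p paragraphs. This is a swapped-in technique, not what XWPFWordExtractor does internally, and only a rough sketch under those assumptions: it skips headers, footers and footnotes, does not render tabs or line breaks, and text inside text boxes may appear twice. DocxTextSketch and extractText are hypothetical names. With POI itself, the code in the question should work once dom4j is on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical JDK-only sketch (NOT Apache POI): reads word/document.xml from the
// .docx ZIP package and concatenates the <w:t> text runs, one line per <w:p>.
public class DocxTextSketch {

    // WordprocessingML main namespace (the "w:" prefix)
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static String extractText(String path) throws Exception {
        try (ZipFile zip = new ZipFile(path)) {
            // Assumption: the main document part uses the conventional name word/document.xml
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + path);
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations, a common guard against XXE in untrusted files
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document xml;
            try (InputStream in = zip.getInputStream(entry)) {
                xml = factory.newDocumentBuilder().parse(in);
            }
            StringBuilder sb = new StringBuilder();
            NodeList paragraphs = xml.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element p = (Element) paragraphs.item(i);
                // All <w:t> descendants of this paragraph, in document order
                NodeList texts = p.getElementsByTagNameNS(W_NS, "t");
                for (int j = 0; j < texts.getLength(); j++) {
                    sb.append(texts.item(j).getTextContent());
                }
                sb.append('\n'); // end of paragraph
            }
            return sb.toString();
        }
    }
}
```

Usage would look like String text = DocxTextSketch.extractText("report.docx"); (the file name is only an example), after which text can be passed to the Lucene Field exactly as in the question.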