How to read docx file content in Java using the Apache POI jar


Solution 1

This is covered in the Apache POI FAQ. The entry you want is: I'm using the poi-ooxml-schemas jar, but my code is failing with "java.lang.NoClassDefFoundError: org/openxmlformats/schemas/something"

The short answer is to replace the poi-ooxml-schemas jar with the full ooxml-schemas-1.1 jar. The full answer is given in the FAQ.
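If it helps to confirm which jar is actually supplying the schema classes before and after the swap, a small check along these lines can print where a class was loaded from. This is only a sketch: `WhichJar` and `whereLoadedFrom` are names made up for illustration, and it relies only on standard JDK calls (`Class.forName`, `getProtectionDomain().getCodeSource()`), so it compiles even when POI is not on the classpath.

```java
import java.security.CodeSource;

public class WhichJar {

    /** Returns the location a class was loaded from, or a short note if it cannot be determined. */
    static String whereLoadedFrom(String className) {
        try {
            // Load without initializing, so static initializers cannot fail here
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes that belong to the JDK itself usually report no code source
            return (src == null || src.getLocation() == null)
                    ? "(JDK runtime, no jar location)"
                    : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // The schema class named in the NoSuchMethodError, plus one POI class for comparison
        String[] names = {
            "org.openxmlformats.schemas.wordprocessingml.x2006.main.CTStyles",
            "org.apache.poi.xwpf.usermodel.XWPFDocument"
        };
        for (String name : names) {
            System.out.println(name + " -> " + whereLoadedFrom(name));
        }
    }
}
```

If the first line points at a jar other than the one you expect, or at more than one schema jar across runs, that would be consistent with the classpath mix-up the FAQ describes.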

Solution 2

To read Excel or docx files without these errors, add all of the jars that POI requires to your classpath, not just a subset of them.

Author: nagesh

I'm a Java programmer who strives to learn and apply best practices in coding.

Updated on July 22, 2022

Comments

  • nagesh almost 2 years

    I have finished reading .doc files, and now I'm trying to read the content of a .docx file. I found many code samples when I searched, but none of them worked. Here is my code for reference:

    import java.io.*;

    import org.apache.poi.openxml4j.opc.OPCPackage;
    import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;

    import com.itextpdf.text.Document;
    import com.itextpdf.text.Paragraph;
    import com.itextpdf.text.pdf.PdfWriter;

    public class createPdfForDocx {

        public static void main(String[] args) {
            Document document = new Document();  // iText PDF document to write into

            try {
                // Open the .docx package and wrap it in the XWPF model
                // (also tried: new XWPFDocument(fs))
                InputStream fs = new FileInputStream("C:\\DATASTORE\\test.docx");
                XWPFDocument hdoc = new XWPFDocument(OPCPackage.open(fs));
                XWPFWordExtractor extractor = new XWPFWordExtractor(hdoc);

                // Attach a PDF writer to the output file, then open the document
                OutputStream fileOutput = new FileOutputStream(new File("C:/DATASTORE/test.pdf"));
                PdfWriter.getInstance(document, fileOutput);
                document.open();

                // Extract the plain text and add it to the PDF as a single paragraph
                String fileData = extractor.getText();
                System.out.println(fileData);
                document.add(new Paragraph(fileData));
                System.out.println(" pdf document created");
            } catch (IOException e) {
                System.out.println("IO Exception");
                e.printStackTrace();
            } catch (Exception ex) {
                ex.printStackTrace();
            } finally {
                document.close();
            }
        } // end of main()
    } // end of class
    

    For the above code I'm getting the following exception:

    org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
    at org.apache.poi.xwpf.usermodel.XWPFFactory.createDocumentPart(XWPFFactory.java:60)
    at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:277)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:186)
    at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:107)
    at pagecode.createPdfForDocx.main(createPdfForDocx.java:20)
    Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:67)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:521)
    at org.apache.poi.xwpf.usermodel.XWPFFactory.createDocumentPart(XWPFFactory.java:58)
    ... 4 more
    Caused by: java.lang.NoSuchMethodError: org/openxmlformats/schemas/wordprocessingml/x2006/main/CTStyles.getStyleList()Ljava/util/List;
    at org.apache.poi.xwpf.usermodel.XWPFStyles.onDocumentRead(XWPFStyles.java:78)
    at org.apache.poi.xwpf.usermodel.XWPFStyles.<init>(XWPFStyles.java:59)
    ... 9 more
    

    Please help. Thank you.

  • nagesh almost 11 years
    Thank you so much. After changing the jar as you recommended, I'm getting the output. It's working! If you have any pointers on how to parse docx content, please help me out. I have to find an exact word in the file and modify it.
  • Gagravarr almost 11 years
    Look at the examples that ship with Apache POI and at the text extractors in Apache POI; they should give you lots of similar code to look at. If that doesn't help, you'll need to ask a new question.
  • Gagravarr almost 11 years
    Also, if this answer has solved the problem for you, please "accept" it to mark it as correct by clicking the tick next to the answer.
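On the follow-up about finding an exact word and modifying it, one possible starting point is a plain-Java helper that replaces whole-word matches in a string, which could then be applied to the text of each run in the document. The sketch below is not taken from the POI examples: `WordReplace` and `replaceWholeWord` are hypothetical names, the target is assumed to be a plain word (letters and digits only), and the POI loop in the comment is an untested outline that assumes `XWPFRun.getText(0)` and `XWPFRun.setText(text, 0)` are used to read and overwrite a run's text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordReplace {

    /** Replaces every whole-word occurrence of target in text; returns null for null input. */
    static String replaceWholeWord(String text, String target, String replacement) {
        if (text == null) {
            return null;
        }
        // \b anchors the match at word boundaries; quote() treats the target literally
        Pattern p = Pattern.compile("\\b" + Pattern.quote(target) + "\\b");
        // quoteReplacement() keeps characters such as '$' in the replacement literal
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // "category" is left alone because "cat" is not a whole word there
        System.out.println(replaceWholeWord("the cat sat on the category mat", "cat", "dog"));
        // prints: the dog sat on the category mat

        // Outline of how this might be applied with POI (assumption, not tested here):
        //   for (XWPFParagraph para : doc.getParagraphs()) {
        //       for (XWPFRun run : para.getRuns()) {
        //           String t = run.getText(0);
        //           if (t != null) run.setText(replaceWholeWord(t, "cat", "dog"), 0);
        //       }
        //   }
        //   doc.write(new FileOutputStream("out.docx"));
    }
}
```

Two caveats with the per-run approach: Word can split a single visible word across several runs, in which case a run-by-run match will miss it, and `getParagraphs()` covers body paragraphs only, so text in tables, headers, and footers would need separate handling.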