How to read .docx file content in Java using the Apache POI jar
Solution 1
This is covered in the Apache POI FAQ! The entry you want is "I'm using the poi-ooxml-schemas jar, but my code is failing with java.lang.NoClassDefFoundError: org/openxmlformats/schemas/something".
The short answer is to swap the poi-ooxml-schemas jar for the full ooxml-schemas-1.1 jar. The full answer is given in the FAQ.
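To confirm which jar actually supplies the OOXML schema classes at runtime (useful when a smaller or older schemas jar shadows the full one), a minimal sketch using only standard JDK reflection could look like this. The default class name comes from the stack trace in the question; `WhichJar` and `locate` are hypothetical names for illustration.

```java
import java.net.URL;
import java.security.CodeSource;

public class WhichJar {
    /**
     * Returns the location (jar or directory) the named class was loaded from,
     * or a short message when that cannot be determined.
     */
    static String locate(String className) {
        try {
            // initialize=false: find the class without running its static initializers
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null) {
                // Classes from the bootstrap loader (e.g. java.lang.String) have no code source
                return "bootstrap/system class";
            }
            URL location = source.getLocation();
            return location == null ? "unknown location" : location.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Class name taken from the NoSuchMethodError in the question
        String name = args.length > 0 ? args[0]
                : "org.openxmlformats.schemas.wordprocessingml.x2006.main.CTStyles";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the printed location is the poi-ooxml-schemas jar (or some unexpected jar bundled with an application server), that jar is the one to replace.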
Solution 2
To read Excel or .docx files without these errors, you need to add all the required POI jars (and the jars they depend on) to the classpath; then you won't get any errors.
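A rough way to check that the required classes are reachable is to try loading a few representative ones. This is a sketch, not part of POI: the class list is an assumption (roughly one class per jar, and jar names differ between POI versions), and it only detects jars that are missing entirely. A jar that is present but incomplete or outdated, as in the question, will still pass.

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {
    /** Returns the class names from the list that the current class loader cannot load. */
    static List<String> missingClasses(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only check that the class can be found
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed representative classes; adjust to what your POI version needs
        List<String> required = List.of(
                "org.apache.poi.xwpf.usermodel.XWPFDocument",   // poi-ooxml
                "org.apache.xmlbeans.XmlObject",                 // xmlbeans
                // ooxml-schemas or poi-ooxml-schemas, depending on the version
                "org.openxmlformats.schemas.wordprocessingml.x2006.main.CTStyles");
        List<String> missing = missingClasses(required);
        System.out.println(missing.isEmpty() ? "All classes found" : "Missing: " + missing);
    }
}
```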
Author: nagesh
I'm a Java programmer who strives to learn and apply best practices in coding.
Updated on July 22, 2022
Comments
-
nagesh almost 2 years
I have managed to read .doc files; now I'm trying to read .docx file content. When I searched for sample code I found many examples, but none of them worked. Here is my code for reference:
import java.io.*;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;

import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.Document;
import com.itextpdf.text.Paragraph;

public class createPdfForDocx {
    public static void main(String[] args) {
        InputStream fs = null;
        Document document = new Document();
        XWPFWordExtractor extractor = null;
        try {
            fs = new FileInputStream("C:\\DATASTORE\\test.docx");
            //XWPFDocument hdoc = new XWPFDocument(fs);
            XWPFDocument hdoc = new XWPFDocument(OPCPackage.open(fs));
            extractor = new XWPFWordExtractor(hdoc);
            OutputStream fileOutput = new FileOutputStream(new File("C:/DATASTORE/test.pdf"));
            PdfWriter.getInstance(document, fileOutput);
            document.open();
            String fileData = extractor.getText();
            System.out.println(fileData);
            document.add(new Paragraph(fileData));
            System.out.println(" pdf document created");
        } catch (IOException e) {
            System.out.println("IO Exception");
            e.printStackTrace();
        } catch (Exception ex) {
            ex.printStackTrace();
        } finally {
            document.close();
        }
    } // end of main()
} // end of class
For the above code I'm getting the following exception:
org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
    at org.apache.poi.xwpf.usermodel.XWPFFactory.createDocumentPart(XWPFFactory.java:60)
    at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:277)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:186)
    at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:107)
    at pagecode.createPdfForDocx.main(createPdfForDocx.java:20)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:67)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:521)
    at org.apache.poi.xwpf.usermodel.XWPFFactory.createDocumentPart(XWPFFactory.java:58)
    ... 4 more
Caused by: java.lang.NoSuchMethodError: org/openxmlformats/schemas/wordprocessingml/x2006/main/CTStyles.getStyleList()Ljava/util/List;
    at org.apache.poi.xwpf.usermodel.XWPFStyles.onDocumentRead(XWPFStyles.java:78)
    at org.apache.poi.xwpf.usermodel.XWPFStyles.<init>(XWPFStyles.java:59)
    ... 9 more
Please help. Thank you.
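The innermost cause in the trace is `java.lang.NoSuchMethodError` for `CTStyles.getStyleList()`: the `CTStyles` class that was actually loaded does not have that method. The hypothetical helper below uses plain JDK reflection to check whether a class on the classpath exposes a public method with a given name. It compares method names only, not the full signature shown in the error.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class MethodCheck {
    /**
     * Returns true if the named class (as found on the current classpath) declares or
     * inherits a public method with the given name; false if not, or if the class is missing.
     */
    static boolean hasPublicMethod(String className, String methodName) {
        try {
            // initialize=false: inspect the class without running static initializers
            Class<?> cls = Class.forName(className, false, MethodCheck.class.getClassLoader());
            return Arrays.stream(cls.getMethods())
                    .map(Method::getName)
                    .anyMatch(methodName::equals);
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class and method names taken from the stack trace above
        String cls = "org.openxmlformats.schemas.wordprocessingml.x2006.main.CTStyles";
        System.out.println("getStyleList present: " + hasPublicMethod(cls, "getStyleList"));
    }
}
```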
-
nagesh almost 11 years
Thank you so much. After changing the jar as you recommended, I'm getting the output. It's working! If you have any clue how to parse .docx content, please help me out. I need to find an exact word in the file and modify it.
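For the "find the exact word" part, the matching itself can be done with a regular expression bounded by `\b`, so that a word is not matched inside a longer word. This sketch covers only the string logic; with POI it would be applied to each run's text (for example via `XWPFRun.getText(0)` and `XWPFRun.setText(text, 0)`), keeping in mind that Word can split a single word across several runs. `replaceWord` is a hypothetical name, and the `\b` boundaries assume the word starts and ends with a letter or digit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExactWordReplace {
    /**
     * Replaces whole-word occurrences of word in text with replacement.
     * "Whole word" means bounded by \b, so "cat" does not match inside "concatenate".
     */
    static String replaceWord(String text, String word, String replacement) {
        // Pattern.quote treats the word literally, even if it contains regex metacharacters
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        // quoteReplacement keeps '$' and '\' in the replacement literal
        return pattern.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceWord("The cat sat; concatenate cat.", "cat", "dog"));
        // prints "The dog sat; concatenate dog."
    }
}
```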
-
Gagravarr almost 11 years
Look at the examples that ship with Apache POI, and at the text extractors in Apache POI; they should give you plenty of similar code to look at. If that doesn't help, you'll need to ask a new question.
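To see roughly what a text extractor does, here is a POI-free sketch that relies on the fact that a .docx file is a ZIP archive whose main body is `word/document.xml`. It collects the `w:t` text inside each `w:p` paragraph using only the JDK's `ZipFile` and DOM parser. This is a swapped-in technique for illustration, not how `XWPFWordExtractor` is implemented, and it ignores headers, footers, footnotes and formatting; `DocxPlainText` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DocxPlainText {
    // WordprocessingML main namespace (the "w:" prefix in document.xml)
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Joins the w:t text of each w:p paragraph, one line per paragraph. */
    static String extractText(InputStream documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(documentXml);
        StringBuilder text = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);
            // Note: paragraphs nested in text boxes would be visited twice here
            NodeList texts = p.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                text.append(texts.item(j).getTextContent());
            }
            text.append('\n');
        }
        return text.toString();
    }

    /** Opens the .docx as a ZIP archive and extracts text from word/document.xml. */
    static String extractText(String docxPath) throws Exception {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("Not a .docx: word/document.xml is missing");
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return extractText(in);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Default path reused from the question's code
        System.out.println(extractText(args.length > 0 ? args[0] : "C:\\DATASTORE\\test.docx"));
    }
}
```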
-
Gagravarr almost 11 years
Also, if this answer has solved the problem for you, please "accept" it to mark it as correct by clicking the tick next to the answer.