Reading a .docx file in Java
Solution 1
import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public void readDocxFile() {
    // try-with-resources closes the stream even if parsing fails
    try (FileInputStream fis = new FileInputStream("C:/NetBeans Output/documentx.docx")) {
        XWPFDocument document = new XWPFDocument(fis);
        // Print the text of each body paragraph in document order
        List<XWPFParagraph> paragraphs = document.getParagraphs();
        for (XWPFParagraph para : paragraphs) {
            System.out.println(para.getText());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Solution 2
You cannot read a .docx or .doc file directly as plain text. You need a library that understands the Word file formats. Use Apache POI (http://poi.apache.org/). If anything is unclear, see this thread on stackoverflow.com: How read Doc or Docx file in java?
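To see why a plain text reader is not enough: a .docx file is a ZIP archive, and its main text normally lives in an XML part named word/document.xml. The sketch below uses only the JDK (no POI) to open that part and collect the text of each paragraph. The names DocxPeek and readParagraphs are made up for illustration. It assumes the common transitional WordprocessingML namespace and ignores tabs, line breaks, headers, footers and nested text boxes, so treat it as an illustration of the format rather than a replacement for Apache POI.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Apache POI.
final class DocxPeek {
    // Assumption: the document uses the transitional WordprocessingML namespace.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of every <w:p> element found in word/document.xml.
    static List<String> readParagraphs(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // readAllBytes (Java 9+) stops at the end of the current entry
                    return parse(zip.readAllBytes());
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a .docx file?");
    }

    private static List<String> parse(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        List<String> result = new ArrayList<>();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element paragraph = (Element) paragraphs.item(i);
            // A paragraph's visible text is split across <w:t> elements inside runs
            NodeList texts = paragraph.getElementsByTagNameNS(W_NS, "t");
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < texts.getLength(); j++) {
                sb.append(texts.item(j).getTextContent());
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

Calling DocxPeek.readParagraphs(new FileInputStream(path)) should give one string per paragraph. For real documents POI is likely the more robust choice, since it also handles tables, styles and the other parts of the package.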
Solution 3
You need the following six JARs on the classpath:
- xmlbeans-2.3.0.jar
- dom4j-1.6.1.jar
- poi-ooxml-3.8-20120326.jar
- poi-ooxml-schemas-3.8-20120326.jar
- poi-scratchpad-3.2-FINAL.jar
- poi-3.5-FINAL.jar
Code:
import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;
// HWPF classes handle the legacy binary .doc format; they are not used in this .docx example
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class test {
    public static void readDocxFile(String fileName) {
        // try-with-resources closes the stream even if parsing fails
        try (FileInputStream fis = new FileInputStream(fileName)) {
            XWPFDocument document = new XWPFDocument(fis);
            // Print the text of each body paragraph in document order
            List<XWPFParagraph> paragraphs = document.getParagraphs();
            for (XWPFParagraph paragraph : paragraphs) {
                System.out.println(paragraph.getParagraphText());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        readDocxFile("C:\\Users\\sp0c43734\\Desktop\\SwatiPisal.docx");
    }
}
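Solution 3 imports both HWPF classes (for the older binary .doc format) and XWPF classes (for .docx), and a wrong file extension can send a file to the wrong one. One way to check before choosing is to look at the first bytes of the file: ZIP-based files such as .docx start with the bytes 50 4B 03 04, while OLE2 compound files such as .doc start with D0 CF 11 E0 A1 B1 1A E1. The sketch below uses only the JDK; WordFormatSniffer, Kind and sniff are hypothetical names. A match only identifies the container, not that the file is really a Word document (an .xlsx or .jar is also a ZIP).

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Apache POI.
final class WordFormatSniffer {
    enum Kind { ZIP_BASED, OLE2_BASED, UNKNOWN }

    // Local file header signature of a ZIP archive ("PK\3\4")
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    // Header signature of an OLE2 compound file (used by legacy .doc)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Reads up to 8 bytes from the stream and classifies the container format.
    static Kind sniff(InputStream in) throws IOException {
        byte[] head = new byte[8];
        int n = in.readNBytes(head, 0, head.length); // Java 9+
        if (startsWith(head, n, ZIP_MAGIC)) {
            return Kind.ZIP_BASED;   // candidate for XWPFDocument
        }
        if (startsWith(head, n, OLE2_MAGIC)) {
            return Kind.OLE2_BASED;  // candidate for HWPFDocument
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int length, byte[] prefix) {
        if (length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Note that sniff consumes the bytes it reads, so open a fresh stream (or wrap the original in a BufferedInputStream and use mark/reset) before handing the file to POI.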
Comments
-
Addict over 2 years
I am trying to read a file in Java. The code is as follows:
public void readFile(String fileName) {
    try {
        BufferedReader reader = new BufferedReader(new FileReader(fileName));
        String line = null;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
    } catch (Exception ex) {}
}
It works fine for a .txt file. For a .docx file, however, it prints weird characters. How can I read a .docx file in Java?
-
SpringLearner over 8 years
Your code will not compile. paragraphs is not defined.