Convert docx file into PDF with Java


There are many ways to do this conversion. One commonly used method is with POI and docx4j:

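    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.Collections;
    import java.util.List;
    import org.apache.log4j.Level;
    import org.apache.log4j.LogManager;
    import org.apache.log4j.Logger;
    import org.docx4j.convert.out.pdf.viaXSLFO.PdfSettings;
    import org.docx4j.fonts.IdentityPlusMapper;
    import org.docx4j.fonts.Mapper;
    import org.docx4j.fonts.PhysicalFont;
    import org.docx4j.fonts.PhysicalFonts;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

    // Load the source document.
    InputStream is = new FileInputStream(new File("your DOCX path"));
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(is);

    // Walk the sections; the returned page dimensions are not used further below.
    List sections = wordMLPackage.getDocumentModel().getSections();
    for (int i = 0; i < sections.size(); i++) {
        wordMLPackage.getDocumentModel().getSections().get(i).getPageDimensions();
    }

    // Map the document font "Algerian" to an installed font (set your desired font).
    Mapper fontMapper = new IdentityPlusMapper();
    PhysicalFont font = PhysicalFonts.getPhysicalFonts().get("Comic Sans MS");
    fontMapper.getFontMappings().put("Algerian", font);
    wordMLPackage.setFontMapper(fontMapper);

    // Turn off log4j output.
    List<Logger> loggers = Collections.<Logger>list(LogManager.getCurrentLoggers());
    loggers.add(LogManager.getRootLogger());
    for (Logger logger : loggers) {
        logger.setLevel(Level.OFF);
    }

    // Convert via XSL-FO and write the PDF.
    PdfSettings pdfSettings = new PdfSettings();
    org.docx4j.convert.out.pdf.PdfConversion conversion =
            new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(wordMLPackage);
    OutputStream out = new FileOutputStream(new File("your output PDF path"));
    conversion.output(out, pdfSettings);
    out.close();
    is.close();
    System.out.println("DONE!!");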

This works well for me; I have tried it on multiple DOCX files.
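The snippet above depends on classes that are not present in every setup: the comments on this page report a `NoClassDefFoundError` for a Batik class and a missing `org.docx4j.convert.out.pdf.PdfConversion` in newer docx4j downloads. Before debugging the conversion itself, it may help to check which of those classes are actually on the classpath. The following is a minimal sketch using only the JDK; `ClasspathCheck`, `isAvailable` and `check` are hypothetical names chosen for illustration, and the class names passed in `main` are the ones mentioned on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClasspathCheck {

    /** Returns true if the named class can be located on the current classpath. */
    public static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised for missing dependencies.
            return false;
        }
    }

    /** Maps each class name to whether it was found, in the order given. */
    public static Map<String, Boolean> check(String... classNames) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isAvailable(name));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Boolean> report = check(
                "org.docx4j.openpackaging.packages.WordprocessingMLPackage",
                "org.docx4j.convert.out.pdf.PdfConversion",
                "org.docx4j.convert.out.pdf.viaXSLFO.Conversion",
                "org.apache.batik.bridge.UserAgent");
        for (Map.Entry<String, Boolean> entry : report.entrySet()) {
            System.out.println((entry.getValue() ? "found    " : "MISSING  ") + entry.getKey());
        }
    }
}
```

Run it with the same classpath as the converter. Any line marked `MISSING` points at a jar that still has to be added, or at a class that does not exist in the docx4j version in use.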

Author: Ferguson

Updated on June 13, 2022

Comments

  • Ferguson
    Ferguson almost 2 years

    I am looking for a stable method to convert DOCX files from MS Word into PDF. Until now I have used OpenOffice installed as a listener, but it often hangs. The problem is that we have situations where many users want to convert SXW and DOCX files into PDF at the same time. Is there some other possibility? I tried the examples from this site: https://angelozerr.wordpress.com/2012/12/06/how-to-convert-docxodt-to-pdfhtml-with-java/ but the output is not good (the converted documents have errors and the layout is noticeably changed).

    Here is the source DOCX document: [screenshot]

    Here is the document converted with docx4j, with some exception text inside the document. The text in the upper right corner is also missing: [screenshot]

    This one is the PDF created with OpenOffice as the converter from DOCX to PDF. Some text in the upper right corner is missing: [screenshot]

    Is there some other option to convert docx into pdf with Java?

    • Stefan Hegny
      Stefan Hegny over 7 years
      This is off-topic on SO, since you are asking us "to recommend a tool or library". But why not just try to get your OpenOffice setup stable?
    • Davide
      Davide over 7 years
      You can use JODConverter (code.google.com/archive/p/jodconverter) or docx4j (docx4java.org/trac/docx4j)
    • Ferguson
      Ferguson over 7 years
      JODConverter uses OpenOffice in the background. The problem is that OpenOffice sometimes hangs (crashes) without any reason. I also tried docx4j (see my question).
    • JasonPlutext
      JasonPlutext over 7 years
      That's a 4 year old article you reference there. These days, the recommended way to do it from docx4j is with Plutext's commercial PDF Converter. You can try that online at converter-eval.plutext.com
  • Ferguson
    Ferguson over 7 years
    I tried your method but still get an exception: WARN org.apache.fop.image.loader.batik.PreloaderSVG .preloadImage line 76 - Batik not in class path java.lang.NoClassDefFoundError: org/apache/batik/bridge/UserAgent at org.apache.fop.image.loader.batik.PreloaderSVG.preloadImage(PreloaderSVG.java:69)
  • KishanCS
    KishanCS over 7 years
    import org.apache.log4j.Level; import org.apache.log4j.LogManager; import org.apache.log4j.Logger; import org.docx4j.convert.out.pdf.viaXSLFO.PdfSettings; import org.docx4j.fonts.IdentityPlusMapper; import org.docx4j.fonts.Mapper; import org.docx4j.fonts.PhysicalFont; import org.docx4j.fonts.PhysicalFonts; import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
  • Ferguson
    Ferguson over 7 years
    I still get the same malformed PDF as with docx4j. Here it is: s5.postimg.org/ptxrxtfyf/screenshot_1540.jpg
  • KishanCS
    KishanCS over 7 years
    To turn off the logger:

        List<Logger> loggers = Collections.<Logger>list(LogManager.getCurrentLoggers());
        loggers.add(LogManager.getRootLogger());
        for (Logger logger : loggers) {
            logger.setLevel(Level.OFF);
        }

    This turns off those messages.
  • Ferguson
    Ferguson over 7 years
    I will try turning off the logging, but the text in the upper right corner, the footer, etc. are missing in the PDF document.
  • KishanCS
    KishanCS over 7 years
    Is it an originally created DOCX or a converted one? Please check.
  • KishanCS
    KishanCS over 7 years
    If possible, please provide the DOCX file.
  • Ferguson
    Ferguson over 7 years
    It's a document created in MS Word (Office Professional 2013): s5.postimg.org/63a55ovlz/screenshot_1541.jpg If you want to try it, here is my document: drive.google.com/file/d/0B6Z9wNTXyUEeOUtFRVhZeWtnZ3M/…
  • KishanCS
    KishanCS over 7 years
    Check all the dependencies once and rebuild the project. It works like a charm! Thank you.
  • Ferguson
    Ferguson over 7 years
    Can you please send me a link with all the included libraries? I downloaded the libraries from this site: angelozerr.wordpress.com/2012/12/06/…
  • Ferguson
    Ferguson over 7 years
    Also, if I download the latest library from docx4java, I can't find the class org.docx4j.convert.out.pdf.PdfConversion.
  • JasonPlutext
    JasonPlutext over 7 years
    The code sample in this answer uses docx4j, not POI :-)
  • JasonPlutext
    JasonPlutext over 7 years
    In the most recent docx4j, the export via XSL FO is a separate library, so you'd need that jar and its dependencies. Or use our commercial PDF Converter I recommended in my other comment :-)
  • Ferguson
    Ferguson over 7 years
    Hi JasonPlutext. I have tried your online converter, but in the generated PDF there is no image in the lower left corner: s5.postimg.org/k5w2ko0zr/screenshot_1542.jpg and this is the original document: s5.postimg.org/8utewau4n/screenshot_1543.jpg Any idea?
  • JasonPlutext
    JasonPlutext over 7 years
    Would need to see the source docx. Can you email it to me, or drag it to ndoc.it and paste the resulting link here?
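On the hanging-converter problem raised in the question (many users converting at once, with the backend occasionally hanging), one general mitigation, independent of which converter is used, is to cap the number of simultaneous conversions and give each one a timeout. The following is a rough sketch using only `java.util.concurrent`; `BoundedConverter` and its methods are hypothetical names, and the lambdas in `main` are stand-ins for a real conversion call, not part of any library mentioned on this page.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedConverter {

    private final ExecutorService pool;
    private final long timeoutMillis;

    public BoundedConverter(int maxConcurrent, long timeoutMillis) {
        // Fixed pool: at most maxConcurrent conversions run at once; the rest wait in a queue.
        this.pool = Executors.newFixedThreadPool(maxConcurrent);
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Runs the conversion task and returns its result. Throws TimeoutException if the
     * result is not available in time; the task is then cancelled via interruption.
     */
    public <T> T convert(Callable<T> conversion)
            throws InterruptedException, ExecutionException, TimeoutException {
        Future<T> future = pool.submit(conversion);
        try {
            // The wait starts at submission, so time spent queued counts toward the timeout.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker thread; the task only stops if it reacts to interruption.
            future.cancel(true);
            throw e;
        }
    }

    public void shutdown() {
        pool.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        BoundedConverter converter = new BoundedConverter(2, 500);
        try {
            // Stand-in for a conversion that finishes quickly.
            String result = converter.convert(() -> "report.pdf");
            System.out.println("converted: " + result);

            // Stand-in for a conversion that hangs.
            try {
                converter.convert(() -> { Thread.sleep(5_000); return "never"; });
            } catch (TimeoutException e) {
                System.out.println("conversion timed out");
            }
        } finally {
            converter.shutdown();
        }
    }
}
```

Note that a timeout only frees the caller. If the conversion runs in an external process such as an OpenOffice listener, interrupting the Java thread does not by itself stop that process, so it would still have to be restarted separately.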