Converting docx into pdf in java

77,877

Solution 1

You are missing some libraries.

I am able to run your code by adding the following libraries:

    Apache POI 3.15
    org.apache.poi.xwpf.converter.core-1.0.6.jar
    org.apache.poi.xwpf.converter.pdf-1.0.6.jar
    fr.opensagres.xdocreport.itext.extension-2.0.0.jar
    itext-2.1.7.jar
    ooxml-schemas-1.3.jar

I have successfully converted a six-page Word document (.docx) containing tables, images and various formatting.
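The IllegalAccessError in the question usually means that two POI classes were loaded from JARs of different versions, for example an old poi JAR next to a newer poi-ooxml. As a diagnostic sketch (not part of the original answer), the class below prints which JAR each suspect class was loaded from. The class name `ClassOrigin` is hypothetical; the two suspects are the classes named in the question's stack trace. Run it with the same classpath as the converter.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports where a class was loaded from.
public class ClassOrigin {

    /** Returns the JAR/directory that provides the class, or a short note if there is none. */
    public static String locate(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "JDK / bootstrap class path";  // core JDK classes have no code source
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on the classpath";
        } catch (LinkageError e) {
            return "found but could not be loaded: " + e;
        }
    }

    public static void main(String[] args) {
        // The two classes named in the question's IllegalAccessError
        String[] suspects = {
            "org.apache.poi.util.POILogger",
            "org.apache.poi.openxml4j.opc.PackageRelationshipCollection"
        };
        for (String name : suspects) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the two lines print different JAR versions, align them before retrying the conversion.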

Solution 2

In addition to VivekRatanSinha's answer, I would like to post the full code and the required JARs for people who need them in the future.

Code:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordConvertPDF {
    public static void main(String[] args) {
        WordConvertPDF cwoWord = new WordConvertPDF();
        cwoWord.ConvertToPDF("D:/Test.docx", "D:/Test.pdf");
    }

    public void ConvertToPDF(String docPath, String pdfPath) {
        // try-with-resources closes both streams, even when the conversion fails
        try (InputStream doc = new FileInputStream(new File(docPath));
             OutputStream out = new FileOutputStream(new File(pdfPath))) {
            XWPFDocument document = new XWPFDocument(doc);
            PdfOptions options = PdfOptions.create();
            PdfConverter.getInstance().convert(document, out, options);
        } catch (IOException ex) {
            System.out.println(ex.getMessage());
        }
    }
}

And the JARs:

[Image: required jars]

Enjoy :)

Solution 3

I will provide three methods to convert DOCX to PDF:

  1. Using iText, opensagres and Apache POI

Code:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import fr.opensagres.poi.xwpf.converter.pdf.PdfOptions;
import fr.opensagres.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class ConvertDocToPdfitext {

  public static void main(String[] args) {
    System.out.println( "Starting conversion!!!" );
    ConvertDocToPdfitext cwoWord = new ConvertDocToPdfitext();
    cwoWord.ConvertToPDF("C:/Users/avijit.shaw/Desktop/testing/docx/Account Opening Prototype Details.docx", "C:/Users/avijit.shaw/Desktop/testing/docx/Test-1.pdf");
    System.out.println( "Ending conversion!!!" );
  }

  public void ConvertToPDF(String docPath, String pdfPath) {
    // try-with-resources closes both streams, even when the conversion fails
    try (InputStream doc = new FileInputStream(new File(docPath));
         OutputStream out = new FileOutputStream(new File(pdfPath))) {
        XWPFDocument document = new XWPFDocument(doc);
        PdfOptions options = PdfOptions.create();
        PdfConverter.getInstance().convert(document, out, options);
    } catch (IOException ex) {
        System.out.println(ex.getMessage());
    }
  }
}

Dependencies: Use Maven to resolve dependencies.

The new version 2.0.2 of fr.opensagres.poi.xwpf.converter.core runs with Apache POI 4.0.1 and iText 2.1.7. You just need to add the dependency below in Maven, and Maven will automatically download everything it depends on. (Update your Maven project so that it downloads these libraries and all of their dependencies.)

<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
    <version>2.0.2</version>
</dependency>
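Because this answer ties converter 2.0.2 to POI 4.0.1, and a comment below reports that POI 3.16/3.17 did not work with the older 1.0.6 converter, it can help to check at runtime which POI version and which converter package actually ended up on the classpath. This is a minimal sketch that uses reflection, so it compiles without either library. It assumes the static `org.apache.poi.Version.getVersion()` method; the class name `PoiVersionCheck` is made up, and the two package names are the ones used by the 1.0.x and 2.0.x code in this thread.

```java
import java.lang.reflect.Method;

// Hypothetical helper: reports what the classpath actually contains.
public class PoiVersionCheck {

    /** Returns the Apache POI version on the classpath, or null if POI is absent. */
    public static String poiVersion() {
        try {
            Class<?> versionClass = Class.forName("org.apache.poi.Version");
            Method getVersion = versionClass.getMethod("getVersion");
            return (String) getVersion.invoke(null);  // static method, so no instance
        } catch (ReflectiveOperationException e) {
            return null;
        }
    }

    /** Returns the XDocReport PDF converter package found, or null if neither is present. */
    public static String converterPackage() {
        String[] candidates = {
            "fr.opensagres.poi.xwpf.converter.pdf.PdfConverter",  // 2.0.x package
            "org.apache.poi.xwpf.converter.pdf.PdfConverter"      // 1.0.x package
        };
        for (String name : candidates) {
            try {
                Class.forName(name, false, PoiVersionCheck.class.getClassLoader());
                return name.substring(0, name.lastIndexOf('.'));
            } catch (ClassNotFoundException ignored) {
                // not present, try the next candidate
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("Apache POI version: " + poiVersion());
        System.out.println("Converter package:  " + converterPackage());
    }
}
```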
  2. Using Documents4j

Note: You need to have MS Office installed on the machine on which this code runs.

Code:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;

public class Document4jApp {

  public static void main(String[] args) {

      File inputWord = new File("C:/Users/avijit.shaw/Desktop/testing/docx/Account Opening Prototype Details.docx");
      File outputFile = new File("Test_out.pdf");
      IConverter converter = LocalConverter.builder().build();
      // try-with-resources closes both streams, even when the conversion fails
      try (InputStream docxInputStream = new FileInputStream(inputWord);
           OutputStream outputStream = new FileOutputStream(outputFile)) {
          converter.convert(docxInputStream).as(DocumentType.DOCX).to(outputStream).as(DocumentType.PDF).execute();
          System.out.println("success");
      } catch (Exception e) {
          e.printStackTrace();
      } finally {
          // Release the converter's resources once all conversions are done
          converter.shutDown();
      }
  }
}

Dependencies: Use Maven to resolve dependencies.

<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-local</artifactId>
    <version>1.0.3</version>
</dependency>
<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-transformer-msoffice-word</artifactId>
    <version>1.0.3</version>
</dependency>
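Because Documents4j needs an installed MS Office (see the note above), several comments below report that it does not work on Linux/Unix. Here is a hedged sketch of a guard that checks the operating system before choosing this route. The class name `ConverterChooser` and the fallback message are illustrative only.

```java
import java.util.Locale;

// Hypothetical guard: pick the documents4j route only where MS Word can run.
public class ConverterChooser {

    /** True when the os.name value denotes Windows. */
    public static boolean isWindows(String osName) {
        return osName != null && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        if (isWindows(os)) {
            System.out.println(os + ": documents4j with MS Word can be used");
        } else {
            // documents4j's local converter automates MS Word, so pick another route here
            System.out.println(os + ": use a non-Office route instead");
        }
    }
}
```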
  3. Using OpenOffice UNO (unoil)

Note: You need to have OpenOffice installed on the machine on which this code runs.

Code:


import java.io.File;
import com.sun.star.beans.PropertyValue;
import com.sun.star.comp.helper.BootstrapException;
import com.sun.star.frame.XComponentLoader;
import com.sun.star.frame.XDesktop;
import com.sun.star.frame.XStorable;
import com.sun.star.lang.XComponent;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.uno.Exception;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XComponentContext;

import ooo.connector.BootstrapSocketConnector;

public class App {
  public static void main(String[] args) throws Exception, BootstrapException {
      System.out.println("Starting conversion!!!");
      // Initialise
      String oooExeFolder = "C:\\Program Files (x86)\\OpenOffice 4\\program"; //Provide path on which OpenOffice is installed
      XComponentContext xContext = BootstrapSocketConnector.bootstrap(oooExeFolder);
      XMultiComponentFactory xMCF = xContext.getServiceManager();
  
      Object oDesktop = xMCF.createInstanceWithContext("com.sun.star.frame.Desktop", xContext);
  
      XDesktop xDesktop = (XDesktop) UnoRuntime.queryInterface(XDesktop.class, oDesktop);
  
      // Load the Document
      String workingDir = "C:/Users/avijit.shaw/Desktop/testing/docx/"; //Provide directory path of docx file to be converted
      String myTemplate = workingDir + "Account Opening Prototype Details.docx"; // Name of docx file to be converted
  
      if (!new File(myTemplate).canRead()) {
        throw new RuntimeException("Cannot load template:" + new File(myTemplate));
      }
  
      XComponentLoader xCompLoader = (XComponentLoader) UnoRuntime
          .queryInterface(com.sun.star.frame.XComponentLoader.class, xDesktop);
  
      String sUrl = "file:///" + myTemplate;
  
      // Load the document hidden, without opening an OpenOffice window
      PropertyValue[] propertyValues = new PropertyValue[1];
      propertyValues[0] = new PropertyValue();
      propertyValues[0].Name = "Hidden";
      propertyValues[0].Value = Boolean.TRUE;
  
      XComponent xComp = xCompLoader.loadComponentFromURL(sUrl, "_blank", 0, propertyValues);
  
      // save as a PDF
      XStorable xStorable = (XStorable) UnoRuntime.queryInterface(XStorable.class, xComp);
  
      propertyValues = new PropertyValue[2];
      // Setting the flag for overwriting
      propertyValues[0] = new PropertyValue();
      propertyValues[0].Name = "Overwrite";
      propertyValues[0].Value = Boolean.TRUE;
      // Setting the filter name
      propertyValues[1] = new PropertyValue();
      propertyValues[1].Name = "FilterName";
      propertyValues[1].Value = "writer_pdf_Export";
  
      // Output PDF path: a fixed file name in the same directory as the input
      String myResult = workingDir + "letterOutput.pdf"; // Name of pdf file to be output
      xStorable.storeToURL("file:///" + myResult, propertyValues);
  
      System.out.println("Saved " + myResult);
  
      // shutdown
      xDesktop.terminate();
  }
}
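The code above builds its URLs as "file:///" + path, and the example path contains spaces. As an alternative, `java.nio`'s `Path.toUri()` produces a `file:///` URL with spaces and other unsafe characters percent-encoded. This is a sketch, assuming OpenOffice accepts standard percent-encoded file URLs. `OfficeUrl` is a hypothetical helper whose result would replace `sUrl` and the `storeToURL` argument.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: turns a local path into a properly encoded file URL.
public class OfficeUrl {

    /** Returns e.g. file:///C:/dir/My%20File.docx on Windows or file:///tmp/My%20File.docx on Unix. */
    public static String toFileUrl(String localPath) {
        Path absolute = Paths.get(localPath).toAbsolutePath().normalize();
        return absolute.toUri().toString();  // percent-encodes spaces as %20
    }

    public static void main(String[] args) {
        // Relative paths are resolved against the current working directory
        String path = args.length > 0 ? args[0] : "Account Opening Prototype Details.docx";
        System.out.println(toFileUrl(path));
    }
}
```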

Dependencies: Use Maven to resolve dependencies.

<!-- https://mvnrepository.com/artifact/org.openoffice/unoil -->
    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>unoil</artifactId>
        <version>3.2.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.openoffice/juh -->
    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>juh</artifactId>
        <version>3.2.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.openoffice/bootstrap-connector -->
    <dependency>
        <groupId>org.openoffice</groupId>
        <artifactId>bootstrap-connector</artifactId>
        <version>0.1.1</version>
    </dependency>

Solution 4

If your document is fairly rich and you have to run the conversion on Linux/Unix, all three main options suggested in this thread can be "a bit" painful to implement.

A solution which I might suggest is to use Gotenberg: A Docker-powered stateless API for converting HTML, Markdown and Office documents to PDF.

  • Launch the container: $ docker run --rm -p 3000:3000 thecodingmachine/gotenberg:6
  • Make a request to the container. Here is how, using curl:
$ curl --request POST \
    --url http://localhost:3000/convert/office \
    --header 'Content-Type: multipart/form-data' \
    --form [email protected] \
    --form [email protected] \
    -o result.pdf

Deploy it to your infrastructure (e.g. as a separate microservice) and call it from your Java service with that simple HTTP request. You get the PDF file in the response and can do whatever you want with it.

Tested, works like a charm!
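Here is a sketch of that HTTP request from Java, using the `java.net.http.HttpClient` that ships with Java 11+. The JDK client has no multipart helper, so the body is built by hand. Assumptions: the form field name `files` follows the Gotenberg 6 curl examples, success is signalled by HTTP 200, and the class name and file names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

// Hypothetical client for a Gotenberg 6 container (requires Java 11+).
public class GotenbergClient {

    /** Builds a multipart/form-data body containing a single file part. */
    public static byte[] multipartBody(String boundary, String fieldName, String fileName, byte[] content) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        String partHeader = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n";
        body.writeBytes(partHeader.getBytes(StandardCharsets.UTF_8));
        body.writeBytes(content);
        body.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return body.toByteArray();
    }

    /** Sends the .docx to /convert/office and writes the PDF only on HTTP 200. */
    public static void convert(String baseUrl, Path docx, Path pdf) throws IOException, InterruptedException {
        String boundary = "----docx2pdf-" + UUID.randomUUID();
        // Assumption: Gotenberg 6 reads the uploaded document from a field named "files"
        byte[] body = multipartBody(boundary, "files",
                docx.getFileName().toString(), Files.readAllBytes(docx));
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/convert/office"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
        HttpResponse<byte[]> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            throw new IOException("Gotenberg returned HTTP " + response.statusCode()
                    + ": " + new String(response.body(), StandardCharsets.UTF_8));
        }
        Files.write(pdf, response.body());
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file names; the container is assumed to listen on localhost:3000
        convert("http://localhost:3000", Paths.get("document.docx"), Paths.get("result.pdf"));
    }
}
```

Buffering the response and writing the file only after checking the status keeps an error message from being saved as result.pdf.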

Solution 5

I have done a lot of research and found that Documents4j is the best free API for converting DOCX to PDF. It handles alignment, fonts and everything else well.

Maven Dependencies:

<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-local</artifactId>
    <version>1.0.3</version>
</dependency>
<dependency>
    <groupId>com.documents4j</groupId>
    <artifactId>documents4j-transformer-msoffice-word</artifactId>
    <version>1.0.3</version>
</dependency>

Use the code below to convert DOCX to PDF.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;

public class Document4jApp {

    public static void main(String[] args) {

        File inputWord = new File("Tests.docx");
        File outputFile = new File("Test_out.pdf");
        IConverter converter = LocalConverter.builder().build();
        // try-with-resources closes both streams, even when the conversion fails
        try (InputStream docxInputStream = new FileInputStream(inputWord);
             OutputStream outputStream = new FileOutputStream(outputFile)) {
            converter.convert(docxInputStream).as(DocumentType.DOCX).to(outputStream).as(DocumentType.PDF).execute();
            System.out.println("success");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Release the converter's resources once all conversions are done
            converter.shutDown();
        }
    }
}

Author: Yannick Huber

Updated on December 10, 2021

Comments

  • Yannick Huber
    Yannick Huber over 2 years

    I am trying to convert a DOCX file that contains tables and images into a PDF file.

    I have been searching everywhere but have not found a proper solution. Could someone please give a proper, correct solution?

    Here is what I have tried:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import org.apache.poi.xwpf.converter.pdf.PdfConverter;
    import org.apache.poi.xwpf.converter.pdf.PdfOptions;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    
    public class TestCon {
    
        public static void main(String[] args) {
            TestCon cwoWord = new TestCon();
            System.out.println("Start");
            cwoWord.ConvertToPDF("D:\\Test.docx", "D:\\Test1.pdf");
        }
    
        public void ConvertToPDF(String docPath, String pdfPath) {
            try {
                InputStream doc = new FileInputStream(new File(docPath));
                XWPFDocument document = new XWPFDocument(doc);
                PdfOptions options = PdfOptions.create();
                OutputStream out = new FileOutputStream(new File(pdfPath));
                PdfConverter.getInstance().convert(document, out, options);
                System.out.println("Done");
            } catch (FileNotFoundException ex) {
                System.out.println(ex.getMessage());
            } catch (IOException ex) {
    
                System.out.println(ex.getMessage());
            }
        }
    
    }
    

    Exception:

    Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.apache.poi.util.POILogger.log(ILjava/lang/Object;)V from class org.apache.poi.openxml4j.opc.PackageRelationshipCollection
    at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.parseRelationshipsPart(PackageRelationshipCollection.java:313)
    at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:162)
    at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:130)
    at org.apache.poi.openxml4j.opc.PackagePart.loadRelationships(PackagePart.java:559)
    at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:112)
    at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:83)
    at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:128)
    at org.apache.poi.openxml4j.opc.ZipPackagePart.<init>(ZipPackagePart.java:78)
    at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:239)
    at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:665)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:274)
    at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
    at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:121)
    at test.TestCon.ConvertToPDF(TestCon.java:31)
    at test.TestCon.main(TestCon.java:25)
    

    My requirement is to write Java code that converts an existing DOCX into a PDF with proper formatting and alignment.

    Please suggest.

    JARs used:

    [Image: updated jars]

    • Krzysztof Cichocki
      Krzysztof Cichocki about 7 years
      Possible duplicate of How to convert MS doc to pdf
    • Admin
      Admin about 7 years
      @KrzysztofCichocki maybe, but I did not get help from that question.
    • Krzysztof Cichocki
      Krzysztof Cichocki about 7 years
      here is also another answer if you insist on ApachePOI: stackoverflow.com/questions/6201736/…
    • Admin
      Admin about 7 years
      @KrzysztofCichocki I am getting: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
    • Amedee Van Gasse
      Amedee Van Gasse about 7 years
      Your question is not about Apache (the webserver) or about iText, so I removed those tags. I added the more specific tag apache-poi instead. It's also frowned upon to tag someone who is not contributing to a question.
    • Admin
      Admin about 7 years
      @AmedeeVanGasse yeah, but BrunoLowagie is the creator of iText, so I thought he would help.
    • Amedee Van Gasse
      Amedee Van Gasse about 7 years
      Bruno would only be able to help if your question is about iText. But your question is not about iText, it is about Apache POI. Anyway, if you tag someone who has not commented, then they will NOT get notified when you tag them. Stack Overflow does this to prevent tag spamming, which is the description of what you did.
    • VivekRatanSinha
      VivekRatanSinha about 7 years
      No one will be able to help you if you keep changing the goal post. In the last 15 minutes, you have edited the post many times and changed the libraries as many times.
    • Admin
      Admin about 7 years
      @VivekRatanSinha I am trying different possibilities. Can you help with code that converts a DOCX with tables into a PDF?
    • Amedee Van Gasse
      Amedee Van Gasse about 7 years
      In one of your many, many edits, you have com.lowagie in your exception. (I can see the edit history) This means that you are using an ancient version of iText, 2.1.7 or older, that's at least 8 years old. Since you seem to trust the expertise of Bruno Lowagie, you should be familiar with his opinion about people who still use such an old iText version.
    • mkl
      mkl about 7 years
      java.lang.NoSuchMethodError - if you get such errors, you most likely use a combination of library versions which are not compatible. Look into the respective documentations to discover required dependency versions.
    • VivekRatanSinha
      VivekRatanSinha about 7 years
      @Sam I would appreciate if you mark my answer as Accepted :)
    • Admin
      Admin about 7 years
      @VivekRatanSinha Can you share the code with all the libraries used? I am getting a java.lang.IllegalAccessError: tried to access method org.apache.poi.util.POILogger.log error.
    • Admin
      Admin about 7 years
      @VivekRatanSinha Thank you sir
  • jmarkmurphy
    jmarkmurphy about 7 years
    Just a note, this is not using POI to do the conversion despite the package names. Only the ooxml-schemas-1.3.jar is from Apache POI. The rest are from the opensagres and itext projects.
  • VivekRatanSinha
    VivekRatanSinha about 7 years
    @jmarkmurphy My approach was not to reinvent the wheel, just to get the code in question to work.
  • jmarkmurphy
    jmarkmurphy about 7 years
    Nothing wrong with your answer, just didn't want anyone to get the wrong idea that the org.apache.poi.xwpf.converter.* packages were a part of Apache POI.
  • Admin
    Admin about 7 years
    Thank you, sir. It helped me.
  • Baked Inhalf
    Baked Inhalf over 6 years
    Where did you find the ooxml-schemas-1.3.jar file? I see only newer versions: mvnrepository.com/artifact/org.apache.poi/poi-ooxml
  • VivekRatanSinha
    VivekRatanSinha over 6 years
  • Baked Inhalf
    Baked Inhalf over 6 years
    @VivekRatanSinha I also tried with POI 3.17 and 3.16. None of them worked, so it seems they changed something after version 3.15
  • vickyVick
    vickyVick almost 6 years
    @user1999397 can you provide a complete sample? The libs have dependencies on each other.
  • Ulphat
    Ulphat about 5 years
    Please add a POM instead of the JARs image.
  • emrekgn
    emrekgn about 5 years
    For anyone still trying to convert with XDocReport and POI 4.0.x (w/o itext): the new version of XDocReport has a different artifact id (fr.opensagres.poi.xwpf.converter.pdf instead of org.apache.poi.xwpf.converter.pdf) and package structure than the older ones, e.g. <dependency> <groupId>fr.opensagres.xdocreport</groupId> <artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId> <version>2.0.2</version> </dependency>
  • Serhii Povísenko
    Serhii Povísenko almost 4 years
    The issue with documents4j is that it requires MS components to be installed, so it will hardly work on Linux, for instance. github.com/documents4j/documents4j/issues/41
  • superup
    superup almost 4 years
    I tried following this and got the error: java.lang.ClassNotFoundException: org.apache.xmlbeans.XmlException
  • VivekRatanSinha
    VivekRatanSinha almost 4 years
    @superup your classpath does not have xmlbeans.jar
  • superup
    superup almost 4 years
    I followed your code and libs and got the error: No valid entries or contents found, this is not a valid OOXML (Office Open XML) file.
  • skyoxZ
    skyoxZ over 3 years
    Remember to call converter.shutDown(); after execute(); when using Documents4j.
  • skyoxZ
    skyoxZ over 3 years
    Remember to call converter.shutDown(); after execute();
  • Sathiamoorthy
    Sathiamoorthy over 3 years
    @povis, yes, you are correct: documents4j does not work in a Linux environment. Use docx4j for both Windows and Linux environments: stackoverflow.com/questions/3022376/…
  • Sathiamoorthy
    Sathiamoorthy over 3 years
    stackoverflow.com/questions/3022376/… - I hope this is the best method to convert DOCX to PDF.
  • amir azizkhani
    amir azizkhani over 3 years
    For RTL languages, opensagres does not produce suitable output, but Documents4j is OK.
  • brebDev
    brebDev over 2 years
    This is the only Java open-source library that worked for my DOCX->PDF conversion. I've tried these three libraries: Apache POI, fr.opensagres (which uses Apache POI for conversion) and docx4j. Thx @Sathia. This was tested on a Windows machine.
  • Sathiamoorthy
    Sathiamoorthy over 2 years
    @brebDev, thanks for the appreciation.
  • brebDev
    brebDev over 2 years
    Be AWARE: this does not work on UNIX-based systems. It worked like a charm on my local Windows machine, but not in Bamboo.
  • videomugre
    videomugre about 2 years
    Indeed, works like a charm.
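Two errors quoted in the comments ("The supplied data appears to be in the Office 2007+ XML..." and "No valid entries or contents found, this is not a valid OOXML file") both depend on what kind of file is actually being read. Here is a quick sketch that inspects the first bytes. A .docx is a ZIP container that starts with `PK\3\4`, while a legacy .doc is an OLE2 container with the signature `D0 CF 11 E0 A1 B1 1A E1`. A ZIP signature alone does not prove the file is a valid DOCX, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: tells OOXML (.docx) apart from legacy OLE2 (.doc) by magic bytes.
public class OfficeFormatSniffer {

    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};  // "PK\3\4"
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Classifies a file from its first bytes. */
    public static String sniff(byte[] header) {
        if (startsWith(header, ZIP_MAGIC)) return "OOXML (ZIP container)";
        if (startsWith(header, OLE2_MAGIC)) return "OLE2 (legacy binary Office)";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] header = in.readNBytes(8);  // Java 11+
            System.out.println(args[0] + ": " + sniff(header));
        }
    }
}
```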