iTextSharp - Convert word doc/docx to pdf

Solution 1

The Aspose.Words component can do this reliably (I'm not affiliated or anything).

iTextSharp does not have the required feature set to load and process MS Word file formats.
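
As a rough illustration of what this looks like with Aspose.Words, here is a minimal sketch that loads a document, replaces a placeholder, and saves it as PDF. It assumes a reasonably recent, licensed version of Aspose.Words for .NET (older releases use a different `Replace` overload); the file names, the `{{CustomerName}}` placeholder, and the replacement value are hypothetical.

```csharp
using Aspose.Words;
using Aspose.Words.Replacing;

class DocxToPdf
{
    static void Main()
    {
        // Hypothetical input file; .doc and .docx are both loaded the same way.
        var doc = new Document("template.docx");

        // Replace a placeholder with a value (for example, one read from a database).
        // The placeholder syntax is an assumption, not something Aspose.Words requires.
        doc.Range.Replace("{{CustomerName}}", "Jane Doe", new FindReplaceOptions());

        // Save the modified document as PDF.
        doc.Save("output.pdf", SaveFormat.Pdf);
    }
}
```

Without a license the library runs in evaluation mode, so check the output for watermarks before relying on it.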

Solution 2

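You can automate Word itself through Microsoft.Office.Interop.Word. This requires Word (2007 or later, for PDF output) to be installed on the machine running the code: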

    // Requires a reference to Microsoft.Office.Interop.Word, plus
    // System.Windows.Forms for MessageBox.
    private Microsoft.Office.Interop.Word.Application MSdoc;

    // Passed for optional parameters that we do not want to specify.
    object Unknown = Type.Missing;

    private void word2PDF(object Source, object Target)
    {
        // Create the Word application instance on first use.
        if (MSdoc == null) MSdoc = new Microsoft.Office.Interop.Word.Application();

        try
        {
            MSdoc.Visible = false;
            MSdoc.Documents.Open(ref Source, ref Unknown,
                ref Unknown, ref Unknown, ref Unknown,
                ref Unknown, ref Unknown, ref Unknown,
                ref Unknown, ref Unknown, ref Unknown,
                ref Unknown, ref Unknown, ref Unknown, ref Unknown, ref Unknown);
            MSdoc.WindowState = Microsoft.Office.Interop.Word.WdWindowState.wdWindowStateMinimize;

            object format = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatPDF;

            // Save the open document to the target path in PDF format.
            MSdoc.ActiveDocument.SaveAs(ref Target, ref format,
                ref Unknown, ref Unknown, ref Unknown,
                ref Unknown, ref Unknown, ref Unknown,
                ref Unknown, ref Unknown, ref Unknown,
                ref Unknown, ref Unknown, ref Unknown,
                ref Unknown, ref Unknown);
        }
        catch (Exception e)
        {
            MessageBox.Show(e.Message);
        }
        finally
        {
            if (MSdoc != null)
            {
                // Close all open documents, then shut down Word.
                MSdoc.Documents.Close(ref Unknown, ref Unknown, ref Unknown);
                MSdoc.Quit(ref Unknown, ref Unknown, ref Unknown);
                // Reset the field so the next call starts a fresh instance.
                MSdoc = null;
            }
        }
    }
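
For comparison, here is a shorter sketch of the same idea that also covers the placeholder-replacement part of the question. It assumes C# 4 or later (so the `ref` and `Type.Missing` arguments can be omitted on COM calls) and a Word version that supports `ExportAsFixedFormat` (Word 2007 with the PDF add-in, or Word 2010 and later). The class name, method name, and parameters are hypothetical.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

static class WordPdfExport
{
    // Opens a Word document, replaces one placeholder, and exports the result as PDF.
    public static void Convert(string sourcePath, string targetPath,
                               string placeholder, string value)
    {
        Word.Application app = new Word.Application();
        Word.Document doc = null;
        try
        {
            app.Visible = false;

            // Open read-only so the source file on disk is never modified.
            doc = app.Documents.Open(sourcePath, ReadOnly: true);

            // Replace every occurrence of the placeholder in the main document body.
            doc.Content.Find.Execute(FindText: placeholder,
                                     ReplaceWith: value,
                                     Replace: Word.WdReplace.wdReplaceAll);

            // Export the in-memory document as PDF.
            doc.ExportAsFixedFormat(targetPath, Word.WdExportFormat.wdExportFormatPDF);
        }
        finally
        {
            // Discard the in-memory changes and always shut Word down.
            if (doc != null) doc.Close(Word.WdSaveOptions.wdDoNotSaveChanges);
            app.Quit();
        }
    }
}
```

A call might look like `WordPdfExport.Convert(@"C:\docs\in.docx", @"C:\docs\out.pdf", "{{Name}}", "Jane Doe");`. Note that Microsoft does not recommend Office automation in unattended server processes, so this approach suits desktop or low-volume use.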

Solution 3

Aspose.Words is indeed a good solution, but it doesn't offer perfect fidelity. At the time of writing it has problems with non-Roman languages, with complex formatting such as floating elements, and with a number of other features.

You may want to have a look at this PDF Conversion Web Service that can be used from any Web Services capable environment including Java and .NET.

Note that I worked on this project so the usual disclaimers apply.

Solution 4

If you do not care whether the formatting is faithful to what Word would display, there is the impressive docx2tex, which converts Word 2007 docx files to LaTeX documents. Once in LaTeX, you have a lot of power to programmatically reformat the document and generate a PDF from it.

I say more about the utility in an answer at tex.stackexchange.  

Solution 5

I have the same issue.
After several days of searching, it seems that Docx4J, a Java-based tool, or PDF printers such as PDFCreator are among the free options.
That said, only a commercial tool seems able to do the job properly.
On the Microsoft side, you could use the server-side SharePoint Word Automation Services (checked on 7 June 2016), or Interop on your local computer.
The suggested step-by-step conversion (DOC or DOCX to some intermediate format and then to PDF) does not seem workable: according to users on Stack Overflow and other forums, the result is not what you would expect.

Author by

inutan

Software developer with experience in MVC, C# and SQL Server.

Updated on July 22, 2022

Comments

  • inutan
    inutan almost 2 years

    I understand iTextSharp can be used for converting a document to pdf.

    But first we have to create a document from scratch using iTextSharp.text.Document and then adding elements to this document.

    What if I have an existing doc file? Is it possible to convert this document to pdf using iTextSharp?

    Also, I want to use iTextSharp or a similar tool that can do the following with a doc file:

    1. manipulate doc/docx/text files (for example, replace placeholders with DB values), and
    2. convert them to .pdf

    If anyone has an idea about this, please share.

    Thank you!