Insert HTML into OpenXML Word Document (.Net)


Solution 1

Here is another (relatively new) alternative

http://notesforhtml2openxml.codeplex.com/

Solution 2

It is hard to give general advice, because the best approach depends strongly on your input.

Here's a simple example inserting a paragraph into a DOCX document for each paragraph in an (X)HTML document using OpenXML SDK v2.0 and an XPathDocument:

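    // Namespaces the example relies on (OpenXML SDK plus System.Xml).
    using System.Xml;
    using System.Xml.XPath;
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    void ConvertHTML(string htmlFileName, string docFileName)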
    {
        // Create a Wordprocessing document. 
        using (WordprocessingDocument package = WordprocessingDocument.Create(docFileName, WordprocessingDocumentType.Document))
        {
            // Add a new main document part. 
            package.AddMainDocumentPart();

            // Create the Document DOM. 
            package.MainDocumentPart.Document = new Document(new Body());
            Body body = package.MainDocumentPart.Document.Body;

            XPathDocument htmlDoc = new XPathDocument(htmlFileName);

            XPathNavigator navigator = htmlDoc.CreateNavigator();
            XmlNamespaceManager mngr = new XmlNamespaceManager(navigator.NameTable);
            mngr.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml");

            XPathNodeIterator ni = navigator.Select("//xhtml:p", mngr);
            while (ni.MoveNext())
            {
                body.AppendChild<Paragraph>(new Paragraph(new Run(new Text(ni.Current.Value))));
            }

            // Save changes to the main document part. 
            package.MainDocumentPart.Document.Save();
        }
    }

The example requires your input to be well-formed XML (that is, XHTML); otherwise you will get an exception when creating the XPathDocument.

Please note that this is a very basic example that does not take formatting, headings, lists, etc. into account.

Solution 3

I'm not sure what you actually want to achieve. OpenXML documents have their own HTML-like notation (WordprocessingML) for formatting elements (paragraphs, bold text, etc.). If you want to add some text with basic formatting to a document, then I suggest using the OpenXML syntax and formatting the inserted text with that.
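As a rough illustration of formatting text with WordprocessingML directly, here is a minimal sketch using the OpenXML SDK. The class and method names (`FormattingSketch`, `BuildParagraphWithBold`) are made up for this example; the result is roughly what `<p>Plain text, <b>bold text</b></p>` would map to.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

static class FormattingSketch
{
    // Hypothetical helper: builds one paragraph with a plain run
    // followed by a bold run.
    public static Paragraph BuildParagraphWithBold(string plain, string bold)
    {
        // xml:space="preserve" keeps the trailing space of the first run.
        Run plainRun = new Run(
            new Text(plain) { Space = SpaceProcessingModeValues.Preserve });

        // Run properties (w:rPr) must come before the text inside the run.
        Run boldRun = new Run(
            new RunProperties(new Bold()),
            new Text(bold));

        return new Paragraph(plainRun, boldRun);
    }
}
```

The returned paragraph can be appended to a document body in the usual way, for example `body.AppendChild(FormattingSketch.BuildParagraphWithBold("Plain text, ", "bold text"));`.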

If you have an HTML snippet that you must include in the document as is, you can use the "external content" feature of OpenXML. With external content, you add the HTML document to the package and create a reference (altChunk) in the document at the position where you want the content to appear. The disadvantage of this solution is that not all tools support the generated document (or support it properly), so I don't recommend it unless you really cannot change the HTML source.
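The altChunk approach described above could look roughly like this with the OpenXML SDK. This is a sketch under assumptions, not a definitive implementation: the names `AltChunkSketch`, `AppendHtmlAsAltChunk`, and the chunk id `htmlChunk1` are invented here, the id must not already be used in the document, and `html` is assumed to be a complete HTML document.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class AltChunkSketch
{
    // Hypothetical helper: embeds an HTML document in an existing DOCX
    // and references it from the end of the body via w:altChunk.
    public static void AppendHtmlAsAltChunk(string docFileName, string html)
    {
        using (WordprocessingDocument package =
            WordprocessingDocument.Open(docFileName, true))
        {
            MainDocumentPart mainPart = package.MainDocumentPart;
            string chunkId = "htmlChunk1"; // assumed to be unused in this document

            // Store the HTML as a separate part inside the package.
            AlternativeFormatImportPart chunk =
                mainPart.AddAlternativeFormatImportPart(
                    AlternativeFormatImportPartType.Html, chunkId);
            using (MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(html)))
            {
                chunk.FeedData(ms);
            }

            // Reference the part from the document body.
            AltChunk altChunk = new AltChunk { Id = chunkId };
            Body body = mainPart.Document.Body;

            // Section properties, if present, must stay the last child of the body.
            SectionProperties sectPr =
                body.Elements<SectionProperties>().LastOrDefault();
            if (sectPr != null)
                body.InsertBefore(altChunk, sectPr);
            else
                body.AppendChild(altChunk);

            mainPart.Document.Save();
        }
    }
}
```

The HTML is only converted when a consuming application such as Word opens the file, which is why documents built this way depend on tool support.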

How to insert content (the WordprocessingML) into an OpenXML Word document is an independent question IMHO, and the answer depends very much on how complex the modifications are and how big the document is. For a simple document, I would simply read the document part from the package, obtain its stream, and load it into an XmlDocument. You can insert additional content into the XmlDocument quite easily and then save it back to the package. If the document is big, or you need complex modifications in multiple places, XSLT is a good option.
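A minimal sketch of the XmlDocument route, assuming an existing DOCX whose main part contains a `w:body` element. The names `XmlDocumentSketch` and `AppendParagraphViaXmlDocument` are hypothetical; the new paragraph is placed before `w:sectPr` when one exists so the section properties stay last.

```csharp
using System.IO;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;

static class XmlDocumentSketch
{
    const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: appends one plain-text paragraph by editing
    // the raw XML of the main document part.
    public static void AppendParagraphViaXmlDocument(string docFileName, string text)
    {
        using (WordprocessingDocument package =
            WordprocessingDocument.Open(docFileName, true))
        {
            MainDocumentPart mainPart = package.MainDocumentPart;

            // Load the part's XML from its stream.
            XmlDocument xml = new XmlDocument();
            using (Stream input = mainPart.GetStream(FileMode.Open, FileAccess.Read))
            {
                xml.Load(input);
            }

            XmlNamespaceManager ns = new XmlNamespaceManager(xml.NameTable);
            ns.AddNamespace("w", W);
            XmlNode body = xml.SelectSingleNode("/w:document/w:body", ns);

            // Build <w:p><w:r><w:t>text</w:t></w:r></w:p>.
            XmlElement p = xml.CreateElement("w", "p", W);
            XmlElement r = xml.CreateElement("w", "r", W);
            XmlElement t = xml.CreateElement("w", "t", W);
            t.InnerText = text;
            r.AppendChild(t);
            p.AppendChild(r);

            XmlNode sectPr = body.SelectSingleNode("w:sectPr", ns);
            if (sectPr != null)
                body.InsertBefore(p, sectPr);
            else
                body.AppendChild(p);

            // Overwrite the part with the modified XML.
            using (Stream output = mainPart.GetStream(FileMode.Create, FileAccess.Write))
            {
                xml.Save(output);
            }
        }
    }
}
```

The sketch deliberately avoids touching `mainPart.Document`, so the SDK's own DOM and the raw XML are not edited side by side.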

Author by

Nico

Office365 specialist, Enterprise Search and SharePoint expert. .Net / HTML 5 / nodejs & JavaScript / Java Developer

Updated on July 25, 2022

Comments

  • Nico
    Nico almost 2 years

    Using OpenXML SDK, I want to insert basic HTML snippets into a Word document.

    How would you do this:

    • Manipulating the XML directly?
    • Using an XSLT?
    • Using altChunk?

    Moreover, C# or VB examples are more than welcome :)

  • Nico
    Nico over 15 years
    You're right, but I'm looking for feedback from experience. So far, I've implemented altChunk, but it only works if you have Word 2007, not the compatibility pack.