Append multiple DOCX files together


Solution 1

Despite all the good suggestions and solutions submitted, I developed an alternative. In my opinion you should avoid using Word in server applications entirely. So I worked with OpenXML, but I could not get AltChunk to work. Instead, I append the content of each document to the body of the original document. I receive a List of byte[] instead of a List of file names, but you can easily change the code to suit your needs.

using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace OfficeMergeControl
{
    public class CombineDocs
    {
        public byte[] OpenAndCombine( IList<byte[]> documents )
        {
            MemoryStream mainStream = new MemoryStream();

            mainStream.Write(documents[0], 0, documents[0].Length);
            mainStream.Position = 0;

            int pointer = 1;
            byte[] ret;
            try
            {
                using (WordprocessingDocument mainDocument = WordprocessingDocument.Open(mainStream, true))
                {

                    XElement newBody = XElement.Parse(mainDocument.MainDocumentPart.Document.Body.OuterXml);

                    for (pointer = 1; pointer < documents.Count; pointer++)
                    {
                        // Open each source document read-only and dispose it (and its stream) once its body has been copied.
                        using (MemoryStream tempStream = new MemoryStream(documents[pointer]))
                        using (WordprocessingDocument tempDocument = WordprocessingDocument.Open(tempStream, false))
                        {
                            XElement tempBody = XElement.Parse(tempDocument.MainDocumentPart.Document.Body.OuterXml);
                            newBody.Add(tempBody);
                        }

                        mainDocument.MainDocumentPart.Document.Body = new Body(newBody.ToString());
                        mainDocument.MainDocumentPart.Document.Save();
                        mainDocument.Package.Flush();
                    }
                }
            }
            catch (Exception e)
            {
                // OfficeMergeControlException is a custom exception type that is not shown here.
                // One handler is enough: OpenXmlPackageException derives from Exception and was wrapped in exactly the same way.
                throw new OfficeMergeControlException(string.Format(CultureInfo.CurrentCulture, "Error while merging files. Document index {0}", pointer), e);
            }
            finally
            {
                // Capture the bytes written so far; Dispose also closes the stream, so a separate Close call is not needed.
                ret = mainStream.ToArray();
                mainStream.Dispose();
            }
            return (ret);
        }
    }
}

I hope this helps you.

Solution 2

You don't need to use automation. DOCX files are based on the OpenXML Formats. They are just zip files with a bunch of XML and binary parts (think files) inside. You can open them with the Packaging API (System.IO.Packaging in WindowsBase.dll) and manipulate them with any of the XML classes in the Framework.

Check out OpenXMLDeveloper.org for details.
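
To make the "zip file with parts" idea concrete, here is a minimal sketch that opens a DOCX with the Packaging API, lists its parts, and loads the main document part into an XDocument. The class name DocxInspector and the method name Describe are made up for illustration, and the relationship type used is the one for transitional (non-strict) Open XML documents, so treat this as a starting point rather than a complete solution.

```csharp
using System;
using System.IO;
using System.IO.Packaging; // WindowsBase.dll on .NET Framework; the System.IO.Packaging package on newer .NET
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the original answer.
public static class DocxInspector
{
    // Package-level relationship type that points at the main document part
    // (assumes a transitional, non-strict Open XML document).
    private const string OfficeDocumentRelType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    // WordprocessingML main namespace (the usual "w:" prefix).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static void Describe(string docxPath)
    {
        using (Package package = Package.Open(docxPath, FileMode.Open, FileAccess.Read))
        {
            // Each part is one entry in the zip, identified by a URI and a content type.
            foreach (PackagePart part in package.GetParts())
            {
                Console.WriteLine("{0}  ({1})", part.Uri, part.ContentType);
            }

            // Follow the package-level relationship instead of hard-coding /word/document.xml.
            PackageRelationship rel = package.GetRelationshipsByType(OfficeDocumentRelType).First();
            Uri mainUri = PackUriHelper.ResolvePartUri(new Uri("/", UriKind.Relative), rel.TargetUri);
            PackagePart mainPart = package.GetPart(mainUri);

            using (Stream stream = mainPart.GetStream(FileMode.Open, FileAccess.Read))
            {
                // From here on it is ordinary LINQ to XML.
                XDocument xml = XDocument.Load(stream);
                int paragraphCount = xml.Descendants(W + "p").Count();
                Console.WriteLine("Main part {0} contains {1} paragraphs", mainUri, paragraphCount);
            }
        }
    }
}
```

Calling DocxInspector.Describe with the path of any .docx file prints one line per part, which is a quick way to see which images, styles and numbering parts a merge would have to carry over.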

Solution 3

This is a very late answer to the original question, and quite a bit has changed, but I thought I would share the way I have written my merge logic. It makes use of the Open XML Power Tools.

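// Requires the Open XML Power Tools (namespace OpenXmlPowerTools) and System.Collections.Generic.
public byte[] CreateDocument(IList<byte[]> documentsToMerge)
{
    List<Source> documentBuilderSources = new List<Source>();
    foreach (byte[] documentByteArray in documentsToMerge)
    {
        // The second Source argument is keepSections; false drops each source's own section settings.
        documentBuilderSources.Add(new Source(new WmlDocument(string.Empty, documentByteArray), false));
    }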

    WmlDocument mergedDocument = DocumentBuilder.BuildDocument(documentBuilderSources);
    return mergedDocument.DocumentByteArray;
}

Currently this is working very well in our application. I have changed the code a little because my requirement is that each document needs to be processed first. So what gets passed in is a DTO with the template byte array and the various values that need to be replaced. Here is how my code currently looks, which takes the approach a little bit further.

public byte[] CreateDocument(IList<DocumentSection> documentTemplates)
{
    List<Source> documentBuilderSources = new List<Source>();
    foreach (DocumentSection documentTemplate in documentTemplates.OrderBy(dt => dt.Rank))
    {
        // Take the template replace the items and then push it into the chunk
        using (MemoryStream templateStream = new MemoryStream())
        {
            templateStream.Write(documentTemplate.Template, 0, documentTemplate.Template.Length);

            this.ProcessOpenXMLDocument(templateStream, documentTemplate.Fields);

            documentBuilderSources.Add(new Source(new WmlDocument(string.Empty, templateStream.ToArray()), false));
        }
    }

    WmlDocument mergedDocument = DocumentBuilder.BuildDocument(documentBuilderSources);
    return mergedDocument.DocumentByteArray;
}
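
The ProcessOpenXMLDocument helper is not shown in the answer, so the following is only a guess at what such a step might look like: a minimal sketch that replaces placeholder text in the template's main document part. The class name TemplateFieldFiller, the dictionary type used for the fields, and the idea that placeholders are plain text markers are all assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical stand-in for the author's helper; the real implementation is not shown in the answer.
public static class TemplateFieldFiller
{
    // Assumption: fields maps a placeholder string (for example "{{Name}}") to its replacement value.
    public static void ProcessOpenXMLDocument(MemoryStream templateStream, IDictionary<string, string> fields)
    {
        // The caller has just written the template bytes, so rewind before opening.
        templateStream.Position = 0;

        using (WordprocessingDocument doc = WordprocessingDocument.Open(templateStream, true))
        {
            // Snapshot the text elements so the loop does not depend on live enumeration of the tree.
            List<Text> texts = doc.MainDocumentPart.Document.Descendants<Text>().ToList();

            foreach (Text text in texts)
            {
                foreach (KeyValuePair<string, string> field in fields)
                {
                    if (text.Text.Contains(field.Key))
                    {
                        text.Text = text.Text.Replace(field.Key, field.Value);
                    }
                }
            }

            // Write the modified XML back into the package held by templateStream.
            doc.MainDocumentPart.Document.Save();
        }
    }
}
```

One known limitation of this simple approach: Word often splits a placeholder across several runs, in which case no single Text element contains the whole marker and it is left untouched. Headers and footers live in separate parts and are not visited here either.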

Solution 4

I wrote a little test app a while ago to do this. My test app worked with Word 2003 documents (.doc) not .docx, but I imagine the process is the same - I should think all you'd have to change is to use a newer version of the Primary Interop Assembly. This code would look a lot neater with the new C# 4.0 features...

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using Microsoft.Office.Interop.Word;
using Microsoft.Office.Core;
using System.Runtime.InteropServices;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            new Program().Start();
        }

        private void Start()
        {
            object fileName = Path.Combine(Environment.CurrentDirectory, @"NewDocument.doc");
            File.Delete(fileName.ToString());

            try
            {
                WordApplication = new ApplicationClass();
                var doc = WordApplication.Documents.Add(ref missing, ref missing, ref missing, ref missing);
                try
                {
                    doc.Activate();

                    AddDocument(@"D:\Projects\WordTests\ConsoleApplication1\Documents\Doc1.doc", doc, false);
                    AddDocument(@"D:\Projects\WordTests\ConsoleApplication1\Documents\Doc2.doc", doc, true);

                    doc.SaveAs(ref fileName,
                        ref missing, ref missing, ref missing, ref missing, ref missing,
                        ref missing, ref missing, ref missing, ref missing, ref missing,
                        ref missing, ref missing, ref missing, ref missing, ref missing);
                }
                finally
                {
                    doc.Close(ref missing, ref missing, ref missing);
                }
            }
            finally
            {
                WordApplication.Quit(ref missing, ref missing, ref missing);
            }
        }

        private void AddDocument(string path, Document doc, bool lastDocument)
        {
            object subDocPath = path;
            var subDoc = WordApplication.Documents.Open(ref subDocPath, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing);
            try
            {

                object docStart = doc.Content.End - 1;
                object docEnd = doc.Content.End;

                object start = subDoc.Content.Start;
                object end = subDoc.Content.End;

                Range rng = doc.Range(ref docStart, ref docEnd);
                rng.FormattedText = subDoc.Range(ref start, ref end);

                if (!lastDocument)
                {
                    InsertPageBreak(doc);
                }
            }
            finally
            {
                subDoc.Close(ref missing, ref missing, ref missing);
            }
        }

        private static void InsertPageBreak(Document doc)
        {
            object docStart = doc.Content.End - 1;
            object docEnd = doc.Content.End;
            Range rng = doc.Range(ref docStart, ref docEnd);

            object pageBreak = WdBreakType.wdPageBreak;
            rng.InsertBreak(ref pageBreak);
        }

        private ApplicationClass WordApplication { get; set; }

        private object missing = Type.Missing;
    }
}

Solution 5

You want to use AltChunks and the OpenXml SDK 1.0 (at a minimum, 2.0 if you can). Check out Eric White's blog for more details; it is a great resource in general. Here is a code sample that should get you started, even if it does not work immediately as-is.

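//Assumes members defined elsewhere: altChunkPartType (an AlternativeFormatImportPartType) and the XNames altChunk, relId, body and paragraph,
//plus GetXDocument()/PutXDocument() extension methods on the part (these are not part of the SDK itself).
public void AddAltChunkPart(Stream parentStream, Stream altStream, string altChunkId)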
{
    //make sure we are at the start of the stream    
    parentStream.Position = 0;
    altStream.Position = 0;
    //push the parentStream into a WordProcessing Document
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(parentStream, true))
    {
        //get the main document part
        MainDocumentPart mainPart = wordDoc.MainDocumentPart;
        //create an altChunk part by adding a part to the main document part
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(altChunkPartType, altChunkId);
        //feed the altChunk stream into the chunk part
        chunk.FeedData(altStream);
        //create an XElement to represent the new chunk in the document
        XElement newChunk = new XElement(altChunk, new XAttribute(relId, altChunkId));
        //Add the chunk to the end of the document (search to last paragraph in body and add at the end)
        wordDoc.MainDocumentPart.GetXDocument().Root.Element(body).Elements(paragraph).Last().AddAfterSelf(newChunk);
        //Finally, save the document
        wordDoc.MainDocumentPart.PutXDocument();
    }
    //reset position of parent stream
    parentStream.Position = 0;
}
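
For readers on SDK 2.0 or later, the same idea can be sketched with the typed classes, which avoids the externally defined XName fields and the XDocument extension methods. This is an illustrative variant, not the answer author's code: the class name AltChunkAppender is made up, and unlike the sample above it places the chunk just before the final section properties rather than after the last paragraph.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper showing the altChunk technique with the SDK's typed API.
public static class AltChunkAppender
{
    // altChunkId must be unique within the main part and a valid relationship id, e.g. "AltChunkId1".
    public static void AppendAsAltChunk(Stream parentStream, Stream altStream, string altChunkId)
    {
        // Make sure both streams are read from the start.
        parentStream.Position = 0;
        altStream.Position = 0;

        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(parentStream, true))
        {
            MainDocumentPart mainPart = wordDoc.MainDocumentPart;

            // Store the whole source document as an embedded import part.
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            chunk.FeedData(altStream);

            // The w:altChunk element references the import part by relationship id.
            AltChunk altChunk = new AltChunk { Id = altChunkId };

            // Keep the body's trailing section properties last, if there are any.
            Body body = mainPart.Document.Body;
            SectionProperties sectionProperties = body.Elements<SectionProperties>().LastOrDefault();
            if (sectionProperties != null)
            {
                body.InsertBefore(altChunk, sectionProperties);
            }
            else
            {
                body.AppendChild(altChunk);
            }

            mainPart.Document.Save();
        }

        // Leave the parent stream ready for the next reader.
        parentStream.Position = 0;
    }
}
```

Note that the result still contains the chunk as an embedded part; the content is only merged into regular document markup when an application such as Word opens and re-saves the file.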
Author: ShootTheCore

Updated on March 05, 2020

Comments

  • ShootTheCore
    ShootTheCore about 4 years

    I need to use C# programmatically to append several preexisting docx files into a single, long docx file, including special markup like bullets and images. Header and footer information will be stripped out, so those won't be around to cause any problems.

    I can find plenty of information about manipulating an individual docx file with .NET Framework 3, but nothing easy or obvious about how you would merge files. There is also a third-party program (Acronis.Words) that will do it, but it is prohibitively expensive.

    Update:

    Automating through Word has been suggested, but my code is going to be running on ASP.NET on an IIS web server, so going out to Word is not an option for me. Sorry for not mentioning that in the first place.

  • Dave Markle
    Dave Markle over 15 years
    Automation is from Satan. Good answer, Rob.
  • MadBoy
    MadBoy about 14 years
    Does it also add a page break?
  • rohitwtbs
    rohitwtbs about 14 years
    Hi MadBoy, I checked, and it preserves the original page breaks and adds new page breaks when needed.
  • MadBoy
    MadBoy over 13 years
    Works great, but it seems to miss header and footers. Do you know a way to make it merge all headers and footers as well?
  • MadBoy
    MadBoy over 13 years
    Would this work with images, and headers and footers (different headers, footers, images across each document)?
  • JasonPlutext
    JasonPlutext over 13 years
    If you use w:altChunk, then you need to open the document in something (e.g. Word 2007) that is capable of converting the altChunk element to regular document content.
  • JasonPlutext
    JasonPlutext over 13 years
    @MadBoy: it wouldn't. For that, you can use the OpenXML PowerTools from CodePlex. I've written equivalent code in Java, on top of docx4j.
  • Alconja
    Alconja about 13 years
    @MadBoy - It won't explicitly create new page breaks between merged documents, it will just flow over new pages as necessary. However, you can add page breaks between documents explicitly by doing the following (before first line inside for loop): newBody.Add(XElement.Parse(new Paragraph(new Run(new Break { Type = BreakValues.Page })).OuterXml));
  • Ali
    Ali over 11 years
    @GRGodoi can you please tell me or give me the code to do the same for PowerPoint?
  • Ingó Vals
    Ingó Vals almost 11 years
    Is it necessary to save and flush in each iteration instead of just compiling the bodies and saving and flushing in the end?
  • JasonPlutext
    JasonPlutext almost 11 years
    Correct; however, for all but trivial documents there is a lot to do to ensure document integrity. The best free solution is described at openxmldeveloper.org/wiki/w/wiki/documentbuilder.aspx but if you need more, you could try our commercial MergeDocx.NET product online at plutext.com/m/index.php/products