How to merge multiple pdf files (generated in run time)?

Solution 1

If you want to merge source documents using iText(Sharp), there are two basic situations:

  1. You really want to merge the documents, acquiring the pages in their original format and transferring as much of their content and their interactive annotations as possible. In this case you should use a solution based on a member of the Pdf*Copy* family of classes.

  2. You actually want to integrate pages from the source documents into a new document but want the new document to govern the general format and don't care for the interactive features (annotations...) in the original documents (or even want to get rid of them). In this case you should use a solution based on the PdfWriter class.

You can find details in chapter 6 (especially section 6.4) of iText in Action — 2nd Edition. The Java sample code can be accessed here and the C#'ified versions here.

A simple sample using PdfCopy is Concatenate.java / Concatenate.cs. The central piece of code is:

byte[] mergedPdf = null;
using (MemoryStream ms = new MemoryStream())
{
    using (Document document = new Document())
    {
        using (PdfCopy copy = new PdfCopy(document, ms))
        {
            document.Open();

            for (int i = 0; i < pdf.Count; ++i)
            {
                PdfReader reader = new PdfReader(pdf[i]);
                // loop over the pages in that document
                int n = reader.NumberOfPages;
                for (int page = 0; page < n; )
                {
                    copy.AddPage(copy.GetImportedPage(reader, ++page));
                }
            }
        }
    }
    mergedPdf = ms.ToArray();
}

Here pdf can either be defined as a List<byte[]> immediately containing the source documents (appropriate for your use case of merging intermediate in-memory documents) or as a List<String> containing the names of source document files (appropriate if you merge documents from disk).

An overview at the end of the referenced chapter summarizes the usage of the classes mentioned:

  • PdfCopy: Copies pages from one or more existing PDF documents. Major downsides: PdfCopy doesn’t detect redundant content, and it fails when concatenating forms.

  • PdfCopyFields: Puts the fields of the different forms into one form. Can be used to avoid the problems encountered with form fields when concatenating forms using PdfCopy. Memory use can be an issue.

  • PdfSmartCopy: Copies pages from one or more existing PDF documents. PdfSmartCopy is able to detect redundant content, but it needs more memory and CPU than PdfCopy.

  • PdfWriter: Generates PDF documents from scratch. Can import pages from other PDF documents. The major downside is that all interactive features of the imported page (annotations, bookmarks, fields, and so forth) are lost in the process.
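As a rough illustration of the trade-off between PdfCopy and PdfSmartCopy described above, the merge loop can be wrapped in a helper that picks the copy class at run time. This is only a sketch under stated assumptions: iTextSharp 5.x is referenced, and the class name PdfMergeSketch, the method Merge, and the useSmartCopy parameter are hypothetical names chosen for illustration, not part of iTextSharp.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfMergeSketch
{
    // Hypothetical helper (not an iTextSharp API).
    // useSmartCopy = true selects PdfSmartCopy, which detects redundant
    // content at the cost of more memory and CPU; false selects PdfCopy.
    public static byte[] Merge(IEnumerable<byte[]> sources, bool useSmartCopy)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            using (Document document = new Document())
            using (PdfCopy copy = useSmartCopy
                ? new PdfSmartCopy(document, ms)
                : new PdfCopy(document, ms))
            {
                document.Open();
                foreach (byte[] source in sources)
                {
                    // The sources are in-memory documents, so the reader
                    // holds no file handles.
                    PdfReader reader = new PdfReader(source);
                    for (int page = 1; page <= reader.NumberOfPages; page++)
                    {
                        copy.AddPage(copy.GetImportedPage(reader, page));
                    }
                }
            }
            // The usings above have closed the document, so the PDF is complete.
            return ms.ToArray();
        }
    }
}
```

For documents rendered at run time, each rendered byte[] can be collected into a list and passed to Merge. Whether PdfSmartCopy is worth its extra cost depends on how much content (fonts, images) the source documents share.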

Solution 2

I used iTextSharp with C# to combine PDF files. This is the code I used.

string[] lstFiles = new string[3];
lstFiles[0] = @"C:/pdf/1.pdf";
lstFiles[1] = @"C:/pdf/2.pdf";
lstFiles[2] = @"C:/pdf/3.pdf";

string outputPdfPath = @"C:/pdf/new.pdf";

Document sourceDocument = new Document();
PdfCopy pdfCopyProvider = new PdfCopy(sourceDocument, new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));

//Open the output file
sourceDocument.Open();

try
{
    //Loop through the files list (every entry, including the last one)
    for (int f = 0; f < lstFiles.Length; f++)
    {
        PdfReader reader = new PdfReader(lstFiles[f]);

        //Ask the reader for the page count instead of scanning the raw file with a regex
        int pages = reader.NumberOfPages;

        //Add pages of current file
        for (int i = 1; i <= pages; i++)
        {
            PdfImportedPage importedPage = pdfCopyProvider.GetImportedPage(reader, i);
            pdfCopyProvider.AddPage(importedPage);
        }

        reader.Close();
    }
    //At the end save the output file
    sourceDocument.Close();
}
catch (Exception)
{
    //Rethrow without resetting the stack trace
    throw;
}

Solution 3

Here is some code I pulled out of an old project I had. It was a web application but I was using iTextSharp to merge pdf files then print them.

public static class PdfMerger
    {
        /// <summary>
        /// Merge pdf files.
        /// </summary>
        /// <param name="sourceFiles">PDF files being merged.</param>
        /// <returns></returns>
        public static byte[] MergeFiles(List<Stream> sourceFiles)
        {
            Document document = new Document();
            MemoryStream output = new MemoryStream();

            try
            {
                // Initialize pdf writer
                PdfWriter writer = PdfWriter.GetInstance(document, output);
                writer.PageEvent = new PdfPageEvents();

                // Open document to write
                document.Open();
                PdfContentByte content = writer.DirectContent;

                // Iterate through all pdf documents
                for (int fileCounter = 0; fileCounter < sourceFiles.Count; fileCounter++)
                {
                    // Create pdf reader
                    PdfReader reader = new PdfReader(sourceFiles[fileCounter]);
                    int numberOfPages = reader.NumberOfPages;

                    // Iterate through all pages
                    for (int currentPageIndex = 1; currentPageIndex <=
                                        numberOfPages; currentPageIndex++)
                    {
                        // Determine page size for the current page
                        document.SetPageSize(
                            reader.GetPageSizeWithRotation(currentPageIndex));

                        // Create page
                        document.NewPage();
                        PdfImportedPage importedPage =
                            writer.GetImportedPage(reader, currentPageIndex);


                        // Determine page orientation
                        int pageOrientation = reader.GetPageRotation(currentPageIndex);
                        if ((pageOrientation == 90) || (pageOrientation == 270))
                        {
                            content.AddTemplate(importedPage, 0, -1f, 1f, 0, 0,
                                reader.GetPageSizeWithRotation(currentPageIndex).Height);
                        }
                        else
                        {
                            content.AddTemplate(importedPage, 1f, 0, 0, 1f, 0, 0);
                        }
                    }
                }
            }
            catch (Exception exception)
            {
                throw new Exception("An unexpected exception occurred" +
                        " during the PDF merging process.", exception);
            }
            finally
            {
                document.Close();
            }
            // ToArray returns only the bytes written, without unused buffer space
            return output.ToArray();
        }
    }



    /// <summary>
    /// Implements custom page events.
    /// </summary>
    internal class PdfPageEvents : IPdfPageEvent
    {
        #region members
        private BaseFont _baseFont = null;
        private PdfContentByte _content;
        #endregion

        #region IPdfPageEvent Members
        public void OnOpenDocument(PdfWriter writer, Document document)
        {
            _baseFont = BaseFont.CreateFont(BaseFont.HELVETICA,
                                BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
            _content = writer.DirectContent;
        }

        public void OnStartPage(PdfWriter writer, Document document)
        { }

        public void OnEndPage(PdfWriter writer, Document document)
        { }

        public void OnCloseDocument(PdfWriter writer, Document document)
        { }

        public void OnParagraph(PdfWriter writer,
                    Document document, float paragraphPosition)
        { }

        public void OnParagraphEnd(PdfWriter writer,
                    Document document, float paragraphPosition)
        { }

        public void OnChapter(PdfWriter writer, Document document,
                                float paragraphPosition, Paragraph title)
        { }

        public void OnChapterEnd(PdfWriter writer,
                    Document document, float paragraphPosition)
        { }

        public void OnSection(PdfWriter writer, Document document,
                    float paragraphPosition, int depth, Paragraph title)
        { }

        public void OnSectionEnd(PdfWriter writer,
                    Document document, float paragraphPosition)
        { }

        public void OnGenericTag(PdfWriter writer, Document document,
                                    Rectangle rect, string text)
        { }
        #endregion

        private float GetCenterTextPosition(string text, PdfWriter writer)
        {
            return writer.PageSize.Width / 2 - _baseFont.GetWidthPoint(text, 8) / 2;
        }
    }

I didn't write this, but made some modifications. I can't remember where I found it. After I merged the PDFs I would call this method to insert JavaScript that opens the print dialog when the PDF is opened. If you change bSilent to true, it should print silently to the user's default printer.

public Stream addPrintJStoPDF(Stream thePDF)
{
    //Open the stream with iTextSharp
    var reader = new PdfReader(thePDF);

    //Write the stamped PDF into a new, empty in-memory stream
    var outPutStream = new MemoryStream();
    var stamper = new PdfStamper(reader, outPutStream);
    var jsText = "var res = app.setTimeOut('this.print({bUI: true, bSilent: false, bShrinkToFit: false});', 200);";
    //Add the javascript to the PDF
    stamper.JavaScript = jsText;

    stamper.FormFlattening = true;
    stamper.Writer.CloseStream = false;
    stamper.Close();

    //Set the stream to the beginning
    outPutStream.Position = 0;

    return outPutStream;
}

I'm not sure how well the above code is written, since I pulled it from somewhere else and haven't worked in depth with iTextSharp, but I do know that it worked for merging PDFs that I was generating at runtime.

Solution 4

Tested with iTextSharp-LGPL 4.1.6:

    public static byte[] ConcatenatePdfs(IEnumerable<byte[]> documents)
    {
        using (var ms = new MemoryStream())
        {
            var outputDocument = new Document();
            var writer = new PdfCopy(outputDocument, ms);
            outputDocument.Open();

            foreach (var doc in documents)
            {
                var reader = new PdfReader(doc);
                for (var i = 1; i <= reader.NumberOfPages; i++)
                {
                    writer.AddPage(writer.GetImportedPage(reader, i));
                }
                writer.FreeReader(reader);
                reader.Close();
            }

            writer.Close();
            outputDocument.Close();
            // ToArray returns only the bytes written; GetBuffer would also return unused buffer space
            return ms.ToArray();
        }
    }
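
When extracting the merged bytes from a MemoryStream, it helps to know that GetBuffer() and ToArray() differ: GetBuffer() returns the stream's whole internal buffer, which is usually larger than the data written, while ToArray() returns a copy of exactly the bytes written. The following minimal sketch uses only the .NET base class library; the class name BufferDemo and the five-byte payload are made up for illustration.

```csharp
using System;
using System.IO;

public static class BufferDemo
{
    public static void Main()
    {
        using (var ms = new MemoryStream())
        {
            byte[] payload = { 1, 2, 3, 4, 5 };
            ms.Write(payload, 0, payload.Length);

            byte[] exact = ms.ToArray();   // copy of exactly the bytes written
            byte[] raw = ms.GetBuffer();   // whole internal buffer, may contain unused trailing bytes

            Console.WriteLine(exact.Length);               // prints 5
            Console.WriteLine(raw.Length >= exact.Length); // prints True
        }
    }
}
```

Returning the result of GetBuffer() as a PDF can therefore append unused bytes after the end of the file, so ToArray() is the safer way to hand the merged document to a caller.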

Solution 5

To avoid the memory issues mentioned, I used a file stream instead of a memory stream (as mentioned in "ITextSharp Out of memory exception merging multiple pdf") to merge PDF files:

        var parentDirectory = Directory.GetParent(SelectedDocuments[0].FilePath);
        // Path.Combine builds the path from the directory's full name
        var savePath = Path.Combine(parentDirectory.FullName, "MergedDocument.pdf");

        using (var fs = new FileStream(savePath, FileMode.Create))
        {
            using (var document = new Document())
            {
                using (var pdfCopy = new PdfCopy(document, fs))
                {
                    document.Open();
                    for (var i = 0; i < SelectedDocuments.Count; i++)
                    {
                        using (var pdfReader = new PdfReader(SelectedDocuments[i].FilePath))
                        {
                            for (var page = 0; page < pdfReader.NumberOfPages;)
                            {
                                pdfCopy.AddPage(pdfCopy.GetImportedPage(pdfReader, ++page));
                            }
                        }
                    }
                }
            }
        }
Author by

Anyname Donotcare

Updated on July 05, 2022

Comments

  • Anyname Donotcare
    Anyname Donotcare almost 2 years

    How can I merge multiple PDF files (generated at run time) with iTextSharp and then print them?

    I found the following link, but that method requires the PDF file names, assuming the PDF files are stored on disk, which is not my case.


    I have multiple reports that I convert to PDF files through this method:

    private void AddReportToResponse(LocalReport followsReport)
    {
        string mimeType;
        string encoding;
        string extension;
        string[] streams = new string[100];
        Warning[] warnings = new Warning[100];
        byte[] pdfStream = followsReport.Render("PDF", "", out mimeType, out encoding, out extension, out streams, out warnings);
      //Response.Clear();
      //Response.ContentType = mimeType;
      //Response.AddHeader("content-disposition", "attachment; filename=Application." + extension);
      //Response.BinaryWrite(pdfStream);
      //Response.End();
    }
    

    Now I want to merge all those generated files (byte arrays) into one PDF file to print it.

  • mkl
    mkl about 11 years
    Please refrain from using this kind of merge routine unless you have very specific requirements forcing you to do that. When you use PdfWriter to merge source PDFs, interactive features (forms and other annotations) are lost. Furthermore the resulting PDF internally contains an unnecessary wrapper around the page information which when iterated multiple times may cause PDF viewers to fail when trying to display the PDF.
  • DSlagle
    DSlagle about 11 years
    As I said I had pulled this from older code that was in production but PDFs were generated from html built by a wysiwyg editor so we had no interactive features. Also our iterations were usually only around 10 at a time and we never had issues with the pdf not opening. I posted this as an example as we had it running in production and I know that it was working to merge pdfs with no reported issues.
  • mkl
    mkl about 11 years
    I intended no offense; such merging solutions like yours based on PdfWriter indeed can be found more often than the ones using the better suited classes when googling around, and they do work after a fashion, so they are not completely wrong. Pdf*Copy* based solutions, though, generally are easier to use (no need to adapt the target document page size and rotation again and again), more complete (concerning interactive features), and produce cleaner output (with respect to the internal PDF structure).
  • mkl
    mkl over 10 years
    It really is interesting to see that someone downvoted this answer without leaving a comment explaining deficiencies... That being said, iText has evolved in the meantime, and much of the PdfCopyFields-specific functionality has found its way into PdfCopy.
  • mkl
    mkl over 9 years
    @BonusKun It mergers really slow - if it is exceptionally slow, you might want to create a question in its own right providing sample documents to reproduce the problem... (I upvoted btw.) - thanx!
  • Lill Lansey
    Lill Lansey almost 9 years
    If you want to save the file to disk, after line "mergedPdf = ms.ToArray();", use: System.IO.File.WriteAllBytes(@"C:\MyFileName.pdf", mergedPdf);
  • 10K35H 5H4KY4
    10K35H 5H4KY4 almost 9 years
    What if you want as download stream? return File(ms, "application/pdf", "file.pdf") is not working for me
  • mkl
    mkl almost 9 years
    @NearlyCrazy That essentially constitutes a new question, "How to return the contents of a MemoryStream as a download stream"; please make that a question in its own right. I'm afraid I cannot help here as I'm not into .NET web service stuff.
  • Vikky
    Vikky over 8 years
    I am getting the error "An item with the same key has already been added." at the following line: "copy.AddPage(copy.GetImportedPage(reader, System.Threading.Interlocked.Increment(page)))". Can anyone help, please?
  • mkl
    mkl over 8 years
    @Vikky That line is not in the code above. Thus, your question is about a different situation from the one explained here. To get help, you should therefore make that an actual Stack Overflow question and supply the required information.
  • Vikky
    Vikky over 8 years
    I converted this code to VB. I googled many times and am not finding any reference. Please help if you can.
  • mkl
    mkl over 8 years
    You use System.Threading.Interlocked.Increment(page). This seems to indicate that you access and change that page variable from multiple threads, and therefore the copying classes, too. The classes above are not inherently thread-safe. So you have to take special care. Thus, please describe your issue in detail in a question in its own right.
  • Vikky
    Vikky over 8 years
    Hi @mkl, I have posted the details here: stackoverflow.com/questions/33043151/…
  • MikeTeeVee
    MikeTeeVee almost 8 years
    You are missing document.Close(); Without this, you may see errors when opening the merged pdf file, like "There was an error opening this document. The file is damaged and could not be repaired.". For my purposes, I needed to return a Stream object, so I declared this variable above the Usings: Stream stream = null; then near the end of my Usings (immediately after calling document.Close();), I added this stream = new MemoryStream(ms.ToArray());
  • mkl
    mkl almost 8 years
    @MikeTeeVee You are missing document.Close() - No, using (Document document = new Document()) implicitly closes the document. If you need to grab the memory stream contents before the closing bracket of the using, though, you indeed need to explicitly close.
  • Dragonthoughts
    Dragonthoughts over 5 years
    It would be more helpful if you could explain how this answers the question.