How to merge multiple PDF files (generated at run time)?
Solution 1
If you want to merge source documents using iText(Sharp), there are two basic situations:
1. You really want to merge the documents, acquiring the pages in their original format and transferring as much of their content and their interactive annotations as possible. In this case you should use a solution based on a member of the Pdf*Copy* family of classes.
2. You actually want to integrate pages from the source documents into a new document, but you want the new document to govern the general format, and you don't care about the interactive features (annotations...) of the original documents (or even want to get rid of them). In this case you should use a solution based on the PdfWriter class.
You can find details in chapter 6 (especially section 6.4) of iText in Action — 2nd Edition. The Java sample code can be accessed here and the C#'ified versions here.
A simple sample using PdfCopy is Concatenate.java / Concatenate.cs. The central piece of code is:
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

byte[] mergedPdf = null;
using (MemoryStream ms = new MemoryStream())
{
    using (Document document = new Document())
    {
        using (PdfCopy copy = new PdfCopy(document, ms))
        {
            document.Open();
            for (int i = 0; i < pdf.Count; ++i)
            {
                PdfReader reader = new PdfReader(pdf[i]);
                // loop over the pages in that document
                int n = reader.NumberOfPages;
                for (int page = 0; page < n; )
                {
                    copy.AddPage(copy.GetImportedPage(reader, ++page));
                }
                // release the reader once all of its pages have been copied
                copy.FreeReader(reader);
                reader.Close();
            }
        }
    }
    mergedPdf = ms.ToArray();
}
Here pdf can be defined either as a List<byte[]> directly containing the source documents (appropriate for your use case of merging intermediate in-memory documents) or as a List<String> containing the names of source document files (appropriate if you merge documents from disk).
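iText page numbers start at 1, so the inner loop above, which starts page at 0 and increments it inside the GetImportedPage call, passes the numbers 1 through n. The following BCL-only sketch (no iTextSharp involved; PageLoops and its methods are illustrative names) records the page numbers each loop form would pass, to show that the compact idiom and a conventional 1-to-n loop are equivalent:

```csharp
using System.Collections.Generic;

// Illustrative only: records the page numbers each loop form would hand to
// GetImportedPage(reader, page) for an n-page document.
public static class PageLoops
{
    // The idiom from the sample: start at 0, pre-increment inside the call.
    public static List<int> PreIncrementIdiom(int n)
    {
        var visited = new List<int>();
        for (int page = 0; page < n; )
        {
            visited.Add(++page); // ++page yields 1, 2, ..., n
        }
        return visited;
    }

    // The conventional form: iText page numbers are 1-based, so loop 1..n.
    public static List<int> ConventionalLoop(int n)
    {
        var visited = new List<int>();
        for (int page = 1; page <= n; page++)
        {
            visited.Add(page);
        }
        return visited;
    }
}
```

Either form works with GetImportedPage; the conventional loop is the one Solutions 2, 3 and 4 use.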
An overview at the end of the referenced chapter summarizes the usage of the classes mentioned:
PdfCopy: Copies pages from one or more existing PDF documents. Major downsides: PdfCopy doesn’t detect redundant content, and it fails when concatenating forms.
PdfCopyFields: Puts the fields of the different forms into one form. Can be used to avoid the problems encountered with form fields when concatenating forms using PdfCopy. Memory use can be an issue.
PdfSmartCopy: Copies pages from one or more existing PDF documents. PdfSmartCopy is able to detect redundant content, but it needs more memory and CPU than PdfCopy.
PdfWriter: Generates PDF documents from scratch. Can import pages from other PDF documents. The major downside is that all interactive features of the imported page (annotations, bookmarks, fields, and so forth) are lost in the process.
Solution 2
I used iTextSharp with C# to combine PDF files. This is the code I used:
using System;
using System.IO;
using System.Text.RegularExpressions;
using iTextSharp.text;
using iTextSharp.text.pdf;

string[] lstFiles = new string[3];
lstFiles[0] = @"C:/pdf/1.pdf";
lstFiles[1] = @"C:/pdf/2.pdf";
lstFiles[2] = @"C:/pdf/3.pdf";

PdfReader reader = null;
Document sourceDocument = null;
PdfCopy pdfCopyProvider = null;
PdfImportedPage importedPage;
string outputPdfPath = @"C:/pdf/new.pdf";

sourceDocument = new Document();
pdfCopyProvider = new PdfCopy(sourceDocument, new System.IO.FileStream(outputPdfPath, System.IO.FileMode.Create));

//Open the output file
sourceDocument.Open();

try
{
    //Loop through ALL files in the list (f < lstFiles.Length - 1 would skip the last one)
    for (int f = 0; f < lstFiles.Length; f++)
    {
        reader = new PdfReader(lstFiles[f]);
        //Let the reader report the page count instead of scanning the raw file
        int pages = reader.NumberOfPages;

        //Add pages of current file (iText page numbers are 1-based)
        for (int i = 1; i <= pages; i++)
        {
            importedPage = pdfCopyProvider.GetImportedPage(reader, i);
            pdfCopyProvider.AddPage(importedPage);
        }

        pdfCopyProvider.FreeReader(reader);
        reader.Close();
    }

    //At the end save the output file
    sourceDocument.Close();
}
catch (Exception)
{
    //Rethrow without resetting the stack trace ("throw ex;" would reset it)
    throw;
}
private int get_pageCcount(string file)
{
    using (StreamReader sr = new StreamReader(File.OpenRead(file)))
    {
        Regex regex = new Regex(@"/Type\s*/Page[^s]");
        MatchCollection matches = regex.Matches(sr.ReadToEnd());
        return matches.Count;
    }
}
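The regex in get_pageCcount is only a heuristic. It scans the raw file text, so it cannot see page dictionaries stored inside compressed object streams (allowed since PDF 1.5), and the [^s] guard only excludes names like /Pages, so other types beginning with /Page, such as the optional /Type /PageLabel of a page label dictionary, are counted too. The sketch below applies the same pattern to hand-written dictionary fragments (not real files; RegexPageCount is an illustrative name) to show a correct count, a correct exclusion, and a false positive. Asking the reader via reader.NumberOfPages, as the other solutions do, avoids these problems.

```csharp
using System.Text.RegularExpressions;

// Illustrative only: the same pattern get_pageCcount uses, applied to
// short hand-written fragments instead of whole PDF files.
public static class RegexPageCount
{
    private static readonly Regex PagePattern = new Regex(@"/Type\s*/Page[^s]");

    public static int Count(string pdfText)
    {
        return PagePattern.Matches(pdfText).Count;
    }
}
```

For example, "<< /Type /Page /Parent 2 0 R >>" counts as one page and "<< /Type /Pages /Count 3 >>" as none, but "<< /Type /PageLabel /S /D >>" is also counted as a page.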
Solution 3
Here is some code I pulled out of an old project I had. It was a web application, and I was using iTextSharp to merge PDF files and then print them.
public static class PdfMerger
{
/// <summary>
/// Merge pdf files.
/// </summary>
/// <param name="sourceFiles">PDF files being merged.</param>
/// <returns></returns>
public static byte[] MergeFiles(List<Stream> sourceFiles)
{
Document document = new Document();
MemoryStream output = new MemoryStream();
try
{
// Initialize pdf writer
PdfWriter writer = PdfWriter.GetInstance(document, output);
writer.PageEvent = new PdfPageEvents();
// Open document to write
document.Open();
PdfContentByte content = writer.DirectContent;
// Iterate through all pdf documents
for (int fileCounter = 0; fileCounter < sourceFiles.Count; fileCounter++)
{
// Create pdf reader
PdfReader reader = new PdfReader(sourceFiles[fileCounter]);
int numberOfPages = reader.NumberOfPages;
// Iterate through all pages
for (int currentPageIndex = 1; currentPageIndex <=
numberOfPages; currentPageIndex++)
{
// Determine page size for the current page
document.SetPageSize(
reader.GetPageSizeWithRotation(currentPageIndex));
// Create page
document.NewPage();
PdfImportedPage importedPage =
writer.GetImportedPage(reader, currentPageIndex);
// Determine page orientation
int pageOrientation = reader.GetPageRotation(currentPageIndex);
if ((pageOrientation == 90) || (pageOrientation == 270))
{
content.AddTemplate(importedPage, 0, -1f, 1f, 0, 0,
reader.GetPageSizeWithRotation(currentPageIndex).Height);
}
else
{
content.AddTemplate(importedPage, 1f, 0, 0, 1f, 0, 0);
}
}
}
}
catch (Exception exception)
{
throw new Exception("An unexpected exception occurred" +
" during the PDF merging process.", exception);
}
finally
{
document.Close();
}
// ToArray() returns only the bytes written; GetBuffer() would also return unused capacity
return output.ToArray();
}
}
/// <summary>
/// Implements custom page events.
/// </summary>
internal class PdfPageEvents : IPdfPageEvent
{
#region members
private BaseFont _baseFont = null;
private PdfContentByte _content;
#endregion
#region IPdfPageEvent Members
public void OnOpenDocument(PdfWriter writer, Document document)
{
_baseFont = BaseFont.CreateFont(BaseFont.HELVETICA,
BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
_content = writer.DirectContent;
}
public void OnStartPage(PdfWriter writer, Document document)
{ }
public void OnEndPage(PdfWriter writer, Document document)
{ }
public void OnCloseDocument(PdfWriter writer, Document document)
{ }
public void OnParagraph(PdfWriter writer,
Document document, float paragraphPosition)
{ }
public void OnParagraphEnd(PdfWriter writer,
Document document, float paragraphPosition)
{ }
public void OnChapter(PdfWriter writer, Document document,
float paragraphPosition, Paragraph title)
{ }
public void OnChapterEnd(PdfWriter writer,
Document document, float paragraphPosition)
{ }
public void OnSection(PdfWriter writer, Document document,
float paragraphPosition, int depth, Paragraph title)
{ }
public void OnSectionEnd(PdfWriter writer,
Document document, float paragraphPosition)
{ }
public void OnGenericTag(PdfWriter writer, Document document,
Rectangle rect, string text)
{ }
#endregion
private float GetCenterTextPosition(string text, PdfWriter writer)
{
return writer.PageSize.Width / 2 - _baseFont.GetWidthPoint(text, 8) / 2;
}
}
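Returning output.ToArray() rather than output.GetBuffer() matters here: GetBuffer() hands back the MemoryStream's whole internal array, whose length is the stream's capacity rather than its length, so the result can carry unused zero bytes after the PDF's final %%EOF; ToArray() copies exactly the bytes written. Many viewers tolerate that trailing padding, which may be why such code appears to work in practice. A BCL-only sketch (BufferDemo is an illustrative name) comparing the two lengths:

```csharp
using System.IO;

// Illustrative only: shows why GetBuffer() can return more bytes than were written.
public static class BufferDemo
{
    public static (int bufferLength, int arrayLength) Compare(byte[] payload)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(payload, 0, payload.Length);
            // GetBuffer(): the raw internal array, sized by Capacity, not Length.
            int bufferLength = ms.GetBuffer().Length;
            // ToArray(): a fresh copy of exactly Length bytes.
            int arrayLength = ms.ToArray().Length;
            return (bufferLength, arrayLength);
        }
    }
}
```

Writing a handful of bytes typically grows the capacity well beyond the payload, so bufferLength ends up larger than arrayLength.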
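I didn't write this, but I made some modifications; I can't remember where I found it. After merging the PDFs, I would call this method to insert JavaScript that opens the print dialog when the PDF is opened. If you change bSilent to true, it is meant to print silently to the default printer; note, however, that the Acrobat JavaScript reference describes bSilent as only suppressing the cancel dialog while the document prints, while bUI controls whether the print dialog appears, so bUI may need to be set to false as well.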
public Stream addPrintJStoPDF(Stream thePDF)
{
MemoryStream outPutStream = null;
//Open the stream with iTextSharp
var reader = new PdfReader(thePDF);
//Create the empty output stream the stamper writes into
outPutStream = new MemoryStream();
var stamper = new PdfStamper(reader, outPutStream);
var jsText = "var res = app.setTimeOut('this.print({bUI: true, bSilent: false, bShrinkToFit: false});', 200);";
//Add the javascript to the PDF
stamper.JavaScript = jsText;
stamper.FormFlattening = true;
stamper.Writer.CloseStream = false;
stamper.Close();
//Set the stream to the beginning
outPutStream.Position = 0;
return outPutStream;
}
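To switch between the interactive and the silent variants without editing the string by hand, the script can be built from flags. PrintScript.Build below is a hypothetical helper, not an iTextSharp API; its bUI and bSilent parameters are copied straight into the this.print call, and with bUI = true and bSilent = false it reproduces the jsText string above exactly. The parameter descriptions in the comments follow the Acrobat JavaScript reference for Doc.print; whether a given viewer honors them is outside the scope of this sketch.

```csharp
using System.Globalization;

// Hypothetical helper: builds the Acrobat JavaScript that addPrintJStoPDF
// assigns to stamper.JavaScript, with the print flags exposed as parameters.
public static class PrintScript
{
    // bUI: show the print dialog; bSilent: suppress the cancel dialog while printing.
    public static string Build(bool bUI, bool bSilent, int delayMs = 200)
    {
        // C#'s bool.ToString() gives "True"/"False"; JavaScript needs "true"/"false".
        string ui = bUI ? "true" : "false";
        string silent = bSilent ? "true" : "false";
        return "var res = app.setTimeOut('this.print({bUI: " + ui
            + ", bSilent: " + silent
            + ", bShrinkToFit: false});', "
            + delayMs.ToString(CultureInfo.InvariantCulture) + ");";
    }
}
```

Usage inside addPrintJStoPDF would then be stamper.JavaScript = PrintScript.Build(bUI: true, bSilent: false);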
I'm not sure how well the above code is written, since I pulled it from somewhere else and haven't worked with iTextSharp in depth, but I do know that it worked for merging PDFs that I was generating at runtime.
Solution 4
Tested with iTextSharp-LGPL 4.1.6:
public static byte[] ConcatenatePdfs(IEnumerable<byte[]> documents)
{
    using (var ms = new MemoryStream())
    {
        var outputDocument = new Document();
        var writer = new PdfCopy(outputDocument, ms);
        outputDocument.Open();
        foreach (var doc in documents)
        {
            var reader = new PdfReader(doc);
            for (var i = 1; i <= reader.NumberOfPages; i++)
            {
                writer.AddPage(writer.GetImportedPage(reader, i));
            }
            writer.FreeReader(reader);
            reader.Close();
        }
        writer.Close();
        outputDocument.Close();
        // ToArray() copies exactly the written bytes (it also works although
        // closing the writer closed ms); GetBuffer() would include unused capacity
        return ms.ToArray();
    }
}
Solution 5
To avoid the memory issues mentioned in ITextSharp Out of memory exception merging multiple pdf, I used a FileStream instead of a MemoryStream to merge the PDF files:
var parentDirectory = Directory.GetParent(SelectedDocuments[0].FilePath);
var savePath = Path.Combine(parentDirectory.FullName, "MergedDocument.pdf");
using (var fs = new FileStream(savePath, FileMode.Create))
{
using (var document = new Document())
{
using (var pdfCopy = new PdfCopy(document, fs))
{
document.Open();
for (var i = 0; i < SelectedDocuments.Count; i++)
{
using (var pdfReader = new PdfReader(SelectedDocuments[i].FilePath))
{
for (var page = 0; page < pdfReader.NumberOfPages;)
{
pdfCopy.AddPage(pdfCopy.GetImportedPage(pdfReader, ++page));
}
}
}
}
}
}
Anyname Donotcare
Updated on July 05, 2022
Comments
Anyname Donotcare (almost 2 years ago):
How can I merge multiple PDF files (generated at run time) through iTextSharp and then print them? I found the following link, but that method requires the PDF file names, assuming the PDF files are stored on disk, and this is not my case.
I have multiple reports that I convert to PDF files through this method:
private void AddReportToResponse(LocalReport followsReport)
{
    string mimeType;
    string encoding;
    string extension;
    string[] streams = new string[100];
    Warning[] warnings = new Warning[100];
    byte[] pdfStream = followsReport.Render("PDF", "", out mimeType, out encoding, out extension, out streams, out warnings);
    //Response.Clear();
    //Response.ContentType = mimeType;
    //Response.AddHeader("content-disposition", "attachment; filename=Application." + extension);
    //Response.BinaryWrite(pdfStream);
    //Response.End();
}
Now I want to merge all those generated files (bytes) into one PDF file to print it.
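Instead of writing each rendered report to the Response, the byte arrays can be collected in a List<byte[]>, which is what the merge routines above accept (the pdf list in Solution 1, ConcatenatePdfs in Solution 4). A minimal sketch, assuming each hypothetical Func<byte[]> delegate wraps one call such as followsReport.Render("PDF", ...); ReportCollector is an illustrative name:

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: each Func<byte[]> stands in for one report render call
// that returns a complete PDF as a byte array.
public static class ReportCollector
{
    public static List<byte[]> RenderAll(IEnumerable<Func<byte[]>> renderers)
    {
        var pdfs = new List<byte[]>();
        foreach (var render in renderers)
        {
            byte[] pdf = render();
            if (pdf == null || pdf.Length == 0)
                throw new InvalidOperationException("A report rendered to an empty result.");
            pdfs.Add(pdf); // keep the bytes instead of writing them to the Response
        }
        return pdfs;
    }
}
```

The resulting list can then be handed to a merge routine, for example byte[] merged = ConcatenatePdfs(ReportCollector.RenderAll(renderers));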
mkl (about 11 years ago): Please refrain from using this kind of merge routine unless you have very specific requirements forcing you to do that. When you use PdfWriter to merge source PDFs, interactive features (forms and other annotations) are lost. Furthermore, the resulting PDF internally contains an unnecessary wrapper around the page information, which, when iterated multiple times, may cause PDF viewers to fail when trying to display the PDF.
DSlagle (about 11 years ago): As I said, I had pulled this from older code that was in production, but the PDFs were generated from HTML built by a WYSIWYG editor, so we had no interactive features. Also, our iterations were usually only around 10 at a time, and we never had issues with the PDF not opening. I posted this as an example because we had it running in production, and I know that it was working to merge PDFs with no reported issues.
mkl (about 11 years ago): I intended no offense; merging solutions based on PdfWriter like yours can indeed be found more often than the ones using the better-suited classes when googling around, and they do work after a fashion, so they are not completely wrong. Pdf*Copy* based solutions, though, generally are easier to use (no need to adapt the target document page size and rotation again and again), more complete (concerning interactive features), and produce cleaner output (with respect to the internal PDF structure).
mkl (over 10 years ago): It really is interesting to see that someone downvoted this answer without leaving a comment explaining its deficiencies... That being said, iText has developed in the meantime, and much of the PdfCopyFields-specific stuff has found its way into PdfCopy.
mkl (over 9 years ago): @BonusKun "It merges really slow": if it is exceptionally slow, you might want to create a question in its own right, providing sample documents to reproduce the problem... (I upvoted, btw.)
thanx!
Lill Lansey (almost 9 years ago): If you want to save the file to disk, after the line "mergedPdf = ms.ToArray();" use: System.IO.File.WriteAllBytes(@"C:\MyFileName.pdf", mergedPdf);
10K35H 5H4KY4 (almost 9 years ago): What if you want it as a download stream? return File(ms, "application/pdf", "file.pdf") is not working for me.
mkl (almost 9 years ago): @NearlyCrazy That essentially constitutes a new question, how to return the contents of a MemoryStream as a download stream; please make that a question in its own right. I'm afraid I cannot help here, as I'm not into .NET web service stuff.
Vikky (over 8 years ago): I am getting the error "An item with the same key has already been added." at the line "copy.AddPage(copy.GetImportedPage(reader, System.Threading.Interlocked.Increment(page)))". Can anyone help, please?
mkl (over 8 years ago): @Vikky That line is not in the code above, so your question is about a different situation than the one explicitly explained here. To get help, you should therefore make that an actual SO question and supply the required information.
Vikky (over 8 years ago): I converted this code to VB; I googled many times and am not finding any reference. Please help if you can.
mkl (over 8 years ago): You use System.Threading.Interlocked.Increment(page). This seems to indicate that you access and change that page variable from multiple threads, and therefore the copying classes, too. The classes above are not inherently thread-safe, so you have to take special care. Thus, please describe your issue in detail in a question in its own right.
Vikky (over 8 years ago): Hi @mkl, I have posted it in detail here: stackoverflow.com/questions/33043151/…
MikeTeeVee (almost 8 years ago): You are missing document.Close(); without it, you may see errors when opening the merged PDF file, like "There was an error opening this document. The file is damaged and could not be repaired.". For my purposes, I needed to return a Stream object, so I declared this variable above the usings: Stream stream = null; then, near the end of my usings (immediately after calling document.Close();), I added this: stream = new MemoryStream(ms.ToArray());
mkl (almost 8 years ago): @MikeTeeVee "You are missing document.Close()": No, using (Document document = new Document()) implicitly closes the document. If you need to grab the memory stream contents before the closing bracket of the using, though, you indeed need to close it explicitly.
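This works because MemoryStream.ToArray() is documented to keep working after the stream has been closed. Closing the iTextSharp Document by default also closes the output stream (which is why Solution 3 sets stamper.Writer.CloseStream = false before closing its stamper), yet Solution 1 can still call ms.ToArray() afterwards. A BCL-only sketch, with ms.Close() standing in for the document closing the stream (ClosedStreamDemo is an illustrative name):

```csharp
using System.IO;

// Illustrative only: MemoryStream.ToArray() still returns the data after Close().
public static class ClosedStreamDemo
{
    public static byte[] CopyAfterClose(byte[] payload)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(payload, 0, payload.Length);
            // Stands in for Document.Close(), which by default also closes its output stream.
            ms.Close();
            // Still valid on a closed MemoryStream.
            return ms.ToArray();
        }
    }
}
```

Other members such as Position or Read would throw on the closed stream, so ToArray() is the safe way to get the merged bytes out afterwards.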
Dragonthoughts (over 5 years ago): It would be more helpful if you could explain how this answers the question.