VB.Net Merge multiple pdfs into one and export

25,270

Solution 1

I have a console application that monitors the subfolders of a designated folder and merges all of the PDFs in each subfolder into a single PDF. I pass an array of file paths as strings, plus the path of the output file I want.

This is the function I use:

'Requires at the top of the file: Imports System.IO and Imports iTextSharp.text.pdf
Public Shared Function MergePdfFiles(ByVal pdfFiles() As String, ByVal outputPath As String) As Boolean
    Dim result As Boolean = False
    Dim pdfCount As Integer = 0     'total input pdf file count
    Dim f As Integer = 0    'pointer to current input pdf file
    Dim fileName As String
    Dim reader As iTextSharp.text.pdf.PdfReader = Nothing
    Dim pageCount As Integer = 0
    Dim pdfDoc As iTextSharp.text.Document = Nothing    'the output pdf document
    Dim writer As PdfWriter = Nothing
    Dim cb As PdfContentByte = Nothing

    Dim page As PdfImportedPage = Nothing
    Dim rotation As Integer = 0

    Try
        pdfCount = pdfFiles.Length
        If pdfCount > 1 Then
            'Open the 1st item in the array PDFFiles
            fileName = pdfFiles(f)
            reader = New iTextSharp.text.pdf.PdfReader(fileName)
            'Get page count
            pageCount = reader.NumberOfPages

            pdfDoc = New iTextSharp.text.Document(reader.GetPageSizeWithRotation(1), 18, 18, 18, 18)

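            'FileMode.Create truncates an existing output file; OpenOrCreate would leave stale trailing bytes if the old file was longer
            writer = PdfWriter.GetInstance(pdfDoc, New FileStream(outputPath, FileMode.Create))

            pdfDoc.Open()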
            'Instantiate a PdfContentByte object
            cb = writer.DirectContent
            'Now loop thru the input pdfs
            While f < pdfCount
                'Declare a page counter variable
                Dim i As Integer = 0
                'Loop thru the current input pdf's pages starting at page 1
                While i < pageCount
                    i += 1
                    'Get the input page size
                    pdfDoc.SetPageSize(reader.GetPageSizeWithRotation(i))
                    'Create a new page on the output document
                    pdfDoc.NewPage()
                    'Get the imported page
                    page = writer.GetImportedPage(reader, i)
                    'Read the imported page's rotation
                    rotation = reader.GetPageRotation(i)
                    'Then add the imported page to the PdfContentByte object as a template based on the page's rotation
                    If rotation = 90 Then
                        cb.AddTemplate(page, 0, -1.0F, 1.0F, 0, 0, reader.GetPageSizeWithRotation(i).Height)
                    ElseIf rotation = 270 Then
                        cb.AddTemplate(page, 0, 1.0F, -1.0F, 0, reader.GetPageSizeWithRotation(i).Width + 60, -30)
                    Else
                        cb.AddTemplate(page, 1.0F, 0, 0, 1.0F, 0, 0)
                    End If
                End While
                'Increment f and read the next input pdf file
                f += 1
                If f < pdfCount Then
                    fileName = pdfFiles(f)
                    reader = New iTextSharp.text.pdf.PdfReader(fileName)
                    pageCount = reader.NumberOfPages
                End If
            End While
            'When all done, we close the document so that the pdfwriter object can write it to the output file
            pdfDoc.Close()
            result = True
        End If
    Catch ex As Exception
        Return False
    End Try
    Return result
End Function
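
For context, a call site for the function above might look like the following sketch. The folder path, the output file name, and the class name `PdfTools` are all placeholders that do not appear in the answer; substitute whichever class actually holds `MergePdfFiles`.

```vbnet
Imports System
Imports System.IO

Module MergeFolderSketch

    Sub Main()
        ' Hypothetical folder picked up by the monitoring console.
        Dim watchedFolder As String = "C:\Incoming\Batch01"

        ' Collect the PDFs and sort them so the page order of the result is predictable.
        Dim pdfFiles() As String = Directory.GetFiles(watchedFolder, "*.pdf")
        Array.Sort(pdfFiles)

        ' Write the result outside the watched folder so it is not picked up as an input later.
        Dim outputPath As String = Path.Combine(Path.GetDirectoryName(watchedFolder), "Batch01-merged.pdf")

        ' PdfTools is a placeholder for whichever class holds MergePdfFiles.
        If PdfTools.MergePdfFiles(pdfFiles, outputPath) Then
            Console.WriteLine(String.Format("Merged {0} files into {1}", pdfFiles.Length, outputPath))
        Else
            Console.WriteLine("Nothing merged: fewer than two PDFs were found, or an error occurred.")
        End If
    End Sub

End Module
```

Note that the function as written returns False both when an exception occurs and when the array holds fewer than two paths, so the caller cannot tell those cases apart.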

Solution 2

The code that was marked correct does not close all of the file streams, so the files stay open within the application and you won't be able to delete unused PDFs within your project.

This is a better solution:

Public Sub MergePDFFiles(ByVal outPutPDF As String) 

    Dim StartPath As String = FileArray(0) ' FileArray is a list declared globally
    Dim document = New Document()
    Dim outFile = Path.Combine(outPutPDF) ' outPutPDF is the output path, passed in from another Sub
    Dim writer = New PdfCopy(document, New FileStream(outFile, FileMode.Create))

    Try

        document.Open()
        For Each fileName As String In FileArray

            Dim reader = New PdfReader(Path.Combine(StartPath, fileName))

            For i As Integer = 1 To reader.NumberOfPages

                Dim page = writer.GetImportedPage(reader, i)
                writer.AddPage(page)

            Next i

            reader.Close()

        Next

        writer.Close()
        document.Close()

    Catch ex As Exception
        'Handle the exception if needed

    Finally

        writer.Close()
        document.Close()

    End Try


End Sub

Vikky

Updated on April 24, 2020

Comments

Vikky about 4 years

I have to merge multiple PDFs into a single PDF.

I am using the iTextSharp library. I took the code (from here) and tried to use it; the original code is in C#, and I converted it to VB.NET.

Private Function MergeFiles(ByVal sourceFiles As List(Of Byte())) As Byte()
    Dim mergedPdf As Byte() = Nothing
    Using ms As New MemoryStream()
        Using document As New Document()
            Using copy As New PdfCopy(document, ms)
                document.Open()
                For i As Integer = 0 To sourceFiles.Count - 1
                    Dim reader As New PdfReader(sourceFiles(i))
                    ' loop over the pages in that document
                    Dim n As Integer = reader.NumberOfPages
                    Dim page As Integer = 0
                    While page < n
                        page = page + 1
                        copy.AddPage(copy.GetImportedPage(reader, page))
                    End While
                Next
            End Using
        End Using
        mergedPdf = ms.ToArray()
    End Using
    Return mergedPdf
End Function

I am now getting the following error:

An item with the same key has already been added.

I did some debugging and have tracked the problem down to the following line:

copy.AddPage(copy.GetImportedPage(reader, page))

Why is this error happening?

AStopher over 8 years

FYI: Possible duplicate of An item with the same key has already been added to dictionary.
Vikky over 8 years

It works like a charm, @Sean Wessell. Thanks for such great help.
Vikky over 8 years

And again, I am not able to understand how this relates to the suggested duplicate.
Bruno Lowagie over 6 years

Down-voting because you are misleading people into thinking this is the way to merge documents (see this question). As explained in chapter 6 of my book, you are throwing away all interactivity. If the original files contain links, annotations, and so on, they will all be gone after merging. Because of answers like yours, many developers do the wrong thing (and it's so tiring for us having to explain over and over again what they are doing wrong).
G_Hosa_Phat about 6 years

@BrunoLowagie - It depends on the requirements of the merge system. If the PDF files do not contain any interactive content, links, annotations, etc., or those elements are unimportant within the context of the merged document, then, if this code succeeds in merging multiple PDF files, it's an acceptable answer. The only thing I would suggest for improving the answer would be to mention the possibility/likelihood of functionality loss using this method.
Bruno Lowagie about 6 years

@G_Hosa_Phat If you add the likelihood of functionality loss, also mention that each page is added to the new document as a Form XObject. When this operation is repeated many times (which is the case in some projects I helped debug), you end up with XObjects referring to XObjects referring to XObjects. Too many nested XObjects can cause performance problems and even hit implementation limits of the viewer that make the viewer fail to render the document or even crash.
G_Hosa_Phat about 6 years

@BrunoLowagie - Absolutely. The answer should include any possible negative side-effects that might result, although we all know that we don't always get (or know of) such problems listed. I've read a part of the sample chapter you've linked, and, as I'm interested in the subject of how best to merge PDF files, I'll probably be looking more closely into that. However, if the code above works for a "proof of concept" solution, I'll probably start there.
mkl about 6 years

(a) Closing the Document implicitly closes the PdfWriter, so it does not make sense to close the writer explicitly. (b) If you close the Document in the Finally block anyway, it does not make sense to also close it in the Try block. (c) If you want to make sure to close everything, why don't you explicitly close the file stream? One can, after all, disable the implicit closing of the file stream in the writer...
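
The three points above can be sketched as follows, assuming iTextSharp 5.x and the same `PdfCopy` approach as Solution 2. The module name, the method name `MergeWithUsing`, and its parameters are illustrative and not taken from the thread. The document is closed exactly once, in a Finally block; the writer is never closed explicitly; and the file stream is owned by a Using block, so it is released even if merging fails.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports iTextSharp.text
Imports iTextSharp.text.pdf

Public Module PdfMergeSketch

    ' Hypothetical helper: merges the given PDF files, in order, into outputPath.
    Public Sub MergeWithUsing(ByVal inputPaths As IEnumerable(Of String), ByVal outputPath As String)
        ' (c) The Using block owns the file stream and disposes it on every code path.
        Using output As New FileStream(outputPath, FileMode.Create)
            Dim document As New Document()
            Dim copy As New PdfCopy(document, output)
            document.Open()
            Try
                For Each inputPath As String In inputPaths
                    Dim reader As New PdfReader(inputPath)
                    Try
                        ' Page numbers are 1-based.
                        For pageNumber As Integer = 1 To reader.NumberOfPages
                            copy.AddPage(copy.GetImportedPage(reader, pageNumber))
                        Next
                    Finally
                        ' Release the input file so it can be moved or deleted afterwards.
                        reader.Close()
                    End Try
                Next
            Finally
                ' (a) and (b): close the document once; this also closes the writer.
                document.Close()
            End Try
        End Using
    End Sub

End Module
```

One caveat with this sketch: if no page was added (for example, an empty input list, or a failure on the first file), `document.Close()` may raise its own exception, so a caller that needs the original error should handle that case separately.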