How to Merge two memory streams containing PDF file's data into one

14,478

Solution 1

It appears in case of PDF files, the merging of memorystreams is not the same as with .txt files. For PDF, you need to use some .dll as I used iTextSharp.dll (available under the AGPL license) and then combine them using this library's functions as follows:

MemoryStream finalStream = new MemoryStream();
PdfCopyFields copy = new PdfCopyFields(finalStream);
string file1Path = "Sample1.pdf";
string file2Path = "Sample2.pdf";

var ms1 = new MemoryStream(File.ReadAllBytes(file1Path));
ms1.Position = 0;
copy.AddDocument(new PdfReader(ms1));
ms1.Dispose();

var ms2 = new MemoryStream(File.ReadAllBytes(file2Path));
ms2.Position = 0;
copy.AddDocument(new PdfReader(ms2));
ms2.Dispose();
copy.Close();

finalStream contains the merged pdf of both ms1 and ms2.

Solution 2

NOTE:

The whole question is based on a false premise, that you can produce a combined PDF file by merging the binaries of two PDF files. This works for plain text files for example (to an extent), but definitely doesn't work for PDFs. The answer only addresses how to merge two binary data streams, not how to merge two PDF files in particular. It answers the OP's question as asked, but doesn't actually solve his problem.

When you use the byte[] constructor for MemoryStream, the memory stream will not expand as you add more data. So it will not be big enough for both stream1 and stream2. Also, the position will start at zero, so you're overwriting stream2 with the data in stream1.

The fix is rather simple:

var result = new MemoryStream();
using (var file1 = File.OpenRead(file1Path)) file1.CopyTo(result);
using (var file2 = File.OpenRead(file2Path)) file2.CopyTo(result);

Another option would be to create your own stream class that would be a combination of two separate streams - interesting if you're interested in composability, but probably an overkill for something as simple as this :)

Just for fun, it could look something like this:

public class DualStream : Stream
{
    private readonly Stream _first;
    private readonly Stream _second;

    public DualStream(Stream first, Stream second)
    {
        _first = first;
        _second = second;
    }

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _first.Length + _second.Length;

    public override long Position
    {
        get { return _first.Position + _second.Position; }
        set { Seek(value, SeekOrigin.Begin); }
    }

    public override void Flush() { throw new NotImplementedException(); }

    public override int Read(byte[] buffer, int offset, int count)
    {
        var bytesRead = _first.Read(buffer, offset, count);

        if (bytesRead == count) return bytesRead;

        return bytesRead + _second.Read(buffer, offset + bytesRead, count - bytesRead);
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        // To simplify, let's assume seek always works as if over one big MemoryStream
        long targetPosition;

        switch (origin)
        {
            case SeekOrigin.Begin: targetPosition = offset; break;
            case SeekOrigin.Current: targetPosition = Position + offset; break;
            case SeekOrigin.End: targetPosition = Length - offset; break;
            default: throw new NotSupportedException();
        }

        targetPosition = Math.Max(0, Math.Min(Length, targetPosition));

        var firstPosition = Math.Min(_first.Length, targetPosition);
        _first.Position = firstPosition;
        _second.Position = Math.Max(0, targetPosition - firstPosition);

        return Position;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _first.Dispose();
            _second.Dispose();
        }

        base.Dispose(disposing);
    }

    public override void SetLength(long value) 
      { throw new NotImplementedException(); }
    public override void Write(byte[] buffer, int offset, int count) 
      { throw new NotImplementedException(); }
}

The main benefit is that it means you don't have to allocate unnecessary in-memory buffers just to have a combined stream - it can even be used with the file streams directly, if you dare :D And it's easily composable - you can make dual streams of other dual streams, allowing you to chain as many streams as you want together - pretty much the same as IEnumerable.Concat.

Share:
14,478
ArslanIqbal
Author by

ArslanIqbal

Full Stack | Web Developer | Front-end | Server Side | DB Management Experienced in software development with exposure to different kind of projects and products. Team player, strong interpersonal skills, proactive attitude and aptitude to solve complex problems. Tools & Tech Asp.Net - C# - Azure Functions JavaScript - jQuery - AngularJS - Kendo UI - Bootstrap - HTML/CSS SQL - Oracle Business/Functional Domain Hospital Management Sales and Distribution Contract Management System

Updated on June 05, 2022

Comments

  • ArslanIqbal
    ArslanIqbal almost 2 years

    I am trying to read two PDF files into two memory streams and then return a stream that will have both stream's data. But I don't seem to understand what's wrong with my code.

    Sample Code:

    string file1Path = "Sampl1.pdf";
    string file2Path = "Sample2.pdf";
    MemoryStream stream1 = new MemoryStream(File.ReadAllBytes(file1Path));
    MemoryStream stream2 = new MemoryStream(File.ReadAllBytes(file2Path));
    stream1.Position = 0;
    stream1.Copyto(stream2);
    return stream2;   /*supposed to be containing data of both stream1 and stream2 but contains data of stream1 only*/