What is the best buffer size when using BinaryReader to read big files (>1 GB)?


There is no single best or worst buffer size; you have to look at a few aspects.

Since you are using C#, you are presumably running on Windows. Windows typically uses NTFS, whose default cluster (allocation unit) size is 4 KB (4096 bytes), so it is advisable to use a multiple of 4096. Your buffer size of 16*1024 = 4*4096 is a reasonable choice, but whether it is better or worse than, say, 16*4096 cannot be said in general.
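
You can also query the actual cluster size at runtime instead of assuming 4096. Here is a minimal sketch using the Win32 GetDiskFreeSpace API via P/Invoke (the drive root C:\ is just an example):

    using System;
    using System.Runtime.InteropServices;

    class ClusterSizeDemo
    {
        // GetDiskFreeSpace reports sectors per cluster and bytes per sector;
        // their product is the volume's allocation unit (cluster) size.
        [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
        static extern bool GetDiskFreeSpace(
            string lpRootPathName,
            out uint lpSectorsPerCluster,
            out uint lpBytesPerSector,
            out uint lpNumberOfFreeClusters,
            out uint lpTotalNumberOfClusters);

        static void Main()
        {
            // Example drive root; replace with the volume your file lives on.
            if (GetDiskFreeSpace(@"C:\", out uint sectorsPerCluster,
                    out uint bytesPerSector, out _, out _))
            {
                uint clusterSize = sectorsPerCluster * bytesPerSector;
                Console.WriteLine($"Cluster size: {clusterSize} bytes"); // typically 4096 on NTFS
            }
        }
    }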

Everything depends on the situation and the program's requirements. Remember, you cannot choose the best option here, only a better one. I recommend 4096, but your 4*4096 or even 16*4096 would also work. Remember, though, that the buffer is allocated on the heap, so the allocation itself takes time, and you don't want to allocate a very large buffer, for example 128*4096 (512 KB).
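
As a concrete illustration of the advice above, here is a minimal sketch that reads a big file sequentially with a 4*4096-byte buffer (the file path is hypothetical):

    using System;
    using System.IO;

    class BigFileReader
    {
        const int BufferSize = 4 * 4096; // a multiple of the typical NTFS cluster size

        static void Main()
        {
            byte[] buffer = new byte[BufferSize];
            long total = 0;

            // SequentialScan hints to the OS that we read front to back,
            // which helps its read-ahead caching for big files.
            using (var stream = new FileStream(@"C:\data\big.bin", FileMode.Open,
                       FileAccess.Read, FileShare.Read, BufferSize,
                       FileOptions.SequentialScan))
            {
                int read;
                while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    total += read; // process the chunk here
                }
            }

            Console.WriteLine($"Read {total:N0} bytes");
        }
    }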


Comments

  • Amir Pournasserian (almost 2 years ago):

    I'm reading binary files and here is a sample:

    public static byte[] ReadFully(Stream input)
    {
        byte[] buffer = new byte[16 * 1024];
        using (var ms = new MemoryStream())
        {
            int read;
            // Read the stream chunk by chunk until EOF and accumulate
            // each chunk into the memory stream.
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                ms.Write(buffer, 0, read);
            }
            return ms.ToArray();
        }
    }
    

    Obviously the buffer size (16*1024) plays a big role in performance. I've read that it depends on the I/O technology (SATA, SSD, SCSI, etc.) and also on the allocation unit size of the partition the file lives on (which we can set when formatting the partition).

    But here is the question: is there any formula or best practice to define the buffer size? Right now, I'm choosing it by trial and error.

    Edit: I've tested the application on my server with different buffer sizes, and I get the best performance with 4095*256*16 (~16 MB)! A 4096-byte buffer is 4 seconds slower.

    I've read some older posts that were very helpful, but I still can't figure out the reason.

  • Alexei Levenkov (over 10 years ago):
    +1. Going above 80K will force the buffer onto the LOH, which brings its own issues (primarily for 32-bit processes)... 4-64K is likely the range to stick to for most cases. (The sketch after these comments illustrates the LOH threshold.)
  • juFo (over 6 years ago):
    @Alexei, see the release notes about Runtime – GC Performance Improvements: blogs.msdn.microsoft.com/dotnet/2017/10/17/…
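
Regarding Alexei's point above: arrays of roughly 85,000 bytes or more are allocated on the Large Object Heap, which the GC treats as generation 2 from the moment of allocation. A small sketch demonstrating the threshold:

    using System;

    class LohDemo
    {
        static void Main()
        {
            byte[] small = new byte[16 * 1024];  // ordinary gen-0 allocation
            byte[] large = new byte[128 * 1024]; // above ~85,000 bytes: lands on the LOH

            Console.WriteLine(GC.GetGeneration(small)); // prints 0
            Console.WriteLine(GC.GetGeneration(large)); // prints 2
        }
    }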