What would be an ideal buffer size?

c++ c io

Solution 1

Source: How do you determine the ideal buffer size when using FileInputStream?

The optimum buffer size is related to a number of things: the file system block size, the CPU cache size, and cache latency.

Most file systems are configured to use block sizes of 4096 or 8192. In theory, if you configure your buffer size so that you are reading a few bytes more than the disk block, the operations with the file system can be extremely inefficient (e.g. if you configured your buffer to read 4100 bytes at a time, each read would require two block reads by the file system). If the blocks are already in cache, then you wind up paying the price of RAM -> L3/L2 cache latency. If you are unlucky and the blocks are not in cache yet, then you pay the price of the disk -> RAM latency as well.

This is why you see most buffers sized as a power of 2, and generally larger than (or equal to) the disk block size. This means that one of your stream reads could result in multiple disk block reads - but those reads will always use a full block - no wasted reads.

Ensuring this also typically results in other performance-friendly parameters affecting both reading and subsequent processing: data bus width alignment, DMA alignment, memory cache line alignment, and a whole number of virtual memory pages.
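
On POSIX systems, one way to apply this advice at runtime is to ask the file system for its preferred I/O block size via fstat()'s st_blksize field and use a power-of-two multiple of it. A minimal sketch, assuming POSIX and C++11 (the 64 KiB floor and the fallback size are arbitrary choices, not part of the answer above):

```cpp
#include <cstdio>
#include <vector>
#include <sys/stat.h>

// Pick a read buffer that is a power-of-two multiple of the file system's
// preferred I/O block size (st_blksize), with an arbitrary 64 KiB floor.
static size_t choose_buffer_size(FILE* f) {
    struct stat st {};
    if (fstat(fileno(f), &st) == 0 && st.st_blksize > 0) {
        size_t size = static_cast<size_t>(st.st_blksize);
        while (size < 64 * 1024) size *= 2;   // stays a multiple of the block size
        return size;
    }
    return 64 * 1024;                         // fallback if fstat fails
}

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    FILE* f = std::fopen(argv[1], "rb");
    if (!f) return 1;

    std::vector<char> buffer(choose_buffer_size(f));
    size_t total = 0, n = 0;
    while ((n = std::fread(buffer.data(), 1, buffer.size(), f)) > 0) {
        total += n;                           // process buffer[0 .. n) here
    }
    std::fclose(f);
    std::printf("read %zu bytes using a %zu-byte buffer\n", total, buffer.size());
    return 0;
}
```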

Solution 2

  1. At least in my case, the assumption is that the underlying system is using a buffer whose size is a power of two, too, so it's best to try and match. I think nowadays buffers should be made a bit bigger than what "most" programmers tend to make them. I'd go with 32 KB rather than 4, for instance.
  2. It's very hard to know in advance, unfortunately. It depends on whether your application is I/O or CPU bound, for instance.
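
Since it is hard to know in advance, one practical approach is to measure at runtime: read the same file with several power-of-two buffer sizes and time each pass. A rough sketch of that idea (note that after the first pass the file is likely to sit in the OS page cache, so use a file much larger than RAM, or flush the cache between runs, if cold-read performance is what matters):

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Time one full pass over the file with the given buffer size.
static double time_read(const char* path, size_t buf_size) {
    FILE* f = std::fopen(path, "rb");
    if (!f) return -1.0;
    std::vector<char> buf(buf_size);
    auto start = std::chrono::steady_clock::now();
    while (std::fread(buf.data(), 1, buf.size(), f) == buf.size()) {
        // keep reading; the final short read ends the loop
    }
    auto stop = std::chrono::steady_clock::now();
    std::fclose(f);
    return std::chrono::duration<double>(stop - start).count();
}

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    // Try candidates from 4 KiB to 1 MiB and report the elapsed time for each.
    for (size_t size = 4 * 1024; size <= 1024 * 1024; size *= 2) {
        std::printf("%8zu bytes: %.3f s\n", size, time_read(argv[1], size));
    }
    return 0;
}
```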

Solution 3

  1. I think that mostly it's just choosing a "round" number. If computers worked in decimal we'd probably choose 1000 or 10000 instead of 1024 or 8192. There is no very good reason.

One possible reason is that disk sectors are usually 512 bytes in size, so reading a multiple of that is more efficient, assuming that all the hardware layers and caching allow the low-level code to actually make use of this fact. It probably can't, though, unless you are writing a device driver or doing an unbuffered read.
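
For the unbuffered-read case mentioned above, Linux provides O_DIRECT, which bypasses the page cache but requires the buffer address, transfer size, and file offset to be aligned to the device's logical block size. A minimal sketch under those assumptions (Linux-specific; the 4096-byte alignment is a guess that covers the common 512- and 4096-byte sector sizes):

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                  // O_DIRECT is a GNU/Linux extension
#endif
#include <cstdio>
#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char** argv) {
    if (argc < 2) return 1;

    const size_t alignment = 4096;       // assumed to cover 512- and 4096-byte sectors
    const size_t buf_size  = 64 * 1024;  // a whole number of sectors/blocks

    void* buf = nullptr;
    if (posix_memalign(&buf, alignment, buf_size) != 0) return 1;

    int fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) { std::free(buf); return 1; }

    ssize_t n = 0;
    size_t total = 0;
    while ((n = read(fd, buf, buf_size)) > 0) {
        total += static_cast<size_t>(n);  // process this block here
    }
    close(fd);
    std::free(buf);
    std::printf("read %zu bytes bypassing the page cache\n", total);
    return 0;
}
```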


Comments

  • Baruch (almost 2 years ago)

    Possible Duplicate:
    How do you determine the ideal buffer size when using FileInputStream?

    When reading raw data from a file (or any input stream) using either C++'s istream family's read() or C's fread(), a buffer has to be supplied, along with a number saying how much data to read. Most programs I have seen seem to arbitrarily choose a power of 2 between 512 and 4096.

    1. Is there a reason it has to/should be a power of 2, or is this just programmers' natural inclination to powers of 2?
    2. What would be the "ideal" number? By "ideal" I mean that it would be the fastest. I assume it would have to be a multiple of the underlying device's buffer size? Or maybe of the underlying stream object's buffer? How would I determine what the size of those buffers is, anyway? And once I do, would using a multiple of it give any speed increase over just using the exact size?

    EDIT
    Most answers seem to be that it can't be determined at compile time. I am fine with finding it at runtime.

  • Baruch (almost 12 years ago)
    I don't need it in advance. I am fine with finding it at runtime
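
Regarding the question in the comments about the stream object's own buffer: C++ iostreams let you supply the underlying streambuf's buffer yourself via pubsetbuf(), although the effect is implementation-defined and, at least with libstdc++, the call only takes effect if made before the file is opened. A small sketch along those lines (the 64 KiB sizes are arbitrary examples, not recommendations from the answers above):

```cpp
#include <fstream>
#include <iostream>
#include <vector>

int main(int argc, char** argv) {
    if (argc < 2) return 1;

    // Give the ifstream a 64 KiB internal buffer. pubsetbuf() is
    // implementation-defined; with libstdc++ it must be called before open().
    std::vector<char> stream_buf(64 * 1024);
    std::ifstream in;
    in.rdbuf()->pubsetbuf(stream_buf.data(),
                          static_cast<std::streamsize>(stream_buf.size()));
    in.open(argv[1], std::ios::binary);
    if (!in) return 1;

    // Read through a separate application-level buffer as usual.
    std::vector<char> buf(64 * 1024);
    std::size_t total = 0;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize got = in.gcount();
        if (got <= 0) break;
        total += static_cast<std::size_t>(got);  // process buf[0 .. got) here
    }
    std::cout << "read " << total << " bytes\n";
    return 0;
}
```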