How do you decide what byte[] size to use for InputStream.read()?

Solution 1

Most people use powers of 2 for the size. Once the buffer is at least 512 bytes, the exact size doesn't make much difference (less than 20%).

For network reads, the optimal size is typically 2 KB to 8 KB (the underlying packet size is typically up to about 1.5 KB). For disk access, the fastest size is typically 8 KB to 64 KB. If you use 8 KB or 16 KB, you won't have a problem.

Note that for network downloads, you will usually find that a single read doesn't fill the whole buffer. Wasting a few KB doesn't matter much for 99% of use cases.
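The partially filled buffer described above can be imitated without a real network. The sketch below is only an illustration: `ShortReadDemo` and `ChunkedStream` are hypothetical names, and the 1500-byte cap is an assumed stand-in for packet-sized arrivals, not a measurement.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical demo: shows how much of the buffer a single read() fills. */
final class ShortReadDemo {

    /** Caps every read() at maxChunk bytes to imitate data arriving in packet-sized pieces. */
    static final class ChunkedStream extends FilterInputStream {
        private final int maxChunk;

        ChunkedStream(InputStream in, int maxChunk) {
            super(in);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }

    /** Reads to end of stream and returns the most bytes any single read() delivered. */
    static int largestRead(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        int largest = 0;
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            largest = Math.max(largest, n);
        }
        return largest;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[100_000];
        InputStream simulated = new ChunkedStream(new ByteArrayInputStream(payload), 1500);
        System.out.println(largestRead(simulated, 8192)); // prints 1500
    }
}
```

With an 8 KB buffer, no read in this simulation ever delivers more than 1500 bytes, so the rest of the buffer goes unused on every call.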

Solution 2

In that situation, I always use a reasonable power of 2, somewhere in the range of 2K to 16K. In general, different InputStreams will have different optimal values, but there is no easy way to determine the value.

To determine the optimal value, you'd need to understand more about the exact type of InputStream you are dealing with, as well as things like the specifications of the hardware that is servicing the InputStream.

Worrying about this is probably a case of premature optimization.
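If you do want numbers for your own setup, a rough timing harness is one way to get them. This is a minimal sketch, not a rigorous benchmark: `BufferSizeTiming` is a hypothetical name, the 16 MB temp file and the list of sizes are arbitrary choices, and the operating system's file cache will likely hide most of the real disk cost.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical harness: times a full read of one file with several buffer sizes. */
final class BufferSizeTiming {

    /** Reads the stream to the end with the given buffer size; returns the bytes read. */
    static long drain(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    /** Times one full read of the file, in nanoseconds. */
    static long timeRead(Path file, int bufferSize) throws IOException {
        long start = System.nanoTime();
        try (InputStream in = Files.newInputStream(file)) {
            drain(in, bufferSize);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("buffer-size-demo", ".bin");
        try {
            Files.write(file, new byte[16 * 1024 * 1024]); // 16 MB of zeros
            for (int size : new int[] {512, 2048, 8192, 16384, 65536}) {
                timeRead(file, size); // warm-up pass, result discarded
                long nanos = timeRead(file, size);
                System.out.printf("%6d bytes: %.2f ms%n", size, nanos / 1e6);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Results vary from run to run and from machine to machine, so treat them as a rough guide only.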

Solution 3

It mostly depends on how much memory you have and how much data you expect to read. You don't want to block too often, so consider BenCole's answer; on the other hand, you don't want to process a small chunk of data if your processing is slower than the actual reading.

I personally try to use a library and offload the task of choosing a buffer size to the library authors. After that, I promise myself never to read the library code, because it makes me mad.
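The answer doesn't name a library. As one possible illustration, assuming Java 9 or later, the JDK's own `InputStream.transferTo` copies a stream without the caller choosing a buffer size at all. `LibraryCopyDemo` and `copyAll` are hypothetical names used only for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: lets the JDK decide how to buffer the copy. */
final class LibraryCopyDemo {

    /** Copies everything from the stream into a byte array. */
    static byte[] copyAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        in.transferTo(out); // Java 9+; the internal buffer size is an implementation detail
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, stream".getBytes(StandardCharsets.UTF_8);
        byte[] copy = copyAll(new ByteArrayInputStream(data));
        System.out.println(new String(copy, StandardCharsets.UTF_8)); // prints "hello, stream"
    }
}
```

This holds the whole stream in memory, so it only suits data that comfortably fits in the heap.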

Author by

cottonBallPaws

Updated on June 13, 2022

Comments

  • cottonBallPaws
    cottonBallPaws almost 2 years

    When reading from InputStreams, how do you decide what size to use for the byte[]?

    int nRead;
    byte[] data = new byte[16384]; // <-- this number is the one I'm wondering about
    
    while ((nRead = is.read(data, 0, data.length)) != -1) {
      // ...do something with the first nRead bytes of data...
    }
    

    When do you use a small one vs a large one? What are the differences? Does the number need to be a multiple of 1024? Does it make a difference if it is an InputStream from the network vs the disk?

    Thanks much, I can't seem to find a clear answer elsewhere.

  • j__m
    j__m over 4 years
    The only sane use of the available() method is to determine whether the call might block. You should only care whether it's zero or nonzero. You should also code with the understanding that some implementations may return zero every single time, in which case your code needs to notice this and then disregard it going forward. available() is not guaranteed to return the total size of the data, the amount that will be filled by the next read(), or really anything in particular.
  • j__m
    j__m over 4 years
    Current javadoc link does not agree with the quote given in the answer: docs.oracle.com/javase/7/docs/api/java/io/…
  • Matthieu
    Matthieu almost 4 years
    Great info about the difference between network/disk! I'd guess the protocol used can make a lot of difference (CIFS, NFS, ...). Is it something you noticed (e.g. with Java Chronicles ;)) wrt network? I was about to ask such a question...
  • Vishy
    Vishy almost 4 years
    @Matthieu networks are usually configured to have an MTU of 1500 bytes, so if you read the stream fast enough you will rarely see a 2 KB buffer fill. Disk subsystems, however, tend to have native block sizes around 64 KB, so larger blocks are consistently filled for files much larger than this.
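j__m's point about available() can be shown with a stream that inherits the default implementation from InputStream, which always returns 0. This is a minimal sketch; `AvailableDemo`, `zeroAvailable`, and `drain` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical demo: available() is a hint about blocking, not a size. */
final class AvailableDemo {

    /** Wraps data in a stream that keeps InputStream's default available(), which returns 0. */
    static InputStream zeroAvailable(byte[] data) {
        InputStream source = new ByteArrayInputStream(data);
        return new InputStream() {
            @Override
            public int read() throws IOException {
                return source.read();
            }
        };
    }

    /** Reads to the end; only read() returning -1 signals end of stream. */
    static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = zeroAvailable(new byte[5000]);
        System.out.println(in.available()); // prints 0, although data is present
        System.out.println(drain(in));      // prints 5000
    }
}
```

A loop that sized its buffer from available(), or stopped when it returned 0, would read nothing from this stream.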