Memory-mapped files in Java

Solution 1

Did anyone actually check whether ByteBuffers created by memory mapping support invoking .array() in the first place, regardless of read-only/read-write?

From my poking around, as far as I can tell, the answer is NO. A ByteBuffer's ability to return its backing byte[] via ByteBuffer.array() is governed by the presence of ByteBuffer.hb (the internal, non-public byte[] field), which is always null when a MappedByteBuffer is created. So for a mapped buffer hasArray() returns false and array() throws UnsupportedOperationException.

Which kinda sucks for me, because I was hoping to do something similar to what the question author wanted to do.
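
A minimal check of that claim, assuming a scratch temp file created only for the demo (the class MappedArrayCheck and method arrayAccessible are hypothetical names). It maps the same file once READ_ONLY and once READ_WRITE and tries array() on each. Since a mapped buffer is direct and has no backing byte[], both attempts should fail the same way.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MappedArrayCheck {

    /** Maps the whole file in the given mode and reports whether array() hands back a byte[]. */
    static boolean arrayAccessible(Path file, FileChannel.MapMode mode) throws IOException {
        String rafMode = (mode == FileChannel.MapMode.READ_ONLY) ? "r" : "rw";
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), rafMode);
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer buf = ch.map(mode, 0, ch.size());
            System.out.println(mode + ": hasArray() = " + buf.hasArray());
            try {
                buf.array();   // a mapped (direct) buffer has no backing byte[]
                return true;
            } catch (UnsupportedOperationException e) {
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped", ".bin");
        tmp.toFile().deleteOnExit();   // deleting while still mapped can fail on some OSes
        Files.write(tmp, new byte[] {1, 2, 3, 4});
        System.out.println("READ_ONLY array() usable:  "
                + arrayAccessible(tmp, FileChannel.MapMode.READ_ONLY));
        System.out.println("READ_WRITE array() usable: "
                + arrayAccessible(tmp, FileChannel.MapMode.READ_WRITE));
    }
}
```

The read-only flag only matters for heap buffers: a read-only heap buffer throws ReadOnlyBufferException from array(), while a mapped buffer fails earlier because there is no array at all.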

Solution 2

It's always good not to reinvent the wheel. Apache provides a beautiful library for performing I/O operations. Take a look at http://commons.apache.org/io/description.html

Here's the scenario it serves. Suppose you have some data that you'd prefer to keep in memory, but you don't know ahead of time how much data there is going to be. If there's too much, you want to write it to disk instead of hogging memory, but you don't want to write to disk until you need to, because disk is slow and is a resource that needs tracking for cleanup.

So you create a temporary buffer and start writing to that. If and when you reach the threshold for what you want to keep in memory, you need to create a file, write what's in the buffer out to that file, and write all subsequent data to the file instead of the buffer.

That's what DeferredOutputStream does for you. It hides all the messing around at the point of switch-over. All you need to do is create the deferred stream in the first place, configure the threshold, and then just write away to your heart's content.
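
Commons IO ships this idea as org.apache.commons.io.output.DeferredFileOutputStream. To avoid depending on that library's exact API, here is a hypothetical JDK-only sketch of the same switch-over; the class SpillingOutputStream and its methods are illustrative names, not Commons IO's. Bytes go to a ByteArrayOutputStream until the next write would exceed the threshold, then the buffered bytes are copied to a temp file and every later write goes there.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch: buffers in memory up to `threshold` bytes, then spills to a temp file. */
public class SpillingOutputStream extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream current = memory;
    private File file;          // stays null until we spill
    private long written;

    public SpillingOutputStream(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (file == null && written + len > threshold) {
            spillToDisk();      // this write would cross the threshold
        }
        current.write(b, off, len);
        written += len;
    }

    /** The switch-over: create the file, copy what was buffered, redirect later writes. */
    private void spillToDisk() throws IOException {
        file = File.createTempFile("spill", ".tmp");
        file.deleteOnExit();
        OutputStream out = new FileOutputStream(file);
        memory.writeTo(out);
        memory = null;          // release the in-memory buffer
        current = out;
    }

    public boolean isInMemory() { return file == null; }

    /** The buffered bytes, or null once the data lives on disk. */
    public byte[] getData() { return memory == null ? null : memory.toByteArray(); }

    public File getFile() { return file; }

    @Override
    public void flush() throws IOException { current.flush(); }

    @Override
    public void close() throws IOException { current.close(); }
}
```

Callers only ever see an OutputStream, which is the point of the pattern: the decision of where the data ends up is deferred until the size is known.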

EDIT: I just did a quick Google search and found this link: http://lists.apple.com/archives/java-dev/2004/Apr/msg00086.html (Lightning fast file read/write). Very impressive.

Solution 3

Wrapping a byte[] won't slow things down: there are no large array copies or other small performance penalties. From the JavaDocs for java.nio.ByteBuffer.wrap():

Wraps a byte array into a buffer.

The new buffer will be backed by the given byte array; that is, modifications to the buffer will cause the array to be modified and vice versa. The new buffer's capacity and limit will be array.length, its position will be zero, and its mark will be undefined. Its backing array will be the given array, and its array offset will be zero.
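
A small demonstration of what that JavaDoc promises (WrapDemo is a hypothetical name): writes through the buffer show up in the array and vice versa, and array() returns the very same array instance, so no copy was made.

```java
import java.nio.ByteBuffer;

public class WrapDemo {
    public static void main(String[] args) {
        byte[] data = {10, 20, 30};
        ByteBuffer buf = ByteBuffer.wrap(data);    // no copy: buf is a view over data

        buf.put(0, (byte) 99);                     // write through the buffer...
        System.out.println(data[0]);               // ...is visible in the array: 99

        data[2] = 7;                               // write through the array...
        System.out.println(buf.get(2));            // ...is visible in the buffer: 7

        System.out.println(buf.array() == data);   // same array instance: true
        System.out.println(buf.capacity() + " " + buf.position()); // capacity 3, position 0
    }
}
```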

Solution 4

Using ByteBuffer.wrap() does not impose a high burden. It allocates a simple object and initializes a few integers. Writing your algorithm against ByteBuffer is thus your best bet if you need to work with read-only files.
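
A minimal sketch of "write it once against ByteBuffer", assuming that counting occurrences of a byte value stands in for the asker's real algorithm (CountBytes, count and countInFile are hypothetical names). The byte[] overload just wraps, and the same method runs over a memory-mapped file. It uses absolute get(i) between position and limit, so it works for heap, direct and mapped buffers and leaves the caller's position untouched.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class CountBytes {

    /** The single implementation: counts `target` between position and limit. */
    static int count(ByteBuffer buf, byte target) {
        int n = 0;
        for (int i = buf.position(); i < buf.limit(); i++) {
            if (buf.get(i) == target) {   // absolute get: does not move position
                n++;
            }
        }
        return n;
    }

    /** byte[] callers go through a zero-copy wrap instead of a second implementation. */
    static int count(byte[] data, byte target) {
        return count(ByteBuffer.wrap(data), target);
    }

    /** Memory-mapped file callers use the same method. */
    static int countInFile(Path file, byte target) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return count(ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()), target);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 0, 1, 1, 0};
        Path tmp = Files.createTempFile("count", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, data);
        System.out.println(count(data, (byte) 1));        // 3
        System.out.println(countInFile(tmp, (byte) 1));   // 3
    }
}
```

Whether get(i) ends up as fast as data[i] on a given JVM is something to measure rather than assume.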

Author: Ami

Updated on November 18, 2020

Comments

  • Ami
    Ami over 3 years

    I've been trying to write some very fast Java code that has to do a lot of I/O. I'm using a memory-mapped file that returns a ByteBuffer:

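    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileChannel.MapMode;

    public static ByteBuffer byteBufferForFile(String fname) {
        // try-with-resources closes the stream and channel; the mapping
        // stays valid after the channel that created it is closed
        try (FileInputStream in = new FileInputStream(fname);
             FileChannel vectorChannel = in.getChannel()) {
            return vectorChannel.map(MapMode.READ_ONLY, 0, vectorChannel.size());
        } catch (IOException e) {
            // also covers FileNotFoundException, a subclass of IOException
            e.printStackTrace();
            return null;
        }
    }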
    

    The problem I'm having is that the ByteBuffer .array() method (which should return the backing byte[] array) doesn't work for read-only files. I want to write my code so that it works both with buffers constructed in memory and with buffers read from disk. But I don't want to wrap all of my byte arrays with ByteBuffer.wrap() because I'm worried that this will slow things down. So I've been writing two versions of everything: one that takes a byte[], the other that takes a ByteBuffer.

    Should I just wrap everything? Or should I double-write everything?

  • Ami
    Ami about 15 years
    Thanks. I'm just concerned about having to read every byte with .get(i) instead of [i], since .get(i) involves a method call whereas [i] compiles to a single array-load bytecode instruction.
  • Stu Thompson
    Stu Thompson about 15 years
    That seems like an awfully "fine grained" performance concern, and smells like premature optimization to me. The JVM is good about stuff like this. Benchmark it to prove it to yourself one way or the other.
  • BoraKurucu
    BoraKurucu about 15 years
    Correct me if I am wrong: you are looking for a fast way of doing I/O operations, correct?
  • Ami
    Ami about 15 years
    Actually, I'm looking for fast ways of doing just the I (input), but I'm also looking for ways of processing the buffers with the minimum number of buffer copies.
  • Stu Thompson
    Stu Thompson about 15 years
    Do you have benchmarks showing a performance penalty because of the method call above? I cannot imagine it being more than minor, and would think there would be other areas more ripe for tuning. Taking a scientific approach to these things trumps all. Hmmm...might investigate over the weekend just for fun! :)
  • Stu Thompson
    Stu Thompson about 15 years
    Also, I have to ask, what version of Java are you using? Each major release has seen considerable improvements in JVM performance. If your task is that performance sensitive, you should be on Java 6.
  • Ami
    Ami almost 15 years
    I agree. It sucks. I can't believe that a mapped ByteBuffer doesn't support array(). On the other hand, we did some performance tests and found that using .get() on a memory-mapped file is sometimes faster than programmed I/O, and sometimes slower. It's very weird. But there is more variance with programmed I/O than with memory-mapped files.
  • Vishy
    Vishy over 12 years
    A byte[] has to be on the heap. A memory-mapped block of memory has to be outside the heap. It would be nice if the distinction were transparent, but I prefer to use the getLong/putLong methods of a ByteBuffer anyway (these are much faster when using native byte order).
  • dma_k
    dma_k about 12 years
    @GauravSaini: Are you referring to DeferredOutputStream from Apache Commons IO? I can't find such a class in the Javadoc for v2.3 or v2.2.
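
Vishy's getLong/putLong suggestion can be sketched as follows (NativeOrderLongs and sumLongs are hypothetical names). It reads 8 bytes at a time with absolute getLong(i) on a duplicate set to ByteOrder.nativeOrder(), so the caller's position and byte order are left alone. The assumption here is that the data was written in native order; for a file format with a fixed endianness you would set that order instead, or the values come out byte-swapped.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class NativeOrderLongs {

    /** Sums every complete 8-byte long between position and limit, read in native byte order. */
    static long sumLongs(ByteBuffer buf) {
        // duplicate() shares the content but has its own position/limit/order
        ByteBuffer view = buf.duplicate().order(ByteOrder.nativeOrder());
        long sum = 0;
        for (int i = view.position(); i + Long.BYTES <= view.limit(); i += Long.BYTES) {
            sum += view.getLong(i);   // absolute read, 8 bytes at a time
        }
        return sum;                   // a trailing partial long (< 8 bytes) is ignored
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(3 * Long.BYTES).order(ByteOrder.nativeOrder());
        buf.putLong(0, 1L).putLong(8, 2L).putLong(16, 39L);
        System.out.println(sumLongs(buf)); // 42
    }
}
```

The same method works unchanged on a MappedByteBuffer, since getLong(int) is defined on ByteBuffer itself.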