Java ByteBuffer performance issue

Solution 1

I believe you are just doing micro-optimization, which might just not matter (www.codinghorror.com).

Below is a version with a larger buffer and redundant seek / setPosition calls removed.

  • When I enable "native byte ordering" (which is actually unsafe if the machine uses a different 'endian' convention):
mmap: 1.358
bytebuffer: 0.922
regular i/o: 1.387
  • When I comment out the order statement and use the default big-endian ordering:
mmap: 1.336
bytebuffer: 1.62
regular i/o: 1.467
  • Your original code:
mmap: 3.262
bytebuffer: 106.676
regular i/o: 90.903

Here's the code:

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.MappedByteBuffer;

class Testbb2 {
    /** Buffer a whole lot of long values at the same time. */
    static final int BUFFSIZE = 0x800 * 8; // 16384 bytes (2048 longs)
    static final int DATASIZE = 0x8000 * BUFFSIZE;

    static public long byteArrayToLong(byte [] in, int offset) {
        return ((((((((long)(in[offset + 0] & 0xff) << 8) | (long)(in[offset + 1] & 0xff)) << 8 | (long)(in[offset + 2] & 0xff)) << 8 | (long)(in[offset + 3] & 0xff)) << 8 | (long)(in[offset + 4] & 0xff)) << 8 | (long)(in[offset + 5] & 0xff)) << 8 | (long)(in[offset + 6] & 0xff)) << 8 | (long)(in[offset + 7] & 0xff);
    }

    public static void main(String [] args) throws IOException {
        long start;
        RandomAccessFile fileHandle;
        FileChannel fileChannel;

        // Sanity check - this way the convert-to-long loops don't need extra bookkeeping like BUFFSIZE / 8.
        if ((DATASIZE % BUFFSIZE) > 0 || (DATASIZE % 8) > 0) {
            throw new IllegalStateException("DATASIZE should be a multiple of 8 and BUFFSIZE!");
        }

        int pos;
        int nDone;

        // create file (check for an existing one before the "rw" open creates it)
        File testFile = new File("file.dat");
        boolean fileReady = testFile.exists() && testFile.length() >= DATASIZE;
        fileHandle = new RandomAccessFile(testFile, "rw");

        if (fileReady) {
            System.out.println("File exists");
        } else {
            System.out.println("Preparing file");
            fileHandle.setLength(0);
            byte [] buffer = new byte[BUFFSIZE];
            pos = 0;
            while (pos < DATASIZE) {
                fileHandle.write(buffer);
                pos += buffer.length;
            }
            System.out.println("File prepared");
        }
        fileChannel = fileHandle.getChannel();

        // mmap()
        MappedByteBuffer mbb = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, DATASIZE);
        byte [] buffer1 = new byte[BUFFSIZE];
        mbb.position(0);
        start = System.currentTimeMillis();
        pos = 0;
        while (pos < DATASIZE) {
            mbb.get(buffer1, 0, BUFFSIZE);
            // This assumes BUFFSIZE is a multiple of 8.
            for (int i = 0; i < BUFFSIZE; i += 8) {
                long dummy = byteArrayToLong(buffer1, i);
            }
            pos += BUFFSIZE;
        }
        System.out.println("mmap: " + (System.currentTimeMillis() - start) / 1000.0);

        // bytebuffer
        ByteBuffer buffer2 = ByteBuffer.allocateDirect(BUFFSIZE);
        // Uncomment the next line for the "native byte ordering" timings above:
        // buffer2.order(ByteOrder.nativeOrder());
        fileChannel.position(0);
        start = System.currentTimeMillis();
        pos = 0;
        nDone = 0;
        while (pos < DATASIZE) {
            buffer2.rewind();
            fileChannel.read(buffer2);
            buffer2.rewind();   // need to rewind it to be able to use it
            // This assumes BUFFSIZE is a multiple of 8.
            for (int i = 0; i < BUFFSIZE; i += 8) {
                long dummy = buffer2.getLong();
            }
            pos += BUFFSIZE;
        }
        System.out.println("bytebuffer: " + (System.currentTimeMillis() - start) / 1000.0);

        // regular i/o
        fileHandle.seek(0);
        byte [] buffer3 = new byte[BUFFSIZE];
        start = System.currentTimeMillis();
        pos = 0;
        while (pos < DATASIZE && nDone != -1) {
            int filled = 0;
            // read() may return fewer bytes than requested, so keep filling the buffer.
            while (filled < BUFFSIZE) {
                nDone = fileHandle.read(buffer3, filled, BUFFSIZE - filled);
                if (nDone == -1) {
                    break;
                }
                filled += nDone;
            }
            // This assumes BUFFSIZE is a multiple of 8.
            for (int i = 0; i < BUFFSIZE; i += 8) {
                long dummy = byteArrayToLong(buffer3, i);
            }
            pos += filled;
        }
        System.out.println("regular i/o: " + (System.currentTimeMillis() - start) / 1000.0);
    }
}

Solution 2

Reading into the direct byte buffer is faster, but getting the data out of it into the JVM is slower. A direct byte buffer is intended for cases where you're just copying the data without actually looking at it in Java code. Then the data doesn't have to cross the native-to-JVM boundary at all, so it's quicker than using e.g. a byte[] array or a normal ByteBuffer, where the data would have to cross that boundary twice in the copy process.
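
For contrast, here is a minimal sketch of the kind of pure copy a direct buffer is designed for (my own illustration; the class and method names are made up): the bytes stream from one channel to another without ever being pulled onto the Java heap, which is exactly the step that makes the getLong-style extraction in the benchmark pay the extra cost.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

class DirectCopy {
    /** Streams bytes from one channel to another through a direct buffer;
        the data never has to cross into the Java heap. */
    static void copy(ReadableByteChannel in, WritableByteChannel out) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
        while (in.read(buf) != -1) {
            buf.flip();                    // switch from filling to draining
            while (buf.hasRemaining()) {
                out.write(buf);
            }
            buf.clear();                   // ready for the next read
        }
    }
}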

Solution 3

When a loop iterates more than about 10,000 times, it can trigger the whole method to be compiled to native code. However, by that point your later loops have not run yet and cannot be optimised to the same degree. To avoid this issue, place each loop in a different method and run the test again.
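
As a rough sketch of what that restructuring looks like (an illustrative skeleton, not the benchmark itself; the loop bodies here are just stand-ins for the three tests):

class SplitLoops {
    // Each timed loop lives in its own method, so HotSpot can compile and
    // optimise each one independently instead of the later loops being stuck
    // in a method that was already compiled around the first loop.
    static long sumLoop(long[] data) {
        long sum = 0;
        for (long v : data) sum += v;
        return sum;
    }

    static long xorLoop(long[] data) {
        long acc = 0;
        for (long v : data) acc ^= v;
        return acc;
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        long start = System.currentTimeMillis();
        sumLoop(data);
        System.out.println("loop 1: " + (System.currentTimeMillis() - start) / 1000.0);
        start = System.currentTimeMillis();
        xorLoop(data);
        System.out.println("loop 2: " + (System.currentTimeMillis() - start) / 1000.0);
    }
}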

Additionally, you may want to set the byte order of the ByteBuffer with order(ByteOrder.nativeOrder()) to avoid all the byte swapping when you do a getLong, and read more than 24 bytes at a time (reading very small portions generates many more system calls). Try reading 32*1024 bytes at a time.
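
A sketch of that combination, assuming the same file.dat as in the benchmark (the 32 KiB chunk size is the value suggested above; the checksum is only there so the reads cannot be optimised away):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;

class ChunkedDirectRead {
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("file.dat", "r");
             FileChannel channel = raf.getChannel()) {
            // One 32 KiB read per system call instead of one call per 24 bytes.
            ByteBuffer buf = ByteBuffer.allocateDirect(32 * 1024);
            buf.order(ByteOrder.nativeOrder());   // no byte swapping in getLong
            long sum = 0;
            while (channel.read(buf) != -1) {
                buf.flip();
                while (buf.remaining() >= 8) {
                    sum += buf.getLong();
                }
                buf.compact();                    // keep any partial long for the next read
            }
            System.out.println("checksum: " + sum);
        }
    }
}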

I would also try getLong on the MappedByteBuffer with native byte order. This is likely to be the fastest.
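
A minimal sketch of that variant against the same file.dat, mapping the whole file at once (which, as noted in the comments, will not scale to terabyte-sized files, so treat it purely as the benchmark case):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MappedNativeRead {
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("file.dat", "r");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer mbb =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            mbb.order(ByteOrder.nativeOrder());   // getLong needs no byte swapping
            long sum = 0;
            long start = System.currentTimeMillis();
            while (mbb.remaining() >= 8) {
                sum += mbb.getLong();             // read straight from the mapping, no copy
            }
            System.out.println("mmap+getLong: "
                    + (System.currentTimeMillis() - start) / 1000.0 + " (checksum " + sum + ")");
        }
    }
}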


Comments

  • Folkert van Heusden
    Folkert van Heusden over 1 year

    While processing multiple gigabyte files I noticed something odd: it seems that reading from a file using a filechannel into a re-used ByteBuffer object allocated with allocateDirect is much slower than reading from a MappedByteBuffer, in fact it is even slower than reading into byte-arrays using regular read calls!

    I was expecting it to be (almost) as fast as reading from mappedbytebuffers as my ByteBuffer is allocated with allocateDirect, hence the read should end-up directly in my bytebuffer without any intermediate copies.

    My question now is: what is it that I'm doing wrong? Or is bytebuffer+filechannel really slower than regular io/mmap?

    In the example code below I also added some code that converts what is read into long values, as that is what my real code constantly does. I would expect the ByteBuffer getLong() method to be much faster than my own byte shuffler.

    Test-results: mmap: 3.828 bytebuffer: 55.097 regular i/o: 38.175

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileChannel.MapMode;
    import java.nio.MappedByteBuffer;
    
    class testbb {
        static final int size = 536870904, n = size / 24;
    
        static public long byteArrayToLong(byte [] in, int offset) {
            return ((((((((long)(in[offset + 0] & 0xff) << 8) | (long)(in[offset + 1] & 0xff)) << 8 | (long)(in[offset + 2] & 0xff)) << 8 | (long)(in[offset + 3] & 0xff)) << 8 | (long)(in[offset + 4] & 0xff)) << 8 | (long)(in[offset + 5] & 0xff)) << 8 | (long)(in[offset + 6] & 0xff)) << 8 | (long)(in[offset + 7] & 0xff);
        }
    
        public static void main(String [] args) throws IOException {
            long start;
            RandomAccessFile fileHandle;
            FileChannel fileChannel;
    
            // create file
            fileHandle = new RandomAccessFile("file.dat", "rw");
            byte [] buffer = new byte[24];
            for(int index=0; index<n; index++)
                fileHandle.write(buffer);
            fileChannel = fileHandle.getChannel();
    
            // mmap()
            MappedByteBuffer mbb = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            byte [] buffer1 = new byte[24];
            start = System.currentTimeMillis();
            for(int index=0; index<n; index++) {
                    mbb.position(index * 24);
                    mbb.get(buffer1, 0, 24);
                    long dummy1 = byteArrayToLong(buffer1, 0);
                    long dummy2 = byteArrayToLong(buffer1, 8);
                    long dummy3 = byteArrayToLong(buffer1, 16);
            }
            System.out.println("mmap: " + (System.currentTimeMillis() - start) / 1000.0);
    
            // bytebuffer
            ByteBuffer buffer2 = ByteBuffer.allocateDirect(24);
            start = System.currentTimeMillis();
            for(int index=0; index<n; index++) {
                buffer2.rewind();
                fileChannel.read(buffer2, index * 24);
                buffer2.rewind();   // need to rewind it to be able to use it
                long dummy1 = buffer2.getLong();
                long dummy2 = buffer2.getLong();
                long dummy3 = buffer2.getLong();
            }
            System.out.println("bytebuffer: " + (System.currentTimeMillis() - start) / 1000.0);
    
            // regular i/o
            byte [] buffer3 = new byte[24];
            start = System.currentTimeMillis();
            for(int index=0; index<n; index++) {
                    fileHandle.seek(index * 24);
                    fileHandle.read(buffer3);
                    long dummy1 = byteArrayToLong(buffer1, 0);
                    long dummy2 = byteArrayToLong(buffer1, 8);
                    long dummy3 = byteArrayToLong(buffer1, 16);
            }
            System.out.println("regular i/o: " + (System.currentTimeMillis() - start) / 1000.0);
        }
    }
    

    As loading large sections and then processing them is not an option (I'll be reading data all over the place), I think I should stick to a MappedByteBuffer. Thank you all for your suggestions.

  • Folkert van Heusden
    Folkert van Heusden about 12 years
    Moving the code into separate methods did not make any difference. Also using getLong on the MappedByteBuffer indeed made it even faster. But I still wonder why the second test ("read a bytebuffer from a filechannel") is so slow.
  • Folkert van Heusden
    Folkert van Heusden about 12 years
    From what I read (in a book about NIO from O'Reilly), a read into a properly allocated bytebuffer should also be direct, without any copies. Unfortunately mapping the input file to memory won't work in the real app, as it can be terabytes in size. The numbers were at the bottom of my post: mmap: 3.828 seconds bytebuffer: 55.097 seconds regular i/o: 38.175 seconds.
  • kdgregory
    kdgregory about 12 years
    @Folkert - either the author of that book was wrong, or you are misinterpreting what he/she said. Disk controllers deal with large block sizes, and the OS needs a place to buffer that data and carve out the piece that you need.
  • kdgregory
    kdgregory about 12 years
    But the real problem is that each of your reads -- in either NIO or IO -- is a separate system call, while the mapped file is a direct memory access (with a possible page fault). If your real application has a large proportion of localized reads, you will probably benefit from a buffer cache (which can be memory-mapped or on-heap). If you're jumping all over a terabyte-scale file, then the disk IO will become the limiting factor and even memory-mapping won't help.
  • Vishy
    Vishy about 12 years
    You are performing one system call for every 24 bytes. In the first example, you are performing only one or two system calls total.
  • Folkert van Heusden
    Folkert van Heusden about 12 years
    That would indeed be faster. I did not expect it to be that much faster, so thanks!
  • WoooHaaaa
    WoooHaaaa almost 10 years
    So a direct buffer and mapped memory do the same thing (avoid a memory copy), except that a direct buffer may cause a lot of system calls, right?
  • kdgregory
    kdgregory almost 10 years
    @MrROY - I can't understand what you're asking. But no, a direct ByteBuffer shouldn't cause a lot of system calls. RandomAccessFile might.
  • Jay Askren
    Jay Askren about 9 years
    @kdgregory I don't think MappedByteBuffer is necessarily fastest. Using a RandomAccess file or regular ByteBuffer can be faster. See this blog for an example : mechanical-sympathy.blogspot.com/2011/12/…
  • Randall Whitman
    Randall Whitman almost 9 years
    If I'm not mistaken, the regular i/o section intends to use buffer3 in both loops, rather than reading longs out of the unchanging buffer1.