What is the fastest way to read a large number of small files into memory?

10,590

Solution 1

A memory mapped file will be fastest... something like this:

    final File             file;
    final FileChannel      channel;
    final MappedByteBuffer buffer;

    file    = new File(fileName);
    fin     = new FileInputStream(file);
    channel = fin.getChannel();
    buffer  = channel.map(MapMode.READ_ONLY, 0, file.length());

and then proceed to read from the byte buffer.

This will be significantly faster than FileInputStream or FileReader.

EDIT:

After a bit of investigation with this it turns out that, depending on your OS, you might be better off using a new BufferedInputStream(new FileInputStream(file)) instead. However reading the whole thing all at once into a char[] the size of the file sounds like the worst way.

So BufferedInputStream should give roughly consistent performance on all platforms, while the memory mapped file may be slow or fast depending on the underlying OS. As with everything that is performance critical you should test your code and see what works best.

EDIT:

Ok here are some tests (the first one is done twice to get the files into the disk cache).

I ran it on the rt.jar class files, extracted to the hard drive, this is under Windows 7 beta x64. That is 16784 files with a total of 94,706,637 bytes.

First the results...

(remember the first is repeated to get the disk cache setup)

  • ArrayTest

    • time = 83016
    • bytes = 118641472
  • ArrayTest

    • time = 46570
    • bytes = 118641472
  • DataInputByteAtATime

    • time = 74735
    • bytes = 118641472
  • DataInputReadFully

    • time = 8953
    • bytes = 118641472
  • MemoryMapped

    • time = 2320
    • bytes = 118641472

Here is the code...

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.util.HashSet;
import java.util.Set;

public class Main
{
    public static void main(final String[] argv)
    {
        ArrayTest.main(argv);
        ArrayTest.main(argv);
        DataInputByteAtATime.main(argv);
        DataInputReadFully.main(argv);
        MemoryMapped.main(argv);
    }
}

abstract class Test
{
    public final void run(final File root)
    {
        final Set<File> files;
        final long      size;
        final long      start;
        final long      end;
        final long      total;

        files = new HashSet<File>();
        getFiles(root, files);

        start = System.currentTimeMillis();

        size = readFiles(files);

        end = System.currentTimeMillis();
        total = end - start;

        System.out.println(getClass().getName());
        System.out.println("time  = " + total);
        System.out.println("bytes = " + size);
    }

    private void getFiles(final File      dir,
                          final Set<File> files)
    {
        final File[] childeren;

        childeren = dir.listFiles();

        for(final File child : childeren)
        {
            if(child.isFile())
            {
                files.add(child);
            }
            else
            {
                getFiles(child, files);
            }
        }
    }

    private long readFiles(final Set<File> files)
    {
        long size;

        size = 0;

        for(final File file : files)
        {
            size += readFile(file);
        }

        return (size);
    }

    protected abstract long readFile(File file);
}

class ArrayTest
    extends Test
{
    public static void main(final String[] argv)
    {
        final Test test;

        test = new ArrayTest();
        test.run(new File(argv[0]));
    }

    protected long readFile(final File file)
    {
        InputStream stream;

        stream = null;

        try
        {
            final byte[] data;
            int          soFar;
            int          sum;

            stream = new BufferedInputStream(new FileInputStream(file));
            data   = new byte[(int)file.length()];
            soFar  = 0;

            do
            {
                soFar += stream.read(data, soFar, data.length - soFar);
            }
            while(soFar != data.length);

            sum = 0;

            for(final byte b : data)
            {
                sum += b;
            }

            return (sum);
        }
        catch(final IOException ex)
        {
            ex.printStackTrace();
        }
        finally
        {
            if(stream != null)
            {
                try
                {
                    stream.close();
                }
                catch(final IOException ex)
                {
                    ex.printStackTrace();
                }
            }
        }

        return (0);
    }
}

class DataInputByteAtATime
    extends Test
{
    public static void main(final String[] argv)
    {
        final Test test;

        test = new DataInputByteAtATime();
        test.run(new File(argv[0]));
    }

    protected long readFile(final File file)
    {
        DataInputStream stream;

        stream = null;

        try
        {
            final int fileSize;
            int       sum;

            stream   = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
            fileSize = (int)file.length();
            sum      = 0;

            for(int i = 0; i < fileSize; i++)
            {
                sum += stream.readByte();
            }

            return (sum);
        }
        catch(final IOException ex)
        {
            ex.printStackTrace();
        }
        finally
        {
            if(stream != null)
            {
                try
                {
                    stream.close();
                }
                catch(final IOException ex)
                {
                    ex.printStackTrace();
                }
            }
        }

        return (0);
    }
}

class DataInputReadFully
    extends Test
{
    public static void main(final String[] argv)
    {
        final Test test;

        test = new DataInputReadFully();
        test.run(new File(argv[0]));
    }

    protected long readFile(final File file)
    {
        DataInputStream stream;

        stream = null;

        try
        {
            final byte[] data;
            int          sum;

            stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
            data   = new byte[(int)file.length()];
            stream.readFully(data);

            sum = 0;

            for(final byte b : data)
            {
                sum += b;
            }

            return (sum);
        }
        catch(final IOException ex)
        {
            ex.printStackTrace();
        }
        finally
        {
            if(stream != null)
            {
                try
                {
                    stream.close();
                }
                catch(final IOException ex)
                {
                    ex.printStackTrace();
                }
            }
        }

        return (0);
    }
}

class DataInputReadInChunks
    extends Test
{
    public static void main(final String[] argv)
    {
        final Test test;

        test = new DataInputReadInChunks();
        test.run(new File(argv[0]));
    }

    protected long readFile(final File file)
    {
        DataInputStream stream;

        stream = null;

        try
        {
            final byte[] data;
            int          size;
            final int    fileSize;
            int          sum;

            stream   = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
            fileSize = (int)file.length();
            data     = new byte[512];
            size     = 0;
            sum      = 0;

            do
            {
                size += stream.read(data);

                sum = 0;

                for(int i = 0; i < size; i++)
                {
                    sum += data[i];
                }
            }
            while(size != fileSize);

            return (sum);
        }
        catch(final IOException ex)
        {
            ex.printStackTrace();
        }
        finally
        {
            if(stream != null)
            {
                try
                {
                    stream.close();
                }
                catch(final IOException ex)
                {
                    ex.printStackTrace();
                }
            }
        }

        return (0);
    }
}
class MemoryMapped
    extends Test
{
    public static void main(final String[] argv)
    {
        final Test test;

        test = new MemoryMapped();
        test.run(new File(argv[0]));
    }

    protected long readFile(final File file)
    {
        FileInputStream stream;

        stream = null;

        try
        {
            final FileChannel      channel;
            final MappedByteBuffer buffer;
            final int              fileSize;
            int                    sum;

            stream   = new FileInputStream(file);
            channel  = stream.getChannel();
            buffer   = channel.map(MapMode.READ_ONLY, 0, file.length());
            fileSize = (int)file.length();
            sum      = 0;

            for(int i = 0; i < fileSize; i++)
            {
                sum += buffer.get();
            }

            return (sum);
        }
        catch(final IOException ex)
        {
            ex.printStackTrace();
        }
        finally
        {
            if(stream != null)
            {
                try
                {
                    stream.close();
                }
                catch(final IOException ex)
                {
                    ex.printStackTrace();
                }
            }
        }

        return (0);
    }
}

Solution 2

The most efficient way is:

  • Determine the length of the file (File.length())
  • Create a char buffer with the same size (or slightly larger)
  • Determine the encoding of the file
  • Use new InputStreamReader (new FileInputStream(file), encoding) to read
  • Read the while file into the buffer with a single call to read(). Note that read() might return early (not having read the whole file). In that case, call it again with an offset to read the next batch.
  • Create the string: new String(buffer)

If you need to search&replace once at startup, use String.replaceAll().

If you need to do it repeatedly, you may consider using StringBuilder. It has no replaceAll() but you can use it to manipulate the character array in place (-> no allocation of memory).

That said:

  1. Make your code as short and simple as possible.
  2. Measure the performance
  3. It it's too slow, fix it.

There is no reason to waste a lot of time into making this code run fast if it takes just 0.1s to execute.

If you still have a performance problem, consider to put all the text files into a JAR, add it into the classpath and use Class.getResourceAsStream() to read the files. Loading things from the Java classpath is highly optimized.

Solution 3

It depends a lot on the internal structure of your text files and what you intend to do with them.

Are the files key-value dictionaries (i.e. "properties" files)? XML? JSON? You have standard structures for those.

If they have a formal structure you may also use JavaCC to build an object representation of the files.

Otherwise, if they are just blobs of data, well, read the files and put them in a String.

Edit: about search&replace- juste use String's replaceAll function.

Solution 4

After searching across google for for existing tests on IO speed in Java, I must say TofuBear's test case completely opened my eyes. You have to run his test on your own platform to see what is fastest for you.

After running his test, and adding a few of my own (Credit to TofuBear for posting his original code), it appears you may get even more speed by using your own custom buffer vs. using the BufferedInputStream.

To my dismay the NIO ByteBuffer did not perform well.

NOTE: The static byte[] buffer shaved off a few ms, but the static ByteBuffers actualy increased the time to process! Is there anything wrong with the code??

I added a few tests:

  1. ArrayTest_CustomBuffering (Read data directly into my own buffer)

  2. ArrayTest_CustomBuffering_StaticBuffer (Read Data into a static buffer that is created only once in the beginning)

  3. FileChannelArrayByteBuffer (use NIO ByteBuffer and wrapping your own byte[] array)

  4. FileChannelAllocateByteBuffer (use NIO ByteBuffer with .allocate)

  5. FileChannelAllocateByteBuffer_StaticBuffer (same as 4 but with a static buffer)

  6. FileChannelAllocateDirectByteBuffer (use NIO ByteBuffer with .allocateDirect)

  7. FileChannelAllocateDirectByteBuffer_StaticBuffer (same as 6 but with a static buffer)

Here are my results:, using Windows Vista and jdk1.6.0_13 on the extracted rt.jar: ArrayTest
time = 2075
bytes = 2120336424
ArrayTest
time = 2044
bytes = 2120336424
ArrayTest_CustomBuffering
time = 1903
bytes = 2120336424
ArrayTest_CustomBuffering_StaticBuffer
time = 1872
bytes = 2120336424
DataInputByteAtATime
time = 2668
bytes = 2120336424
DataInputReadFully
time = 2028
bytes = 2120336424
MemoryMapped
time = 2901
bytes = 2120336424
FileChannelArrayByteBuffer
time = 2371
bytes = 2120336424
FileChannelAllocateByteBuffer
time = 2356
bytes = 2120336424
FileChannelAllocateByteBuffer_StaticBuffer
time = 2668
bytes = 2120336424
FileChannelAllocateDirectByteBuffer
time = 2512
bytes = 2120336424
FileChannelAllocateDirectByteBuffer_StaticBuffer
time = 2590
bytes = 2120336424

My hacked version of TofuBear's code:

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.util.HashSet;
import java.util.Set;
public class Main { 
    public static void main(final String[] argv)     { 
        ArrayTest.mainx(argv);
        ArrayTest.mainx(argv);
        ArrayTest_CustomBuffering.mainx(argv);
        ArrayTest_CustomBuffering_StaticBuffer.mainx(argv);
        DataInputByteAtATime.mainx(argv);
        DataInputReadFully.mainx(argv);
        MemoryMapped.mainx(argv);
        FileChannelArrayByteBuffer.mainx(argv);
        FileChannelAllocateByteBuffer.mainx(argv);
        FileChannelAllocateByteBuffer_StaticBuffer.mainx(argv);
        FileChannelAllocateDirectByteBuffer.mainx(argv);
        FileChannelAllocateDirectByteBuffer_StaticBuffer.mainx(argv);
     } 
 } 
abstract class Test { 
    static final int BUFF_SIZE = 20971520;
    static final byte[] StaticData = new byte[BUFF_SIZE];
    static final ByteBuffer StaticBuffer =ByteBuffer.allocate(BUFF_SIZE);
    static final ByteBuffer StaticDirectBuffer = ByteBuffer.allocateDirect(BUFF_SIZE);
    public final void run(final File root)     { 
        final Set<File> files;
        final long      size;
        final long      start;
        final long      end;
        final long      total;
        files = new HashSet<File>();
        getFiles(root, files);
        start = System.currentTimeMillis();
        size = readFiles(files);
        end = System.currentTimeMillis();
        total = end - start;
        System.out.println(getClass().getName());
        System.out.println("time  = " + total);
        System.out.println("bytes = " + size);
     } 
    private void getFiles(final File dir,final Set<File> files)     { 
        final File[] childeren;
        childeren = dir.listFiles();
        for(final File child : childeren)         { 
            if(child.isFile())             { 
                files.add(child);
             } 
            else             { 
                getFiles(child, files);
             } 
         } 
     } 
    private long readFiles(final Set<File> files)     { 
        long size;
        size = 0;
        for(final File file : files)         { 
            size += readFile(file);
         } 
        return (size);
     } 
    protected abstract long readFile(File file);
 } 
class ArrayTest    extends Test { 
    public static void mainx(final String[] argv)     { 
        final Test test;
        test = new ArrayTest();
        test.run(new File(argv[0]));
     } 
    protected long readFile(final File file)     { 
        InputStream stream;
        stream = null;
        try         { 
            final byte[] data;
            int          soFar;
            int          sum;
            stream = new BufferedInputStream(new FileInputStream(file));
            data   = new byte[(int)file.length()];
            soFar  = 0;
            do             { 
                soFar += stream.read(data, soFar, data.length - soFar);
             } 
            while(soFar != data.length);
            sum = 0;
            for(final byte b : data)             { 
                sum += b;
             } 
            return (sum);
         } 
        catch(final IOException ex)         { 
            ex.printStackTrace();
         } 
        finally         { 
            if(stream != null)             { 
                try                 { 
                    stream.close();
                 } 
                catch(final IOException ex)                 { 
                    ex.printStackTrace();
                 } 
             } 
         } 
        return (0);
     } 
 } 

 class ArrayTest_CustomBuffering    extends Test { 
    public static void mainx(final String[] argv)     { 
        final Test test;
        test = new ArrayTest_CustomBuffering();
        test.run(new File(argv[0]));
     } 
    protected long readFile(final File file)     { 
        InputStream stream;
        stream = null;
        try         { 
            final byte[] data;
            int          soFar;
            int          sum;
            stream = new FileInputStream(file);
            data   = new byte[(int)file.length()];
            soFar  = 0;
            do             { 
                soFar += stream.read(data, soFar, data.length - soFar);
             } 
            while(soFar != data.length);
            sum = 0;
            for(final byte b : data)             { 
                sum += b;
             } 
            return (sum);
         } 
        catch(final IOException ex)         { 
            ex.printStackTrace();
         } 
        finally         { 
            if(stream != null)             { 
                try                 { 
                    stream.close();
                 } 
                catch(final IOException ex)                 { 
                    ex.printStackTrace();
                 } 
             } 
         } 
        return (0);
     } 
 }

 class ArrayTest_CustomBuffering_StaticBuffer    extends Test { 



    public static void mainx(final String[] argv)     { 
        final Test test;
        test = new ArrayTest_CustomBuffering_StaticBuffer();
        test.run(new File(argv[0]));
     } 
    protected long readFile(final File file)     { 
        InputStream stream;
        stream = null;
        try         { 
            int          soFar;
            int          sum;
            final int    fileSize;
            stream = new FileInputStream(file);
            fileSize = (int)file.length();
            soFar  = 0;
            do             { 
                soFar += stream.read(StaticData, soFar, fileSize - soFar);
             } 
            while(soFar != fileSize);
            sum = 0;
            for(int i=0;i<fileSize;i++)             { 
                sum += StaticData[i];
             } 
            return (sum);
         } 
        catch(final IOException ex)         { 
            ex.printStackTrace();
         } 
        finally         { 
            if(stream != null)             { 
                try                 { 
                    stream.close();
                 } 
                catch(final IOException ex)                 { 
                    ex.printStackTrace();
                 } 
             } 
         } 
        return (0);
     } 
 }

class DataInputByteAtATime    extends Test { 
    public static void mainx(final String[] argv)     { 
        final Test test;
        test = new DataInputByteAtATime();
        test.run(new File(argv[0]));
     } 
    protected long readFile(final File file)     { 
        DataInputStream stream;
        stream = null;
        try         { 
            final int fileSize;
            int       sum;
            stream   = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
            fileSize = (int)file.length();
            sum      = 0;
            for(int i = 0; i < fileSize; i++)             { 
                sum += stream.readByte();
             } 
            return (sum);
         } 
        catch(final IOException ex)         { 
            ex.printStackTrace();
         } 
        finally         { 
            if(stream != null)             { 
                try                 { 
                    stream.close();
                 } 
                catch(final IOException ex)                 { 
                    ex.printStackTrace();
                 } 
             } 
         } 
        return (0);
     } 
 } 
class DataInputReadFully    extends Test { 
    public static void mainx(final String[] argv)     { 
        final Test test;
        test = new DataInputReadFully();
        test.run(new File(argv[0]));
     } 
    protected long readFile(final File file)     { 
        DataInputStream stream;
        stream = null;
        try         { 
            final byte[] data;
            int          sum;
            stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
            data   = new byte[(int)file.length()];
            stream.readFully(data);
            sum = 0;
            for(final byte b : data)             { 
                sum += b;
             } 
            return (sum);
         } 
        catch(final IOException ex)         { 
            ex.printStackTrace();
         } 
        finally         { 
            if(stream != null)             { 
                try                 { 
                    stream.close();
                 } 
                catch(final IOException ex)                 { 
                    ex.printStackTrace();
                 } 
             } 
         } 
        return (0);
     } 
 } 
class DataInputReadInChunks    extends Test { 
    public static void mainx(final String[] argv)     { 
        final Test test;
        test = new DataInputReadInChunks();
        test.run(new File(argv[0]));
     } 
    protected long readFile(final File file)     { 
        DataInputStream stream;
        stream = null;
        try         { 
            final byte[] data;
            int          size;
            final int    fileSize;
            int          sum;
            stream   = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
            fileSize = (int)file.length();
            data     = new byte[512];
            size     = 0;
            sum      = 0;
            do             { 
                size += stream.read(data);
                sum = 0;
                for(int i = 0;
 i < size;
 i++)                 { 
                    sum += data[i];
                 } 
             } 
            while(size != fileSize);
            return (sum);
         } 
        catch(final IOException ex)         { 
            ex.printStackTrace();
         } 
        finally         { 
            if(stream != null)             { 
                try                 { 
                    stream.close();
                 } 
                catch(final IOException ex)                 { 
                    ex.printStackTrace();
                 } 
             } 
         } 
        return (0);
     } 
 } 
class MemoryMapped    extends Test { 
    public static void mainx(final String[] argv)     { 
        final Test test;
        test = new MemoryMapped();
        test.run(new File(argv[0]));
     } 
    protected long readFile(final File file)     { 
        FileInputStream stream;
        stream = null;
        try         { 
            final FileChannel      channel;
            final MappedByteBuffer buffer;
            final int              fileSize;
            int                    sum;
            stream   = new FileInputStream(file);
            channel  = stream.getChannel();
            buffer   = channel.map(MapMode.READ_ONLY, 0, file.length());
            fileSize = (int)file.length();
            sum      = 0;

            for(int i = 0; i < fileSize; i++)             { 
                sum += buffer.get();
             } 
            return (sum);
         } 
        catch(final IOException ex)         { 
            ex.printStackTrace();
         } 
        finally         { 
            if(stream != null)             { 
                try                 { 
                    stream.close();
                 } 
                catch(final IOException ex)                 { 
                    ex.printStackTrace();
                 } 
             } 
         } 
        return (0);
     } 
 } 

 class FileChannelArrayByteBuffer    extends Test { 
    public static void mainx(final String[] argv)     { 
        final Test test;
        test = new FileChannelArrayByteBuffer();
        test.run(new File(argv[0]));
     } 
    protected long readFile(final File file)     { 
        FileInputStream stream;
        stream = null;
        try         { 
            final byte[] data;
            final FileChannel      channel;
            final ByteBuffer       buffer;
            int                    nRead=0;
            final int              fileSize;
            int                    sum;
            stream = new  FileInputStream(file);
            data   = new byte[(int)file.length()];
            buffer = ByteBuffer.wrap(data);

            channel  = stream.getChannel();
            fileSize = (int)file.length();
            nRead += channel.read(buffer);

            buffer.rewind();
            sum      = 0;
            for(int i = 0; i < fileSize; i++)             { 
                sum += buffer.get();
             } 
            return (sum);
         } 
        catch(final IOException ex)         { 
            ex.printStackTrace();
         } 
        finally         { 
            if(stream != null)             { 
                try                 { 
                    stream.close();
                 } 
                catch(final IOException ex)                 { 
                    ex.printStackTrace();
                 } 
             } 
         } 
        return (0);
     } 
 } 

 class FileChannelAllocateByteBuffer    extends Test { 
    public static void mainx(final String[] argv)     { 
        final Test test;
        test = new FileChannelAllocateByteBuffer();
        test.run(new File(argv[0]));
     } 
    protected long readFile(final File file)     { 
        FileInputStream stream;
        stream = null;
        try         { 
            final byte[] data;
            final FileChannel      channel;
            final ByteBuffer       buffer;
            int                    nRead=0;
            final int              fileSize;
            int                    sum;
            stream = new  FileInputStream(file);
            //data   = new byte[(int)file.length()];
            buffer = ByteBuffer.allocate((int)file.length());

            channel  = stream.getChannel();
            fileSize = (int)file.length();
            nRead += channel.read(buffer);

            buffer.rewind();
            sum      = 0;
            for(int i = 0; i < fileSize; i++)             { 
                sum += buffer.get();
             } 
            return (sum);
         } 
        catch(final IOException ex)         { 
            ex.printStackTrace();
         } 
        finally         { 
            if(stream != null)             { 
                try                 { 
                    stream.close();
                 } 
                catch(final IOException ex)                 { 
                    ex.printStackTrace();
                 } 
             } 
         } 
        return (0);
     } 
 } 

 class FileChannelAllocateDirectByteBuffer    extends Test { 
    public static void mainx(final String[] argv)     { 
        final Test test;
        test = new FileChannelAllocateDirectByteBuffer();
        test.run(new File(argv[0]));
     } 
    protected long readFile(final File file)     { 
        FileInputStream stream;
        stream = null;
        try         { 
            final byte[] data;
            final FileChannel      channel;
            final ByteBuffer       buffer;
            int                    nRead=0;
            final int              fileSize;
            int                    sum;
            stream = new  FileInputStream(file);
            //data   = new byte[(int)file.length()];
            buffer = ByteBuffer.allocateDirect((int)file.length());

            channel  = stream.getChannel();
            fileSize = (int)file.length();
            nRead += channel.read(buffer);

            buffer.rewind();
            sum      = 0;
            for(int i = 0; i < fileSize; i++)             { 
                sum += buffer.get();
             } 
            return (sum);
         } 
        catch(final IOException ex)         { 
            ex.printStackTrace();
         } 
        finally         { 
            if(stream != null)             { 
                try                 { 
                    stream.close();
                 } 
                catch(final IOException ex)                 { 
                    ex.printStackTrace();
                 } 
             } 
         } 
        return (0);
     } 
 }

 class FileChannelAllocateByteBuffer_StaticBuffer    extends Test { 
    public static void mainx(final String[] argv)     { 
        final Test test;
        test = new FileChannelAllocateByteBuffer_StaticBuffer();
        test.run(new File(argv[0]));
     } 
    protected long readFile(final File file)     { 
        FileInputStream stream;
        stream = null;
        try         { 
            final byte[] data;
            final FileChannel      channel;
            int                    nRead=0;
            final int              fileSize;
            int                    sum;
            stream = new  FileInputStream(file);
            //data   = new byte[(int)file.length()];
            StaticBuffer.clear();
            StaticBuffer.limit((int)file.length());
            channel  = stream.getChannel();
            fileSize = (int)file.length();
            nRead += channel.read(StaticBuffer);

            StaticBuffer.rewind();
            sum      = 0;
            for(int i = 0; i < fileSize; i++)             { 
                sum += StaticBuffer.get();
             } 
            return (sum);
         } 
        catch(final IOException ex)         { 
            ex.printStackTrace();
         } 
        finally         { 
            if(stream != null)             { 
                try                 { 
                    stream.close();
                 } 
                catch(final IOException ex)                 { 
                    ex.printStackTrace();
                 } 
             } 
         } 
        return (0);
     } 
 }

 class FileChannelAllocateDirectByteBuffer_StaticBuffer    extends Test { 
    public static void mainx(final String[] argv)     { 
        final Test test;
        test = new FileChannelAllocateDirectByteBuffer_StaticBuffer();
        test.run(new File(argv[0]));
     } 
    protected long readFile(final File file)     { 
        FileInputStream stream;
        stream = null;
        try         { 
            final byte[] data;
            final FileChannel      channel;
            int                    nRead=0;
            final int              fileSize;
            int                    sum;
            stream = new  FileInputStream(file);
            //data   = new byte[(int)file.length()];
            StaticDirectBuffer.clear();
            StaticDirectBuffer.limit((int)file.length());
            channel  = stream.getChannel();
            fileSize = (int)file.length();
            nRead += channel.read(StaticDirectBuffer);

            StaticDirectBuffer.rewind();
            sum      = 0;
            for(int i = 0; i < fileSize; i++)             { 
                sum += StaticDirectBuffer.get();
             } 
            return (sum);
         } 
        catch(final IOException ex)         { 
            ex.printStackTrace();
         } 
        finally         { 
            if(stream != null)             { 
                try                 { 
                    stream.close();
                 } 
                catch(final IOException ex)                 { 
                    ex.printStackTrace();
                 } 
             } 
         } 
        return (0);
     } 
 }
Share:
10,590
user63898
Author by

user63898

developer

Updated on June 17, 2022

Comments

  • user63898
    user63898 almost 2 years

    I need to read ~50 files on every server start and place each text file's representation into memory. Each text file will have its own string (which is the best type to use for the string holder?).

    What is the fastest way to read the files into memory, and what is the best data structure/type to hold the text in so that I can manipulate it in memory (search and replace mainly)?

    Thanks

  • Hosam Aly
    Hosam Aly about 15 years
    Interesting! Do you have a benchmark or comparison between both approaches?
  • TofuBeer
    TofuBeer about 15 years
    I had some code (long gone) that parsed every class file in rt.jar (6000+). Using FileInputStream (wrapped with a BufferedIputStream) it took 30 seconds, with a memorry mapped file it too 4. Other than the way the bytes were read there was no difference in the code.
  • TofuBeer
    TofuBeer about 15 years
    I did extract all of the files from the JAR to the file system before doing it.
  • user63898
    user63898 about 15 years
    if i load all the text files with Class.getResourceAsStream() how can i iterate throw the files inside the jar ?
  • eaolson
    eaolson about 15 years
    This will read bytes and not necessarily character data, though, correct?
  • TofuBeer
    TofuBeer about 15 years
    You can make use of CharBuffer via the ByteBuffer.asCharBuffer. Also the speed will be very OS dependant - nio integrates tightly to the underlying OS (updating my answer)
  • TofuBeer
    TofuBeer about 15 years
    java.util.ZipFile will let you work with files on a JAR (a JAR is just a zip file).
  • Aaron Digulla
    Aaron Digulla about 15 years
    Either use ZipFile or iterate over a list of filenames (instead of trying to iterate over the resource).
  • Aaron Digulla
    Aaron Digulla about 15 years
    @Tofu: Care to explain why using a big char buffer sounds like the worst way? If allocates memory only twice (once for char array and once to copy it into a String). Can't get more cheap than that.
  • Aaron Digulla
    Aaron Digulla about 15 years
    @Tofu: Also, I use a single command to read the whole file, so only one IO request. Your approach uses a lot of objects, MMU table changes, etc. I figure no matter the OS, that should be slower than a single file.read() call.
  • TofuBeer
    TofuBeer about 15 years
    Reading the whole file into a byte[] was the slowest in my tests (by a large amount). Also you need to do repetitive reads to get the whole array (read returns an int of how many were read, it may be less than the length of the array).
  • Hosam Aly
    Hosam Aly about 15 years
    @TofuBeer is right; you should loop checking how many bytes have actually been read.
  • kohlerm
    kohlerm about 15 years
    as you said. Memory mapping might take long on some OS's. So for small files it's probaly not a good idea
  • kohlerm
    kohlerm about 15 years
    using String.replaceAll() is definitely not a good idea. It will not replace Strings inp lace, but allocate new Strings.
  • Aaron Digulla
    Aaron Digulla about 15 years
    kohlerm: Since there is no way to modify a String in place in Java, it doesn't really matter how you do it. As I said: If the String is really large, use a StringBuilder instead.
  • Jason S
    Jason S about 15 years
    the times are suspicious to me. I'm not sure I'd trust comparisons of times that are "only" a few seconds in length; the JVMs need time to start up. I'd use larger files or more iterations. As tofubeer pointed out, you also need to include a loop of at least one additional iteration prior to starting the "real" timing, to prime the disk cache and also to let the JVM's JIT do its fancy work.
  • jkaufmann
    jkaufmann about 15 years
    I did keep the TofuBeer's "priming" copy as the first iteration in the test. The files were already cached from a previous run. I am re-running the test on an XP box and coming up with similar results. I am really perplexed by the speed decrease when reusing a static buffer vs. creating a new buffer for each file. Perhaps this is an optimization in the underlying JVM? Not sure.. I'd rather believe its an issue with the code written. I think I may troll through the underlying source in the JAVA API to see whats going on with the NIO channels vs the FileInputStreams. Thanks for the inout
  • rzwitserloot
    rzwitserloot almost 6 years
    Your test is just wrong, unfortunately. Specifically, your second ArrayTest takes 46570 msec, but your ReadFully tests only 8953; less than 20%. And yet, the code is the same. Look at the source of DataInputStream's readFully: It's the same as your arraytest code.
  • TofuBeer
    TofuBeer almost 6 years
    The DataInputStream's readFully has new DataInputStream the ArrayTest does not.
  • Erk
    Erk over 4 years
    One thing to look out for with the BufferedInputStream is the buffer size. Default is 8192, but I managed to get things much faster by setting it down considerably (I read very small files).