What is the fastest way to read a large number of small files into memory?
Solution 1
A memory mapped file will be fastest... something like this:
final File file;
final FileInputStream fin;
final FileChannel channel;
final MappedByteBuffer buffer;
file = new File(fileName);
fin = new FileInputStream(file);
channel = fin.getChannel();
// Map the entire file read-only; the OS pages the contents in on demand.
buffer = channel.map(MapMode.READ_ONLY, 0, file.length());
and then proceed to read from the byte buffer.
This will be significantly faster than FileInputStream or FileReader.
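For text files, the mapped bytes still have to be decoded into characters. The following is a minimal sketch of that step, not part of the original answer: the class name MappedTextReader and the choice to pass the charset explicitly are assumptions, and files larger than Integer.MAX_VALUE bytes cannot be mapped in a single call this way.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: maps a whole file read-only and decodes it to a String.
final class MappedTextReader {
    static String read(Path path, Charset charset) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size == 0) {
                return ""; // nothing to map for an empty file
            }
            // map() rejects sizes above Integer.MAX_VALUE with IllegalArgumentException.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            // Decode the mapped bytes with the caller-supplied charset.
            return charset.decode(buffer).toString();
        }
    }
}
```

A call would look like MappedTextReader.read(Paths.get("some.txt"), StandardCharsets.UTF_8), where the file name is only a placeholder.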
EDIT:
After a bit of investigation, it turns out that, depending on your OS, you might be better off using new BufferedInputStream(new FileInputStream(file)) instead. However, reading the whole thing all at once into a char[] the size of the file sounds like the worst way.
So BufferedInputStream should give roughly consistent performance on all platforms, while the memory mapped file may be slow or fast depending on the underlying OS. As with everything that is performance critical, you should test your code and see what works best.
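As a minimal sketch of the BufferedInputStream approach (the class name StreamReadAll and the 8192-byte chunk size are assumptions for illustration), the loop below keeps reading until read() reports end of stream with -1, so it does not depend on File.length() being accurate:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: reads a whole file through a BufferedInputStream.
final class StreamReadAll {
    static byte[] readAll(File file) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            // read() may return fewer bytes than requested; -1 signals end of stream.
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```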
EDIT:
Ok here are some tests (the first one is done twice to get the files into the disk cache).
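I ran it on the rt.jar class files, extracted to the hard drive, under Windows 7 beta x64. That is 16784 files with a total of 94,706,637 bytes.
First the results (times are in milliseconds; remember the first test is repeated to get the disk cache set up). Note that the "bytes" figure is the sum of the byte values returned by readFile(), which acts as a checksum, not a count of bytes read.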
- ArrayTest: time = 83016, bytes = 118641472
- ArrayTest: time = 46570, bytes = 118641472
- DataInputByteAtATime: time = 74735, bytes = 118641472
- DataInputReadFully: time = 8953, bytes = 118641472
- MemoryMapped: time = 2320, bytes = 118641472
Here is the code...
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.util.HashSet;
import java.util.Set;
public class Main
{
public static void main(final String[] argv)
{
ArrayTest.main(argv);
ArrayTest.main(argv);
DataInputByteAtATime.main(argv);
DataInputReadFully.main(argv);
MemoryMapped.main(argv);
}
}
abstract class Test
{
public final void run(final File root)
{
final Set<File> files;
final long size;
final long start;
final long end;
final long total;
files = new HashSet<File>();
getFiles(root, files);
start = System.currentTimeMillis();
size = readFiles(files);
end = System.currentTimeMillis();
total = end - start;
System.out.println(getClass().getName());
System.out.println("time = " + total);
System.out.println("bytes = " + size);
}
private void getFiles(final File dir,
final Set<File> files)
{
final File[] childeren;
childeren = dir.listFiles();
for(final File child : childeren)
{
if(child.isFile())
{
files.add(child);
}
else
{
getFiles(child, files);
}
}
}
private long readFiles(final Set<File> files)
{
long size;
size = 0;
for(final File file : files)
{
size += readFile(file);
}
return (size);
}
protected abstract long readFile(File file);
}
class ArrayTest
extends Test
{
public static void main(final String[] argv)
{
final Test test;
test = new ArrayTest();
test.run(new File(argv[0]));
}
protected long readFile(final File file)
{
InputStream stream;
stream = null;
try
{
final byte[] data;
int soFar;
int sum;
stream = new BufferedInputStream(new FileInputStream(file));
data = new byte[(int)file.length()];
soFar = 0;
do
{
soFar += stream.read(data, soFar, data.length - soFar);
}
while(soFar != data.length);
sum = 0;
for(final byte b : data)
{
sum += b;
}
return (sum);
}
catch(final IOException ex)
{
ex.printStackTrace();
}
finally
{
if(stream != null)
{
try
{
stream.close();
}
catch(final IOException ex)
{
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputByteAtATime
extends Test
{
public static void main(final String[] argv)
{
final Test test;
test = new DataInputByteAtATime();
test.run(new File(argv[0]));
}
protected long readFile(final File file)
{
DataInputStream stream;
stream = null;
try
{
final int fileSize;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
fileSize = (int)file.length();
sum = 0;
for(int i = 0; i < fileSize; i++)
{
sum += stream.readByte();
}
return (sum);
}
catch(final IOException ex)
{
ex.printStackTrace();
}
finally
{
if(stream != null)
{
try
{
stream.close();
}
catch(final IOException ex)
{
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputReadFully
extends Test
{
public static void main(final String[] argv)
{
final Test test;
test = new DataInputReadFully();
test.run(new File(argv[0]));
}
protected long readFile(final File file)
{
DataInputStream stream;
stream = null;
try
{
final byte[] data;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
data = new byte[(int)file.length()];
stream.readFully(data);
sum = 0;
for(final byte b : data)
{
sum += b;
}
return (sum);
}
catch(final IOException ex)
{
ex.printStackTrace();
}
finally
{
if(stream != null)
{
try
{
stream.close();
}
catch(final IOException ex)
{
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputReadInChunks
extends Test
{
public static void main(final String[] argv)
{
final Test test;
test = new DataInputReadInChunks();
test.run(new File(argv[0]));
}
protected long readFile(final File file)
{
DataInputStream stream;
stream = null;
try
{
final byte[] data;
int size;
final int fileSize;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
fileSize = (int)file.length();
data = new byte[512];
size = 0;
sum = 0;
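do
{
final int read;
read = stream.read(data);
if(read == -1)
{
break;
}
// Sum only the bytes returned by this read; 'size' tracks the running total.
for(int i = 0; i < read; i++)
{
sum += data[i];
}
size += read;
}
while(size != fileSize);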
return (sum);
}
catch(final IOException ex)
{
ex.printStackTrace();
}
finally
{
if(stream != null)
{
try
{
stream.close();
}
catch(final IOException ex)
{
ex.printStackTrace();
}
}
}
return (0);
}
}
class MemoryMapped
extends Test
{
public static void main(final String[] argv)
{
final Test test;
test = new MemoryMapped();
test.run(new File(argv[0]));
}
protected long readFile(final File file)
{
FileInputStream stream;
stream = null;
try
{
final FileChannel channel;
final MappedByteBuffer buffer;
final int fileSize;
int sum;
stream = new FileInputStream(file);
channel = stream.getChannel();
buffer = channel.map(MapMode.READ_ONLY, 0, file.length());
fileSize = (int)file.length();
sum = 0;
for(int i = 0; i < fileSize; i++)
{
sum += buffer.get();
}
return (sum);
}
catch(final IOException ex)
{
ex.printStackTrace();
}
finally
{
if(stream != null)
{
try
{
stream.close();
}
catch(final IOException ex)
{
ex.printStackTrace();
}
}
}
return (0);
}
}
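Each test above is timed once with System.currentTimeMillis(). A hedged sketch of a harness that runs a few untimed warm-up passes first (to prime the disk cache and give the JIT a chance to compile the hot paths) and then records several measured passes with System.nanoTime() could look like the following. The class name TimedRuns and its parameters are assumptions, not part of the original benchmark.

```java
import java.util.concurrent.Callable;

// Hypothetical harness: untimed warm-up passes followed by timed passes.
final class TimedRuns {
    // Returns the elapsed nanoseconds of each measured run.
    static long[] measure(Callable<?> task, int warmups, int runs) throws Exception {
        for (int i = 0; i < warmups; i++) {
            task.call(); // result and timing deliberately ignored
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.call();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }
}
```

Reporting every measured run, rather than a single number, makes it easier to see how much the timings vary between passes.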
Solution 2
The most efficient way is:
- Determine the length of the file (File.length())
- Create a char buffer with the same size (or slightly larger)
- Determine the encoding of the file
- Use new InputStreamReader(new FileInputStream(file), encoding) to read
- Read the whole file into the buffer with a single call to read(). Note that read() might return early (not having read the whole file). In that case, call it again with an offset to read the next batch.
- Create the string: new String(buffer)
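The steps above can be sketched roughly as follows. The class name CharBufferRead is an assumption, and so is the sizing rule: for common encodings such as UTF-8, UTF-16 and ISO-8859-1 the file length in bytes is an upper bound on the number of chars, and the sketch grows the buffer if that ever turns out not to hold.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper: reads a text file into one char[] and builds a String from it.
final class CharBufferRead {
    static String read(File file, Charset encoding) throws IOException {
        long length = file.length();
        if (length > Integer.MAX_VALUE - 8) {
            throw new IOException("file too large: " + file);
        }
        // One char more than the byte length, so the buffer is never empty.
        char[] buffer = new char[(int) length + 1];
        int filled = 0;
        try (Reader reader = new InputStreamReader(new FileInputStream(file), encoding)) {
            int n;
            // read() may return early, so keep calling it with an offset until -1.
            while ((n = reader.read(buffer, filled, buffer.length - filled)) != -1) {
                filled += n;
                if (filled == buffer.length) {
                    // Assumed sizing rule did not hold; grow instead of spinning on len == 0.
                    buffer = Arrays.copyOf(buffer, buffer.length * 2);
                }
            }
        }
        // Only the filled part of the buffer belongs to the file.
        return new String(buffer, 0, filled);
    }
}
```

Note that new String(buffer, 0, filled) is used rather than new String(buffer), because the buffer may be larger than the decoded text.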
If you need to search&replace once at startup, use String.replaceAll().
If you need to do it repeatedly, you may consider using StringBuilder. It has no replaceAll() but you can use it to manipulate the character array in place (-> no allocation of memory).
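Since StringBuilder has no replaceAll(), a small loop over indexOf() and replace() can stand in for it. This is a sketch under assumptions: the class name BuilderReplace is made up, and it matches the target literally rather than as a regular expression (unlike String.replaceAll()). The builder may still shift or grow its internal array when the replacement has a different length.

```java
// Hypothetical helper: literal search and replace inside an existing StringBuilder.
final class BuilderReplace {
    // Replaces every occurrence of target and returns how many were replaced.
    static int replaceAll(StringBuilder text, String target, String replacement) {
        if (target.isEmpty()) {
            throw new IllegalArgumentException("target must not be empty");
        }
        int count = 0;
        int from = 0;
        int at;
        while ((at = text.indexOf(target, from)) != -1) {
            text.replace(at, at + target.length(), replacement);
            // Continue after the inserted text so replacements are not rescanned.
            from = at + replacement.length();
            count++;
        }
        return count;
    }
}
```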
That said:
- Make your code as short and simple as possible.
- Measure the performance
- If it's too slow, fix it.
There is no reason to waste a lot of time making this code run fast if it takes just 0.1s to execute.
If you still have a performance problem, consider putting all the text files into a JAR, adding it to the classpath, and using Class.getResourceAsStream() to read the files. Loading things from the Java classpath is highly optimized.
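Class.getResourceAsStream() needs the resource name up front. If the names are not known, one way to enumerate the files in a JAR is java.util.zip.ZipFile, since a JAR is a zip archive. The sketch below takes that route and reads every entry into a map keyed by entry name. The class name JarTextLoader and the single charset for all entries are assumptions, and it makes no claim about being faster than reading loose files.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: loads every file entry of a JAR/zip archive as text.
final class JarTextLoader {
    static Map<String, String> loadAll(File jar, Charset charset) throws IOException {
        Map<String, String> texts = new LinkedHashMap<String, String>();
        try (ZipFile zip = new ZipFile(jar)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue; // directories carry no content
                }
                try (InputStream in = zip.getInputStream(entry)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = in.read(chunk)) != -1) {
                        out.write(chunk, 0, n);
                    }
                    texts.put(entry.getName(), new String(out.toByteArray(), charset));
                }
            }
        }
        return texts;
    }
}
```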
Solution 3
It depends a lot on the internal structure of your text files and what you intend to do with them.
Are the files key-value dictionaries (i.e. "properties" files)? XML? JSON? You have standard structures for those.
If they have a formal structure you may also use JavaCC to build an object representation of the files.
Otherwise, if they are just blobs of data, well, read the files and put them in a String.
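For the key-value case, java.util.Properties already does the parsing. A minimal sketch, assuming the file is in .properties syntax and that the caller knows its encoding (the class name PropertiesLoader is made up for illustration):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.Properties;

// Hypothetical helper: loads a key-value text file into java.util.Properties.
final class PropertiesLoader {
    static Properties load(File file, Charset charset) throws IOException {
        Properties props = new Properties();
        // load(Reader) lets the caller choose the encoding explicitly.
        try (Reader reader = new InputStreamReader(new FileInputStream(file), charset)) {
            props.load(reader);
        }
        return props;
    }
}
```

Values are then available through props.getProperty("someKey"), where the key name is a placeholder.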
Edit: about search&replace - just use String's replaceAll function.
Solution 4
After searching Google for existing tests on IO speed in Java, I must say TofuBeer's test case completely opened my eyes. You have to run his test on your own platform to see what is fastest for you.
After running his test, and adding a few of my own (credit to TofuBeer for posting his original code), it appears you may get even more speed by using your own custom buffer instead of the BufferedInputStream.
To my dismay, the NIO ByteBuffer did not perform well.
NOTE: The static byte[] buffer shaved off a few ms, but the static ByteBuffers actually increased the time to process! Is there anything wrong with the code?
I added a few tests:
1. ArrayTest_CustomBuffering (read data directly into my own buffer)
2. ArrayTest_CustomBuffering_StaticBuffer (read data into a static buffer that is created only once in the beginning)
3. FileChannelArrayByteBuffer (use NIO ByteBuffer, wrapping your own byte[] array)
4. FileChannelAllocateByteBuffer (use NIO ByteBuffer with .allocate)
5. FileChannelAllocateByteBuffer_StaticBuffer (same as 4 but with a static buffer)
6. FileChannelAllocateDirectByteBuffer (use NIO ByteBuffer with .allocateDirect)
7. FileChannelAllocateDirectByteBuffer_StaticBuffer (same as 6 but with a static buffer)
Here are my results, using Windows Vista and jdk1.6.0_13 on the extracted rt.jar:
ArrayTest
time = 2075
bytes = 2120336424
ArrayTest
time = 2044
bytes = 2120336424
ArrayTest_CustomBuffering
time = 1903
bytes = 2120336424
ArrayTest_CustomBuffering_StaticBuffer
time = 1872
bytes = 2120336424
DataInputByteAtATime
time = 2668
bytes = 2120336424
DataInputReadFully
time = 2028
bytes = 2120336424
MemoryMapped
time = 2901
bytes = 2120336424
FileChannelArrayByteBuffer
time = 2371
bytes = 2120336424
FileChannelAllocateByteBuffer
time = 2356
bytes = 2120336424
FileChannelAllocateByteBuffer_StaticBuffer
time = 2668
bytes = 2120336424
FileChannelAllocateDirectByteBuffer
time = 2512
bytes = 2120336424
FileChannelAllocateDirectByteBuffer_StaticBuffer
time = 2590
bytes = 2120336424
My hacked version of TofuBeer's code:
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.util.HashSet;
import java.util.Set;
public class Main {
public static void main(final String[] argv) {
ArrayTest.mainx(argv);
ArrayTest.mainx(argv);
ArrayTest_CustomBuffering.mainx(argv);
ArrayTest_CustomBuffering_StaticBuffer.mainx(argv);
DataInputByteAtATime.mainx(argv);
DataInputReadFully.mainx(argv);
MemoryMapped.mainx(argv);
FileChannelArrayByteBuffer.mainx(argv);
FileChannelAllocateByteBuffer.mainx(argv);
FileChannelAllocateByteBuffer_StaticBuffer.mainx(argv);
FileChannelAllocateDirectByteBuffer.mainx(argv);
FileChannelAllocateDirectByteBuffer_StaticBuffer.mainx(argv);
}
}
abstract class Test {
static final int BUFF_SIZE = 20971520;
static final byte[] StaticData = new byte[BUFF_SIZE];
static final ByteBuffer StaticBuffer =ByteBuffer.allocate(BUFF_SIZE);
static final ByteBuffer StaticDirectBuffer = ByteBuffer.allocateDirect(BUFF_SIZE);
public final void run(final File root) {
final Set<File> files;
final long size;
final long start;
final long end;
final long total;
files = new HashSet<File>();
getFiles(root, files);
start = System.currentTimeMillis();
size = readFiles(files);
end = System.currentTimeMillis();
total = end - start;
System.out.println(getClass().getName());
System.out.println("time = " + total);
System.out.println("bytes = " + size);
}
private void getFiles(final File dir,final Set<File> files) {
final File[] childeren;
childeren = dir.listFiles();
for(final File child : childeren) {
if(child.isFile()) {
files.add(child);
}
else {
getFiles(child, files);
}
}
}
private long readFiles(final Set<File> files) {
long size;
size = 0;
for(final File file : files) {
size += readFile(file);
}
return (size);
}
protected abstract long readFile(File file);
}
class ArrayTest extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new ArrayTest();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
InputStream stream;
stream = null;
try {
final byte[] data;
int soFar;
int sum;
stream = new BufferedInputStream(new FileInputStream(file));
data = new byte[(int)file.length()];
soFar = 0;
do {
soFar += stream.read(data, soFar, data.length - soFar);
}
while(soFar != data.length);
sum = 0;
for(final byte b : data) {
sum += b;
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class ArrayTest_CustomBuffering extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new ArrayTest_CustomBuffering();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
InputStream stream;
stream = null;
try {
final byte[] data;
int soFar;
int sum;
stream = new FileInputStream(file);
data = new byte[(int)file.length()];
soFar = 0;
do {
soFar += stream.read(data, soFar, data.length - soFar);
}
while(soFar != data.length);
sum = 0;
for(final byte b : data) {
sum += b;
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class ArrayTest_CustomBuffering_StaticBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new ArrayTest_CustomBuffering_StaticBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
InputStream stream;
stream = null;
try {
int soFar;
int sum;
final int fileSize;
stream = new FileInputStream(file);
fileSize = (int)file.length();
soFar = 0;
do {
soFar += stream.read(StaticData, soFar, fileSize - soFar);
}
while(soFar != fileSize);
sum = 0;
for(int i=0;i<fileSize;i++) {
sum += StaticData[i];
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputByteAtATime extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new DataInputByteAtATime();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
DataInputStream stream;
stream = null;
try {
final int fileSize;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
fileSize = (int)file.length();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += stream.readByte();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputReadFully extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new DataInputReadFully();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
DataInputStream stream;
stream = null;
try {
final byte[] data;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
data = new byte[(int)file.length()];
stream.readFully(data);
sum = 0;
for(final byte b : data) {
sum += b;
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class DataInputReadInChunks extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new DataInputReadInChunks();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
DataInputStream stream;
stream = null;
try {
final byte[] data;
int size;
final int fileSize;
int sum;
stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
fileSize = (int)file.length();
data = new byte[512];
size = 0;
sum = 0;
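do {
final int read = stream.read(data);
if(read == -1) {
break;
}
// Sum only the bytes returned by this read; 'size' tracks the running total.
for(int i = 0; i < read; i++) {
sum += data[i];
}
size += read;
}
while(size != fileSize);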
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class MemoryMapped extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new MemoryMapped();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final FileChannel channel;
final MappedByteBuffer buffer;
final int fileSize;
int sum;
stream = new FileInputStream(file);
channel = stream.getChannel();
buffer = channel.map(MapMode.READ_ONLY, 0, file.length());
fileSize = (int)file.length();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += buffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class FileChannelArrayByteBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new FileChannelArrayByteBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final byte[] data;
final FileChannel channel;
final ByteBuffer buffer;
int nRead=0;
final int fileSize;
int sum;
stream = new FileInputStream(file);
data = new byte[(int)file.length()];
buffer = ByteBuffer.wrap(data);
channel = stream.getChannel();
fileSize = (int)file.length();
nRead += channel.read(buffer);
buffer.rewind();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += buffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class FileChannelAllocateByteBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new FileChannelAllocateByteBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final byte[] data;
final FileChannel channel;
final ByteBuffer buffer;
int nRead=0;
final int fileSize;
int sum;
stream = new FileInputStream(file);
//data = new byte[(int)file.length()];
buffer = ByteBuffer.allocate((int)file.length());
channel = stream.getChannel();
fileSize = (int)file.length();
nRead += channel.read(buffer);
buffer.rewind();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += buffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class FileChannelAllocateDirectByteBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new FileChannelAllocateDirectByteBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final byte[] data;
final FileChannel channel;
final ByteBuffer buffer;
int nRead=0;
final int fileSize;
int sum;
stream = new FileInputStream(file);
//data = new byte[(int)file.length()];
buffer = ByteBuffer.allocateDirect((int)file.length());
channel = stream.getChannel();
fileSize = (int)file.length();
nRead += channel.read(buffer);
buffer.rewind();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += buffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class FileChannelAllocateByteBuffer_StaticBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new FileChannelAllocateByteBuffer_StaticBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final byte[] data;
final FileChannel channel;
int nRead=0;
final int fileSize;
int sum;
stream = new FileInputStream(file);
//data = new byte[(int)file.length()];
StaticBuffer.clear();
StaticBuffer.limit((int)file.length());
channel = stream.getChannel();
fileSize = (int)file.length();
nRead += channel.read(StaticBuffer);
StaticBuffer.rewind();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += StaticBuffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
class FileChannelAllocateDirectByteBuffer_StaticBuffer extends Test {
public static void mainx(final String[] argv) {
final Test test;
test = new FileChannelAllocateDirectByteBuffer_StaticBuffer();
test.run(new File(argv[0]));
}
protected long readFile(final File file) {
FileInputStream stream;
stream = null;
try {
final byte[] data;
final FileChannel channel;
int nRead=0;
final int fileSize;
int sum;
stream = new FileInputStream(file);
//data = new byte[(int)file.length()];
StaticDirectBuffer.clear();
StaticDirectBuffer.limit((int)file.length());
channel = stream.getChannel();
fileSize = (int)file.length();
nRead += channel.read(StaticDirectBuffer);
StaticDirectBuffer.rewind();
sum = 0;
for(int i = 0; i < fileSize; i++) {
sum += StaticDirectBuffer.get();
}
return (sum);
}
catch(final IOException ex) {
ex.printStackTrace();
}
finally {
if(stream != null) {
try {
stream.close();
}
catch(final IOException ex) {
ex.printStackTrace();
}
}
}
return (0);
}
}
Comments
- user63898 (almost 2 years ago): I need to read ~50 files on every server start and place each text file's representation into memory. Each text file will have its own string (which is the best type to use for the string holder?).
What is the fastest way to read the files into memory, and what is the best data structure/type to hold the text in so that I can manipulate it in memory (search and replace mainly)?
Thanks
- Hosam Aly (about 15 years ago): Interesting! Do you have a benchmark or comparison between both approaches?
- TofuBeer (about 15 years ago): I had some code (long gone) that parsed every class file in rt.jar (6000+). Using FileInputStream (wrapped with a BufferedInputStream) it took 30 seconds; with a memory mapped file it took 4. Other than the way the bytes were read there was no difference in the code.
- TofuBeer (about 15 years ago): I did extract all of the files from the JAR to the file system before doing it.
- user63898 (about 15 years ago): If I load all the text files with Class.getResourceAsStream(), how can I iterate through the files inside the JAR?
- eaolson (about 15 years ago): This will read bytes and not necessarily character data, though, correct?
- TofuBeer (about 15 years ago): You can make use of CharBuffer via ByteBuffer.asCharBuffer. Also the speed will be very OS dependent - nio integrates tightly with the underlying OS (updating my answer).
- TofuBeer (about 15 years ago): java.util.zip.ZipFile will let you work with files in a JAR (a JAR is just a zip file).
- Aaron Digulla (about 15 years ago): Either use ZipFile or iterate over a list of filenames (instead of trying to iterate over the resource).
- Aaron Digulla (about 15 years ago): @Tofu: Care to explain why using a big char buffer sounds like the worst way? It allocates memory only twice (once for the char array and once to copy it into a String). Can't get cheaper than that.
- Aaron Digulla (about 15 years ago): @Tofu: Also, I use a single command to read the whole file, so only one IO request. Your approach uses a lot of objects, MMU table changes, etc. I figure no matter the OS, that should be slower than a single file.read() call.
- TofuBeer (about 15 years ago): Reading the whole file into a byte[] was the slowest in my tests (by a large amount). Also you need to do repetitive reads to get the whole array (read returns an int of how many were read, it may be less than the length of the array).
- Hosam Aly (about 15 years ago): @TofuBeer is right; you should loop checking how many bytes have actually been read.
- kohlerm (about 15 years ago): As you said, memory mapping might take long on some OSes. So for small files it's probably not a good idea.
- kohlerm (about 15 years ago): Using String.replaceAll() is definitely not a good idea. It will not replace Strings in place, but allocate new Strings.
- Aaron Digulla (about 15 years ago): kohlerm: Since there is no way to modify a String in place in Java, it doesn't really matter how you do it. As I said: If the String is really large, use a StringBuilder instead.
- Jason S (about 15 years ago): The times are suspicious to me. I'm not sure I'd trust comparisons of times that are "only" a few seconds in length; the JVMs need time to start up. I'd use larger files or more iterations. As TofuBeer pointed out, you also need to include a loop of at least one additional iteration prior to starting the "real" timing, to prime the disk cache and also to let the JVM's JIT do its fancy work.
- jkaufmann (about 15 years ago): I did keep TofuBeer's "priming" copy as the first iteration in the test. The files were already cached from a previous run. I am re-running the test on an XP box and coming up with similar results. I am really perplexed by the speed decrease when reusing a static buffer vs. creating a new buffer for each file. Perhaps this is an optimization in the underlying JVM? Not sure. I'd rather believe it's an issue with the code written. I think I may trawl through the underlying source in the Java API to see what's going on with the NIO channels vs. the FileInputStreams. Thanks for the input.
- rzwitserloot (almost 6 years ago): Your test is just wrong, unfortunately. Specifically, your second ArrayTest takes 46570 msec, but your ReadFully test takes only 8953; less than 20%. And yet, the code is the same. Look at the source of DataInputStream's readFully: It's the same as your ArrayTest code.
- TofuBeer (almost 6 years ago): The readFully test creates a new DataInputStream; the ArrayTest does not.
- Erk (over 4 years ago): One thing to look out for with the BufferedInputStream is the buffer size. Default is 8192, but I managed to get things much faster by setting it down considerably (I read very small files).