Fastest Way To Read and Write Large Files Line By Line in Java


Solution 1

I suspect your real problem is that you have limited hardware, and what you do in software won't make much difference. If you have plenty of memory and CPU, more advanced tricks can help, but if you are just waiting on your hard drive because the file is not cached, it won't make much difference.

BTW: 500 MB in 10 secs, i.e. 50 MB/s, is a typical read speed for an HDD.

Try running the following to see at what point your system is unable to cache the file efficiently.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;

public class FileCacheBenchmark {
    public static void main(String... args) throws IOException {
        for (int mb : new int[]{50, 100, 250, 500, 1000, 2000})
            testFileSize(mb);
    }

    private static void testFileSize(int mb) throws IOException {
        File file = File.createTempFile("test", ".txt");
        file.deleteOnExit();

        // One 1 KB line repeated mb * 1024 times gives a file of roughly mb MB.
        char[] chars = new char[1024];
        Arrays.fill(chars, 'A');
        String longLine = new String(chars);

        long start1 = System.nanoTime();
        PrintWriter pw = new PrintWriter(new FileWriter(file));
        for (int i = 0; i < mb * 1024; i++)
            pw.println(longLine);
        pw.close();
        long time1 = System.nanoTime() - start1;
        // bytes * 1000 / nanoseconds == decimal MB/s
        System.out.printf("Took %.3f seconds to write a %d MB file, rate: %.1f MB/s%n",
                time1 / 1e9, file.length() >> 20, file.length() * 1000.0 / time1);

        long start2 = System.nanoTime();
        BufferedReader br = new BufferedReader(new FileReader(file));
        for (String line; (line = br.readLine()) != null; ) {
            // Discard each line; we only measure how fast the lines can be read.
        }
        br.close();
        long time2 = System.nanoTime() - start2;
        System.out.printf("Took %.3f seconds to read a %d MB file, rate: %.1f MB/s%n",
                time2 / 1e9, file.length() >> 20, file.length() * 1000.0 / time2);
        file.delete();
    }
}

On a Linux machine with lots of memory:

Took 0.395 seconds to write a 50 MB file, rate: 133.0 MB/s
Took 0.375 seconds to read a 50 MB file, rate: 140.0 MB/s
Took 0.669 seconds to write a 100 MB file, rate: 156.9 MB/s
Took 0.569 seconds to read a 100 MB file, rate: 184.6 MB/s
Took 1.585 seconds to write a 250 MB file, rate: 165.5 MB/s
Took 1.274 seconds to read a 250 MB file, rate: 206.0 MB/s
Took 2.513 seconds to write a 500 MB file, rate: 208.8 MB/s
Took 2.332 seconds to read a 500 MB file, rate: 225.1 MB/s
Took 5.094 seconds to write a 1000 MB file, rate: 206.0 MB/s
Took 5.041 seconds to read a 1000 MB file, rate: 208.2 MB/s
Took 11.509 seconds to write a 2001 MB file, rate: 182.4 MB/s
Took 9.681 seconds to read a 2001 MB file, rate: 216.8 MB/s

On a Windows machine with lots of memory:

Took 0.376 seconds to write a 50 MB file, rate: 139.7 MB/s
Took 0.401 seconds to read a 50 MB file, rate: 131.1 MB/s
Took 0.517 seconds to write a 100 MB file, rate: 203.1 MB/s
Took 0.520 seconds to read a 100 MB file, rate: 201.9 MB/s
Took 1.344 seconds to write a 250 MB file, rate: 195.4 MB/s
Took 1.387 seconds to read a 250 MB file, rate: 189.4 MB/s
Took 2.368 seconds to write a 500 MB file, rate: 221.8 MB/s
Took 2.454 seconds to read a 500 MB file, rate: 214.1 MB/s
Took 4.985 seconds to write a 1001 MB file, rate: 210.7 MB/s
Took 5.132 seconds to read a 1001 MB file, rate: 204.7 MB/s
Took 10.276 seconds to write a 2003 MB file, rate: 204.5 MB/s
Took 9.964 seconds to read a 2003 MB file, rate: 210.9 MB/s

Solution 2

The first thing I would try is to increase the buffer size of the BufferedReader and BufferedWriter. The default buffer sizes are not documented, but at least in the Oracle VM they are 8192 characters, a default that won't bring much performance advantage for files this large.
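
For example, a minimal sketch of a line-by-line copy with enlarged buffers (the 1 MB size is an arbitrary starting point to tune, not a measured recommendation):

try (BufferedReader br = new BufferedReader(new FileReader("d:/test.txt"), 1 << 20);
     BufferedWriter bw = new BufferedWriter(new FileWriter("d:/test2.txt"), 1 << 20)) {
    // 1 MB buffers instead of the 8 KB default; tune for your workload
    for (String line; (line = br.readLine()) != null; ) {
        bw.write(line);
        bw.newLine();
    }
}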

If you only need to make a copy of the file (and don't need actual access to the data), I would either drop the Reader/Writer approach and work directly with InputStream and OutputStream, using a byte array as a buffer:

try (FileInputStream fis = new FileInputStream("d:/test.txt");
     FileOutputStream fos = new FileOutputStream("d:/test2.txt")) {
    byte[] b = new byte[bufferSize]; // bufferSize as discussed above, e.g. 1 << 20
    int r;
    while ((r = fis.read(b)) >= 0) {
        fos.write(b, 0, r);
    }
}

or actually use NIO:

try (FileChannel in = new RandomAccessFile("d:/test.txt", "r").getChannel();
     FileChannel out = new RandomAccessFile("d:/test2.txt", "rw").getChannel()) {
    // With a count of Long.MAX_VALUE this copies up to the end of the source file.
    out.transferFrom(in, 0, Long.MAX_VALUE);
}

When benchmarking the different copy methods, however, I see much larger differences in duration between runs of the same benchmark than between the different implementations. I/O caching (both at the OS level and in the hard disk cache) plays a large role here, and it is very difficult to say which is faster. On my hardware, copying a 1 GB text file line by line using BufferedReader and BufferedWriter takes less than 5 s in some runs and more than 30 s in others.
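
A minimal sketch of what I mean by run-to-run variance (copyFile is a hypothetical stand-in for any of the implementations above):

// Repeat the same copy several times and print each duration; the spread
// between runs is typically dominated by OS and disk caching, not by the
// copy implementation itself.
for (int run = 1; run <= 5; run++) {
    long start = System.nanoTime();
    copyFile("d:/test.txt", "d:/test2.txt"); // hypothetical: any copy method above
    System.out.printf("run %d: %.1f s%n", run, (System.nanoTime() - start) / 1e9);
}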

Solution 3

In Java 7 you can use the Files.readAllLines() and Files.write() methods. Here is an example:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

List<String> readTextFile(String fileName) throws IOException {
    Path path = Paths.get(fileName);
    return Files.readAllLines(path, StandardCharsets.UTF_8);
}

void writeTextFile(List<String> strLines, String fileName) throws IOException {
    Path path = Paths.get(fileName);
    Files.write(path, strLines, StandardCharsets.UTF_8);
}
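
Note that readAllLines materializes the entire file as a List<String>, so it will not fit a 1 GB file into a 64 MB heap. The same java.nio.file API can also stream line by line; a minimal sketch under that constraint (the method and parameter names are mine, not part of the original answer):

import java.io.BufferedReader;
import java.io.BufferedWriter;

void copyTextFile(String inFile, String outFile) throws IOException {
    // Streams one line at a time, so memory use stays constant regardless of file size.
    try (BufferedReader reader = Files.newBufferedReader(Paths.get(inFile), StandardCharsets.UTF_8);
         BufferedWriter writer = Files.newBufferedWriter(Paths.get(outFile), StandardCharsets.UTF_8)) {
        for (String line; (line = reader.readLine()) != null; ) {
            writer.write(line);
            writer.newLine();
        }
    }
}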

Solution 4

I would recommend looking at the classes in the java.nio package. Non-blocking IO might be faster for sockets:

http://docs.oracle.com/javase/6/docs/api/java/nio/package-summary.html
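
As a rough sketch of the channel-and-buffer style that package encourages (the path and buffer size are placeholders, not measured recommendations):

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

try (FileChannel ch = FileChannel.open(Paths.get("d:/test.txt"), StandardOpenOption.READ)) {
    ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024); // 64 KB direct buffer
    while (ch.read(buf) >= 0) {
        buf.flip();
        // ... process the bytes in buf ...
        buf.clear();
    }
}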

This article has benchmarks supporting that claim:

http://vanillajava.blogspot.com/2010/07/java-nio-is-faster-than-java-io-for.html

Solution 5

I have written an extensive article about the many ways of reading files in Java, testing them against each other with sample files from 1 KB to 1 GB, and I found the following three methods to be the fastest for reading 1 GB files:

1) java.nio.file.Files.readAllBytes() - took just under 1 second to read a 1 GB test file.

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ReadFile_Files_ReadAllBytes {
  public static void main(String [] pArgs) throws IOException {
    String fileName = "c:\\temp\\sample-10KB.txt";
    File file = new File(fileName);

    byte [] fileBytes = Files.readAllBytes(file.toPath());
    char singleChar;
    for (byte b : fileBytes) {
      // NOTE: the byte-to-char cast assumes a single-byte encoding such as ASCII
      singleChar = (char) b;
      System.out.print(singleChar);
    }
  }
}

2) java.nio.file.Files.lines() - took about 3.5 seconds to read in a 1 GB test file.

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;

public class ReadFile_Files_Lines {
  public static void main(String[] pArgs) throws IOException {
    String fileName = "c:\\temp\\sample-10KB.txt";
    File file = new File(fileName);

    try (Stream<String> linesStream = Files.lines(file.toPath())) {
      linesStream.forEach(line -> {
        System.out.println(line);
      });
    }
  }
}

3) java.io.BufferedReader - took about 4.5 seconds to read a 1 GB test file.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadFile_BufferedReader_ReadLine {
  public static void main(String [] args) throws IOException {
    String fileName = "c:\\temp\\sample-10KB.txt";
    FileReader fileReader = new FileReader(fileName);

    try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {
      String line;
      while((line = bufferedReader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}

Comments

  • user1785771
    user1785771 almost 2 years

    I have been searching a lot for the fastest way to read and write back large files (0.5-1 GB) in Java with limited memory (about 64 MB). Each line in the file represents a record, so I need to get them line by line. The file is a plain text file.

    I tried BufferedReader and BufferedWriter, but they don't seem to be the best option. Reading and writing a 0.5 GB file takes about 35 seconds with no processing at all, just reading and writing. I think the bottleneck is writing, as reading alone takes about 10 seconds.

    I tried reading into byte arrays, but searching each array for line breaks takes more time.

    Any suggestions please? Thanks

  • user1785771
    user1785771 over 11 years
    I looked into nio, but it only reads arrays or buffers from files, and processing those arrays to extract lines takes longer.
  • jarnbjo
    jarnbjo over 11 years
    The results of such a benchmark are more or less useless. First of all, when writing a file, closing the output stream does not ensure that all data has been physically written to the disk. It may still lurk in a memory buffer at the OS level or in the hard disk cache. If you read the exact same file directly after you have written it, the data will most likely be read from a memory buffer and not physically from disk. According to this benchmark, my laptop HDD comes close to 500 MB/s both for reading and writing, which is probably around 10x its true performance.
  • jarnbjo
    jarnbjo over 11 years
    The article has a chart, but I can't find anything about what was actually measured. The only situation where I've seen a performance advantage with NIO is when using a direct byte buffer for copying data between NIO channels. In that case, however, accessing the data from Java code is much slower.
  • user1785771
    user1785771 over 11 years
    I tried the code and it is fast. I wanted to post here, but I don't know how to format the code; do I use <code> tags? @Peter
  • duffymo
    duffymo over 11 years
    Your comments are terrific, jarnbjo. It'd be great to see an answer from you, since you're obviously knowledgeable.
  • Vishy
    Vishy over 11 years
    @jarnbjo If I didn't make myself clear as to what the benchmark is doing: it is testing what you can do in software when writing to and reading from your disk cache. It has no idea whether the data is written to disk, or even whether you have an HDD. If you get lower results than these, it is because of a limitation in your hardware.
  • Vishy
    Vishy over 11 years
    @user1785771 If you want to update your question, you can add code there under a heading like "Edit: in reply to @PeterLawrey's answer ...".
  • user1785771
    user1785771 over 11 years
    Thanks all for your feedback. So, can I conclude that BufferedReader is good enough in terms of speed?
  • user1785771
    user1785771 over 11 years
    Thanks, but I have limited memory so I can't really use the FileChannel approach.
  • jarnbjo
    jarnbjo over 11 years
    Why not? What has available memory to do with using FileChannel?
  • user1785771
    user1785771 over 11 years
    Actually, I need to process the file before making a copy out of it.
  • jarnbjo
    jarnbjo over 11 years
    So why did you write that you are not processing the file (only read/write, no processing)?