Java: Calculate SHA-256 hash of large file efficiently


Solution 1

My explanation may not solve your problem since it depends a lot on your actual runtime environment, but when I run your code on my system, the throughput is limited by disk I/O and not the hash calculation. The problem is not solved by switching to NIO, but is simply caused by the fact that you're reading the file in very small pieces (16kB). Increasing the buffer size (buff) on my system to 1MB instead of 16kB more than doubles the throughput, but with >50MB/s, I am still limited by disk speed and not able to fully load a single CPU core.

BTW: You can simplify your implementation a lot by wrapping a DigestInputStream around a FileInputStream, reading through the file, and getting the calculated hash from the DigestInputStream, instead of manually shuffling the data from a RandomAccessFile to the MessageDigest as in your code.
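For illustration, a minimal sketch of that approach (the file path, the 1MB buffer size, and the class name are just examples; try-with-resources requires Java 7+):

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestStreamExample {

    // Hash a file by streaming it through a DigestInputStream;
    // the digest is updated as a side effect of reading.
    static byte[] sha256(String path) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(
                new BufferedInputStream(new FileInputStream(path), 1 << 20), md)) {
            byte[] buf = new byte[1 << 20]; // 1MB buffer, as suggested above
            while (in.read(buf) != -1) {
                // nothing to do here: reading alone feeds the digest
            }
        }
        return md.digest();
    }
}
```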


I did a few performance tests with older Java versions, and there seems to be a relevant difference between Java 5 and Java 6 here. I'm not sure, though, whether the SHA implementation is optimized or whether the VM is simply executing the code much faster. The throughputs I get with the different Java versions (1MB buffer) are:

  • Sun JDK 1.5.0_15 (client): 28MB/s, limited by CPU
  • Sun JDK 1.5.0_15 (server): 45MB/s, limited by CPU
  • Sun JDK 1.6.0_16 (client): 42MB/s, limited by CPU
  • Sun JDK 1.6.0_16 (server): 52MB/s, limited by disk I/O (85-90% CPU load)

I was a little curious about the impact of the assembler part in the CryptoPP SHA implementation, as the benchmark results indicate that the SHA-256 algorithm only requires 15.8 CPU cycles/byte on an Opteron. Unfortunately, I was not able to build CryptoPP with gcc on cygwin (the build succeeded, but the generated exe failed immediately). But building a performance benchmark with VS2005 (default release configuration), with and without assembler support in CryptoPP, and comparing it to the Java SHA implementation on an in-memory buffer (leaving out any disk I/O), I get the following results on a 2.5GHz Phenom:

  • Sun JDK1.6.0_13 (server): 26.2 cycles/byte
  • CryptoPP (C++ only): 21.8 cycles/byte
  • CryptoPP (assembler): 13.3 cycles/byte

Both benchmarks compute the SHA hash of a 4GB empty byte array, iterating over it in chunks of 1MB, which are passed to MessageDigest#update (Java) or CryptoPP's SHA256.Update function (C++).
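A rough reconstruction of the Java side of such a benchmark might look like the sketch below. The class and method names are made up for illustration, and the CPU clock rate must be supplied by hand; cycles/byte is then simply elapsed time multiplied by the clock rate, divided by the number of bytes hashed. Reusing a single 1MB zero-filled chunk is equivalent to iterating over a large empty array for benchmarking purposes.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ShaBench {

    // Hash totalBytes worth of zeros in 1MB chunks and convert the
    // elapsed wall-clock time into CPU cycles per byte for the given
    // clock rate (e.g. 2.5e9 for the 2.5GHz Phenom mentioned above).
    static double cyclesPerByte(long totalBytes, double cpuHz)
            throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] chunk = new byte[1 << 20]; // 1MB of zeros, reused each iteration
        long t0 = System.nanoTime();
        for (long done = 0; done < totalBytes; done += chunk.length) {
            md.update(chunk);
        }
        md.digest();
        double seconds = (System.nanoTime() - t0) / 1e9;
        return seconds * cpuHz / totalBytes;
    }
}
```

Note that this measures wall-clock time on one thread, so background load will inflate the result.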

I was able to build and benchmark CryptoPP with gcc 4.4.1 (-O3) in a virtual machine running Linux and got only approximately half the throughput compared to the results from the VS exe. I am not sure how much of the difference is attributable to the virtual machine and how much is caused by VS usually producing better code than gcc, but I have no way to get more exact results from gcc right now.

Solution 2

Perhaps the first thing to do is to work out where you are spending the most time. Can you run it through a profiler and see where the most time is being spent?

Possible improvements:

  1. Use NIO to read the file in the fastest possible way
  2. Update the hash in a separate thread. This is actually rather hard to do and isn't for the faint-hearted, as it involves safe publication between threads. But if your profiling shows a significant amount of time being spent in the hash algorithm, it may make better use of the disk.
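A sketch of option 1, assuming a plain FileChannel with a direct ByteBuffer feeding MessageDigest.update(ByteBuffer); the 1MB buffer size and class name are arbitrary choices for illustration:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class NioHash {

    // Read the file through an NIO channel into a direct buffer and
    // pass the buffer to the digest; update(ByteBuffer) consumes all
    // remaining bytes between position and limit.
    static byte[] sha256(String path)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (FileChannel ch = new FileInputStream(path).getChannel()) {
            ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1MB
            while (ch.read(buf) != -1) {
                buf.flip();     // switch from filling to draining
                md.update(buf); // digests buf.remaining() bytes
                buf.clear();    // ready for the next read
            }
        }
        return md.digest();
    }
}
```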

Solution 3

I suggest you use a profiler like JProfiler or the one integrated in NetBeans (free) to find out where the time is actually spent and concentrate on that part.

Just a wild guess - not sure if it will help - but have you tried the server VM? Try starting the app with java -server and see if that helps. The server VM is more aggressive about compiling Java code to native code than the default client VM is.

Solution 4

It used to be that Java ran about 10x slower than the same C++ code. Nowadays it is closer to 2x slower. I think what you're running into is just a fundamental part of Java. JVMs will get faster, especially as new JIT techniques are discovered, but you'll have a hard time outperforming C.

Have you tried alternative JVMs and/or compilers? I used to get better performance with JRockit, but less stability. Ditto for using jikes over javac.

Solution 5

Since you apparently have a working C++ implementation which is fast, you could build a JNI bridge and use the actual C++ implementation. Or you could try not reinventing the wheel, especially since it's a big one, and use a premade library such as Bouncy Castle, which has been made to solve all the cryptographic needs of your program.
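If you go the Bouncy Castle route, its lightweight API can be used directly, without the JCA provider machinery. A minimal sketch (assumes the bcprov jar is on the classpath; the class and method names here are invented for illustration):

```java
// Requires the Bouncy Castle provider jar (org.bouncycastle:bcprov)
import org.bouncycastle.crypto.digests.SHA256Digest;

public class BcHash {

    // Digest a byte array with Bouncy Castle's lightweight SHA256Digest;
    // update() can also be called repeatedly, once per file chunk.
    static byte[] sha256(byte[] data) {
        SHA256Digest d = new SHA256Digest();
        d.update(data, 0, data.length);
        byte[] out = new byte[d.getDigestSize()]; // 32 bytes for SHA-256
        d.doFinal(out, 0);
        return out;
    }
}
```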

Author: stefita (code monkey)

Updated on January 19, 2020

Comments

  • stefita
    stefita over 4 years

    I need to calculate a SHA-256 hash of a large file (or a portion of it). My implementation works fine, but it's much slower than C++'s CryptoPP calculation (25 min vs. 10 min for a ~30GB file). What I need is a similar execution time in C++ and Java, so that the hashes are ready at almost the same time. I also tried the Bouncy Castle implementation, but it gave me the same result. Here is how I calculate the hash:

    int buff = 16384;
    try {
        RandomAccessFile file = new RandomAccessFile("T:\\someLargeFile.m2v", "r");
    
        long startTime = System.nanoTime();
        MessageDigest hashSum = MessageDigest.getInstance("SHA-256");
    
        byte[] buffer = new byte[buff];
        byte[] partialHash = null;
    
        long read = 0;
    
        // calculate the hash of the whole file for the test
        long offset = file.length();
        int unitsize;
        while (read < offset) {
            unitsize = (int) (((offset - read) >= buff) ? buff : (offset - read));
            // readFully guarantees that unitsize bytes are actually read;
            // plain read() may return fewer bytes than requested
            file.readFully(buffer, 0, unitsize);
    
            hashSum.update(buffer, 0, unitsize);
    
            read += unitsize;
        }
    
        file.close();
        partialHash = hashSum.digest();
    
        long endTime = System.nanoTime();
    
        System.out.println(endTime - startTime);
    
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (NoSuchAlgorithmException e) {
        e.printStackTrace();
    }
    
  • stefita
    stefita over 14 years
    As I wrote above, I also tried using BouncyCastle's implementation, SHA256Digest(), which turned out to be almost as fast.
  • Esko
    Esko over 14 years
    Hm, somehow missed that, sorry.
  • Max A.
    Max A. over 14 years
    Good suggestion re: -server, but might be a moot point if the O.P. is using a 1.6 JVM on a 64-bit platform, where -server is the default.
  • stefita
    stefita over 14 years
    I prefer to keep the Sun JVM, since most people have it already installed, but I might try both compilers you mentioned. Thanks for the tip!
  • stefita
    stefita over 14 years
    thanks for the results! I tried profiling it on my machine and it showed significant time spent on calculating the hash, but increasing the buffer was a good idea and the execution time dropped by about 50 seconds for a 4GB file.
  • stefita
    stefita over 14 years
    np. But you may be right about using a wrapped C++ function, since all improvements on the Java side only seem to cut the time by about 1/5.
  • jarnbjo
    jarnbjo over 14 years
    I must admit that the Java performance is still not quite the throughput achieved by the CryptoPP library, but looking into the CryptoPP sources, the SHA core is actually implemented in assembler. You are unlikely to achieve the same performance with a Java program, but if you increase the buffer size, use the server VM and/or upgrade to Java 6, the performance is perhaps good enough?
  • Cuper Hector
    Cuper Hector almost 13 years
    May I ask what tools did you use to measure the cpu cycles?
  • jarnbjo
    jarnbjo almost 13 years
    The number of cycles/byte is simply calculated from the actual time it took to hash 4GB of data.
  • Franklin
    Franklin almost 12 years
    Wow, this parameter just boosted things up. Is this some kind of silver bullet?
  • Daniel Schneller
    Daniel Schneller almost 12 years
    Not really a silver bullet, but over time I have learned to like the server VM even for "client" applications. Startup may take longer and memory consumption might be higher, but it's often worth it. On 64-bit systems you get it by default; on 32-bit you can choose, if a Java SDK is installed (the JRE comes with the client VM only).
  • Tyagi Akhilesh
    Tyagi Akhilesh about 8 years
    @jarnbjo Just loved the way you have diligently gone into details. Bravo!!