Obtaining the hash of a file using the stream capabilities of the crypto module (i.e., without hash.update and hash.digest)

Solution 1

From the quoted snippet in the question:

[the Hash class] It is a stream that is both readable and writable. The written data is used to compute the hash. Once the writable side of the stream is ended, use the read() method to get the computed hash digest.

So what you need to hash some text is:

var crypto = require('crypto');

// change to 'md5' if you want an MD5 hash
var hash = crypto.createHash('sha1');

// change to 'binary' if you want a binary hash.
hash.setEncoding('hex');

// the text that you want to hash
hash.write('hello world');

// very important! You cannot read from the stream until you have called end()
hash.end();

// and now you get the resulting hash
var sha1sum = hash.read();
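
As a sanity check, the stream interface should yield the same digest as the legacy update/digest calls. The following is a minimal sketch rather than part of the original answer: hashTextViaStream is a hypothetical helper name, and it collects the digest from 'data' events instead of a single read() call, so it does not depend on when the digest becomes readable.

```javascript
const crypto = require('crypto');

// Hypothetical helper (not from the original answer): hash a string
// using only the stream side of the Hash object.
function hashTextViaStream(algorithm, text) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash(algorithm);
    hash.setEncoding('hex');

    let digest = '';
    // With an encoding set, the readable side emits strings.
    hash.on('data', (chunk) => { digest += chunk; });
    hash.on('end', () => resolve(digest));
    hash.on('error', reject);

    // Write the text and end the writable side in one call.
    hash.end(text);
  });
}

// Usage: compare against the legacy API.
hashTextViaStream('sha1', 'hello world').then((streamDigest) => {
  const legacyDigest = crypto.createHash('sha1').update('hello world').digest('hex');
  console.log(streamDigest === legacyDigest); // true
});
```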

If you want to get the hash of a file, the best way is to create a ReadStream from the file and pipe it into the hash:

var fs = require('fs');
var crypto = require('crypto');

// the file whose hash you want
var fd = fs.createReadStream('/some/file/name.txt');
var hash = crypto.createHash('sha1');
hash.setEncoding('hex');

fd.on('end', function() {
    hash.end();
    console.log(hash.read()); // the desired sha1sum
});

// read the whole file and pipe it (write it) to the hash object
fd.pipe(hash);
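
The snippet above does not handle read errors such as a missing file. One possible way to cover that while still avoiding update and digest is sketched below. It is an illustration under assumptions, not the answer's own method: hashFile is a hypothetical name, and it relies on async iteration over readable streams, which needs a reasonably recent Node.js release.

```javascript
const fs = require('fs');
const crypto = require('crypto');

// Hypothetical helper, not part of the original answer.
async function hashFile(algorithm, path) {
  const hash = crypto.createHash(algorithm).setEncoding('hex');
  const source = fs.createReadStream(path);

  // pipe() does not forward errors, so pass read errors on by hand.
  source.once('error', (err) => hash.destroy(err));
  source.pipe(hash);

  // The readable side yields the digest once the writable side has ended.
  let digest = '';
  for await (const chunk of hash) {
    digest += chunk;
  }
  return digest;
}
```

Calling hashFile('sha1', '/some/file/name.txt') returns a promise for the hex digest; if the file cannot be read, the promise should reject with the underlying error instead of raising an unhandled 'error' event.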

Solution 2

An ES6 version returning a Promise for the hash digest:

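const crypto = require('crypto');
const fs = require('fs');

// Resolves with the hex digest of the file at `path`; rejects on read errors.
function checksumFile(hashName, path) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash(hashName);
    const stream = fs.createReadStream(path);
    stream.on('error', err => reject(err));
    // feed each chunk to the hash as it arrives
    stream.on('data', chunk => hash.update(chunk));
    stream.on('end', () => resolve(hash.digest('hex')));
  });
}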

Solution 3

Short version of Carlos' answer:

var fs = require('fs')
var crypto = require('crypto')

fs.createReadStream('/some/file/name.txt').
  pipe(crypto.createHash('sha1').setEncoding('hex')).
  on('finish', function () {
    console.log(this.read()) //the hash
  })

Solution 4

Further polish, ECMAScript 2015

hash.js:

'use strict';

function checksumFile(algorithm, path) {
  return new Promise(function (resolve, reject) {
    let fs = require('fs');
    let crypto = require('crypto');

    let hash = crypto.createHash(algorithm).setEncoding('hex');
    fs.createReadStream(path)
      .once('error', reject)
      .pipe(hash)
      .once('finish', function () {
        resolve(hash.read());
      });
  });
}

checksumFile('sha1', process.argv[2]).then(function (hash) {
  console.log('hash:', hash);
});

Running the script on itself:

node hash.js hash.js
hash: 9c92ec7acf75f943aac66ca17427a4f038b059da

Works at least as early as v10.x:

node --version
v10.24.1

Solution 5

I have used the Node module hasha successfully; the code becomes very clean and short. It returns a promise, so you can use it with await:

const hasha = require('hasha');

// await has to be used inside an async function here
const fileHash = await hasha.fromFile(yourFilePath, {algorithm: 'md5'});

Author: Carlos Campderrós

Developer of whatever language is needed to make things work.

Updated on July 09, 2022

Comments

  • Carlos Campderrós, almost 2 years ago

    The crypto module of Node.js (at the time of this writing, at least) is still not deemed stable, so the API may change. In fact, the methods that everyone on the internet uses to get the hash (md5, sha1, ...) of a file are considered legacy (from the documentation of the Hash class) (note: emphasis mine):

    Class: Hash

    The class for creating hash digests of data.

    It is a stream that is both readable and writable. The written data is used to compute the hash. Once the writable side of the stream is ended, use the read() method to get the computed hash digest. The legacy update and digest methods are also supported.

    Returned by crypto.createHash.

    Despite hash.update and hash.digest being considered legacy, the example shown just above the quoted snippet is using them.

    What's the correct way of obtaining hashes without using those legacy methods?

  • sunnycmf, over 10 years ago
    is it possible to get the data content from hash object?
  • Carlos Campderrós, over 10 years ago
    @sunnycmf what data do you mean? If you mean the original data you were hashing then I don't think so. If you mean the computed hash, then just use hash.read().
  • sunnycmf, over 10 years ago
    yes i mean orig data, coz want to read the file once and get the data & sha1 hash.
  • Jacopofar, over 8 years ago
    @sunnycmf you can use pipe twice: file.pipe(hash) and then file.pipe(outputStream)
  • Herman Kan, about 8 years ago
    Instead of the fd.on('end'), it would be better to handle hash.on('finish'), which does not require calling hash.end().
  • Mörre, almost 8 years ago
    Seems to me the pipe-to-Hash-object method is much more fickle than the old-fashioned stream.on('data', data => hash.update(data)) and later calling digest('hex'). When I did the exact experiment as above but left out hash.read() because I wanted to do that on the finish event of the write-stream I got nothing, no value. Only when I call hash.read() in the end event handler as above (or in finish suggested by @HermanKan) do I get the hash. No such problems with calling hash.update(...) in a data event handler, then I can get the hash output anywhere, any time.
  • f1lt3r, almost 7 years ago
    I like this. Possible improvement here: gist.github.com/F1LT3R/2e4347a6609c3d0105afce68cd101561
  • Michael, over 6 years ago
    You should explain your code so that others can understand it
  • Timmmm, over 4 years ago
    Didn't work for me - needed to add this.end() before this.read().
  • Oleh Devua, about 4 years ago
    probably you mean .once('end', ...
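
Jacopofar's suggestion of piping the same file stream twice, so the file is read once but feeds both the hash and another destination, could look roughly like the sketch below. The helper name copyAndHash and the choice of a file write stream as the second destination are assumptions made for illustration. Cleanup on error is deliberately left out to keep it short.

```javascript
const fs = require('fs');
const crypto = require('crypto');

// Hypothetical helper: copy `src` to `dest` and hash it, reading `src` once.
function copyAndHash(algorithm, src, dest) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash(algorithm).setEncoding('hex');
    const file = fs.createReadStream(src);
    const out = fs.createWriteStream(dest);

    let digest = '';
    let pending = 2; // wait for both consumers before resolving
    const done = () => { if (--pending === 0) resolve(digest); };

    file.once('error', reject);
    out.once('error', reject);
    hash.once('error', reject);

    // Collect the digest from the readable side of the hash.
    hash.on('data', (chunk) => { digest += chunk; });
    hash.once('end', done);
    out.once('finish', done);

    // The same readable feeds both destinations.
    file.pipe(hash);
    file.pipe(out);
  });
}
```

With two pipes attached, the read stream paces itself to the slower destination, so the hash and the copy both see every chunk.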