Node reading file in specified chunk size


Solution 1

If nothing else, you can just use fs.open(), fs.read(), and fs.close() manually. Example:

var fs = require('fs');

var CHUNK_SIZE = 10 * 1024 * 1024, // 10MB
    buffer = Buffer.alloc(CHUNK_SIZE),
    filePath = '/tmp/foo';

fs.open(filePath, 'r', function(err, fd) {
  if (err) throw err;
  function readNextChunk() {
    fs.read(fd, buffer, 0, CHUNK_SIZE, null, function(err, nread) {
      if (err) throw err;

      if (nread === 0) {
        // done reading file, do any necessary finalization steps

        fs.close(fd, function(err) {
          if (err) throw err;
        });
        return;
      }

      var data;
      if (nread < CHUNK_SIZE)
        data = buffer.slice(0, nread);
      else
        data = buffer;

      // do something with `data`, then call `readNextChunk();`
    });
  }
  readNextChunk();
});
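
On newer versions of Node, the same fixed-size read loop can be written with the promise-based fs API. This is only a sketch, assuming chunks are processed sequentially; readInChunks is a made-up name:

const fsp = require('fs').promises;

async function readInChunks(filePath, chunkSize) {
  const buffer = Buffer.alloc(chunkSize);
  const fd = await fsp.open(filePath, 'r');
  try {
    while (true) {
      // position `null` continues reading from the current file offset
      const { bytesRead } = await fd.read(buffer, 0, chunkSize, null);
      if (bytesRead === 0) break; // end of file
      const data = bytesRead < chunkSize ? buffer.slice(0, bytesRead) : buffer;
      // do something with `data` here
    }
  } finally {
    await fd.close();
  }
}

readInChunks('/tmp/foo', 10 * 1024 * 1024).catch(console.error);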

Solution 2

You may consider using the snippet below, where we read the file in chunks of 1024 bytes:

var fs = require('fs');

var data = '';

var readStream = fs.createReadStream('/tmp/foo.txt', { highWaterMark: 1 * 1024, encoding: 'utf8' });

readStream.on('data', function(chunk) {
    data += chunk;
    console.log('chunk Data : ');
    console.log(chunk); // your per-chunk processing logic goes here
}).on('end', function() {
    console.log('###################');
    console.log(data); // here you see all the data collected at the end of the file
});

Please note: highWaterMark is the option that controls the chunk size. Hope this helps!
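
To see what highWaterMark actually gives you, here is a small sketch (assuming a file /tmp/foo.txt exists; the path is just a placeholder) that logs the length of each chunk as it arrives:

var fs = require('fs');

var readStream = fs.createReadStream('/tmp/foo.txt', { highWaterMark: 1 * 1024 });

readStream.on('data', function(chunk) {
  // Chunks are typically highWaterMark bytes, but the last one (and, as the
  // comments below note, others in some situations) can be smaller.
  console.log('received ' + chunk.length + ' bytes');
}).on('end', function() {
  console.log('done');
}).on('error', function(err) {
  console.error(err);
});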

Web references: https://stackabuse.com/read-files-with-node-js/ and "Changing readstream chunksize"

Author: kjs3 (updated on August 16, 2022)

Comments

  • kjs3, over 1 year ago

    The goal: Upload large files to AWS Glacier without holding the whole file in memory.

    I'm currently uploading to Glacier using fs.readFileSync() and things are working. But I need to handle files larger than 4GB, and I'd like to upload multiple chunks in parallel. This means moving to multipart uploads. I can choose the chunk size, but then Glacier needs every chunk to be the same size (except the last).

    This thread suggests that I can set a chunk size on a read stream but that I'm not actually guaranteed to get it.

    Any info on how I can get consistent parts without reading the whole file into memory and splitting it up manually? (A sketch of one approach follows after the comments.)

    Assuming I can get to that point, I was just going to use cluster with a few processes pulling off the stream as fast as they can upload to AWS. If that seems like the wrong way to parallelize the work, I'd love suggestions there.

  • jlyonsmith, almost 7 years ago
    While this answer is technically correct, using a regular file descriptor forgoes all of the stream event goodness, which is really useful when you are trying to read/write a file in an async code base.
  • theycallmemorty, over 4 years ago
    Running this code today causes the following warning: "(node:25440) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead."
  • mscdex, over 4 years ago
    @theycallmemorty This answer was from well before it was deprecated. I've updated it now for modern versions of node.
  • Alexandre, almost 4 years ago
    Finally a good example of Node streams that doesn't look like way too abstract nonsense and doesn't bore me explaining what binary numbers are.
  • Poyoman, over 3 years ago
    You may prefer the createReadStream solution, which is more readable.
  • mscdex, over 3 years ago
    @Poyoman This solution is the only way to guarantee chunk sizes, which is what the OP asked for. If you don't need specific chunk sizes, then yes, streaming is much easier.
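
Sketch: fixed-size parts for a multipart upload

For the multipart-upload question raised in the comments above (equal-size parts without reading the whole file into memory), here is a minimal sketch building on Solution 1's fs.read approach. It reads each part at an explicit byte offset, so every part except the last is exactly partSize bytes. readPart, uploadInParts, and the upload placeholder are hypothetical names, not part of any AWS SDK:

const fs = require('fs').promises;

async function readPart(fd, partIndex, partSize) {
  const buffer = Buffer.alloc(partSize);
  // Reading at an explicit position guarantees the part boundaries.
  const { bytesRead } = await fd.read(buffer, 0, partSize, partIndex * partSize);
  return bytesRead < partSize ? buffer.slice(0, bytesRead) : buffer;
}

async function uploadInParts(filePath, partSize) {
  const fd = await fs.open(filePath, 'r');
  try {
    const { size } = await fd.stat();
    const partCount = Math.ceil(size / partSize);
    for (let i = 0; i < partCount; i++) {
      const part = await readPart(fd, i, partSize);
      // upload `part` here, e.g. as part i + 1 of a Glacier multipart upload
    }
  } finally {
    await fd.close();
  }
}

uploadInParts('/tmp/foo', 10 * 1024 * 1024).catch(console.error);

Because each read specifies its own position, separate workers (for example, via cluster) could read and upload different parts independently without coordinating a shared stream.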