Node reading file in specified chunk size
Solution 1
If nothing else, you can just use fs.open(), fs.read(), and fs.close() manually. Example:
var fs = require('fs');

var CHUNK_SIZE = 10 * 1024 * 1024, // 10MB
    buffer = Buffer.alloc(CHUNK_SIZE),
    filePath = '/tmp/foo';

fs.open(filePath, 'r', function(err, fd) {
  if (err) throw err;
  function readNextChunk() {
    // position `null` means "continue from the current file position"
    fs.read(fd, buffer, 0, CHUNK_SIZE, null, function(err, nread) {
      if (err) throw err;
      if (nread === 0) {
        // done reading file, do any necessary finalization steps
        fs.close(fd, function(err) {
          if (err) throw err;
        });
        return;
      }
      var data;
      if (nread < CHUNK_SIZE)
        data = buffer.subarray(0, nread); // last (short) chunk
      else
        data = buffer;
      // do something with `data`, then call `readNextChunk();`
      // (`buffer` is reused, so finish with `data` before reading again)
    });
  }
  readNextChunk();
});
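The same idea can be sketched with promises and an async generator. This is only an illustration, not part of the original answer: `readChunks` is a hypothetical helper name, and it assumes a Node version that supports `fs.promises` and `for await`. Because a single read may return fewer bytes than requested, the inner loop keeps reading until the chunk is full or the file ends.

```javascript
const fs = require('fs');

// Hypothetical helper: yields Buffers of exactly `chunkSize` bytes.
// Only the final chunk may be shorter.
async function* readChunks(filePath, chunkSize) {
  const handle = await fs.promises.open(filePath, 'r');
  try {
    let position = 0;
    while (true) {
      // A fresh buffer per chunk, so earlier chunks are never overwritten.
      const buffer = Buffer.alloc(chunkSize);
      let filled = 0;
      // Keep reading until the buffer is full or we hit end of file.
      while (filled < chunkSize) {
        const { bytesRead } = await handle.read(
          buffer, filled, chunkSize - filled, position + filled
        );
        if (bytesRead === 0) break; // end of file
        filled += bytesRead;
      }
      if (filled === 0) return; // nothing left to yield
      yield filled < chunkSize ? buffer.subarray(0, filled) : buffer;
      if (filled < chunkSize) return; // that was the last, short chunk
      position += filled;
    }
  } finally {
    await handle.close();
  }
}
```

A caller would consume it with `for await (const chunk of readChunks('/tmp/foo', CHUNK_SIZE)) { ... }`. Allocating a new buffer per chunk costs more memory than reusing one, but it is safer when several chunks are processed at the same time.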
Solution 2
You can use the snippet below, which reads the file in chunks of up to 1024 bytes:
var fs = require('fs');
var data = '';

var readStream = fs.createReadStream('/tmp/foo.txt', { highWaterMark: 1 * 1024, encoding: 'utf8' });

readStream.on('data', function(chunk) {
  // note: accumulating every chunk keeps the whole file in memory;
  // drop this line if you only need per-chunk processing
  data += chunk;
  console.log('chunk Data : ');
  console.log(chunk); // your chunk-processing logic goes here
}).on('end', function() {
  console.log('###################');
  console.log(data);
  // here you see all the data that was processed, at end of file
});
Please note: highWaterMark is the option that sets the maximum chunk size, in bytes. Hope this helps!
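If you want to keep the stream API but still need equal-sized pieces, one option is to regroup whatever the stream delivers into fixed-size Buffers. The sketch below is an assumption-laden illustration, not part of the original answer: `createChunker` and `readFixedChunks` are hypothetical names, and the stream is opened without an `encoding` so that chunks arrive as Buffers rather than strings.

```javascript
const fs = require('fs');

// Hypothetical helper: regroups arbitrarily sized Buffers into fixed-size ones.
function createChunker(chunkSize) {
  let pending = Buffer.alloc(0);
  return {
    // Add incoming bytes; returns every complete chunk now available.
    push(incoming) {
      pending = Buffer.concat([pending, incoming]);
      const out = [];
      while (pending.length >= chunkSize) {
        out.push(pending.subarray(0, chunkSize));
        pending = pending.subarray(chunkSize);
      }
      return out;
    },
    // Returns the final short chunk, or null if nothing is left.
    flush() {
      const rest = pending.length > 0 ? pending : null;
      pending = Buffer.alloc(0);
      return rest;
    },
  };
}

// Hypothetical wiring: calls onChunk for each fixed-size Buffer,
// then onDone(err) once, with err set only on failure.
function readFixedChunks(filePath, chunkSize, onChunk, onDone) {
  const chunker = createChunker(chunkSize);
  fs.createReadStream(filePath) // no encoding: chunks are Buffers
    .on('data', function (data) {
      chunker.push(data).forEach(onChunk);
    })
    .on('end', function () {
      const rest = chunker.flush();
      if (rest) onChunk(rest);
      onDone(null);
    })
    .on('error', onDone);
}
```

This keeps at most one chunk's worth of leftover bytes in memory between `data` events, at the cost of one extra copy in `Buffer.concat`.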
Web Reference: https://stackabuse.com/read-files-with-node-js/
See also: Changing readstream chunksize
kjs3
Updated on August 16, 2022
Comments
-
kjs3 over 1 year
The goal: Upload large files to AWS Glacier without holding the whole file in memory.
I'm currently uploading to Glacier using fs.readFileSync() and things are working. But I need to handle files larger than 4GB, and I'd like to upload multiple chunks in parallel. This means moving to multipart uploads. I can choose the chunk size, but Glacier needs every chunk to be the same size (except the last).
This thread suggests that I can set a chunk size on a read stream but that I'm not actually guaranteed to get it.
Any info on how I can get consistent parts without reading the whole file into memory and splitting it up manually?
Assuming I can get to that point, I was just going to use cluster with a few processes pulling off the stream as fast as they can upload to AWS. If that seems like the wrong way to parallelize the work, I'd love suggestions there.
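One way to think about the parallel part, sketched under assumptions: since every part has a known offset and length, the file can be split into byte ranges up front, and a small pool of concurrent async tasks can each read and upload one range. `partRanges`, `runWithLimit`, and `uploadPart` are hypothetical names. No AWS call is shown, and a single process with a few concurrent uploads may be enough because the work is mostly I/O-bound.

```javascript
// Split a file of `fileSize` bytes into equal parts; only the last
// part may be shorter. `end` is inclusive.
function partRanges(fileSize, partSize) {
  const ranges = [];
  for (let start = 0; start < fileSize; start += partSize) {
    ranges.push({ start, end: Math.min(start + partSize, fileSize) - 1 });
  }
  return ranges;
}

// Run `worker(item, index)` for every item, with at most `limit`
// calls in flight at once. Results keep the order of `items`.
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}

// Example wiring (uploadPart is a placeholder for your own upload call):
// const ranges = partRanges(fs.statSync(filePath).size, 8 * 1024 * 1024);
// await runWithLimit(ranges, 4, (range) => uploadPart(filePath, range));
```

Each worker can read its own range with fs.read() and an explicit position, so no part depends on another and only `limit` parts are held in memory at a time.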
-
jlyonsmith almost 7 years
While this answer is technically correct, using a regular file descriptor forgoes all of the stream event goodness which is really useful when you are trying to read/write a file in an async code base.
-
theycallmemorty over 4 years
Running this code today causes the following warning: "(node:25440) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead."
-
mscdex over 4 years
@theycallmemorty This answer was from well before it was deprecated. I've updated it now for modern versions of node.
-
Alexandre almost 4 years
Finally a good example of Node streams that doesn't look like way too abstract nonsense and doesn't bore me explaining what binary numbers are.
-
Poyoman over 3 years
You may prefer the createReadStream solution, which is more readable.
-
mscdex over 3 years
@Poyoman This solution is the only way to guarantee chunk sizes, which is what the OP asked for. If you don't need specific chunk sizes, then yes, streaming is much easier.