Read and parse CSV file in S3 without downloading the entire file
Solution 1
You should just be able to use the createReadStream
method and pipe it into fast-csv:
const s3Stream = s3.getObject(params).createReadStream();
require('fast-csv').fromStream(s3Stream)
  .on('data', (data) => {
    // do something with each parsed row here
  });
Solution 2
I do not have enough reputation to comment, but as of now the accepted answer's "fromStream" method no longer exists in 'fast-csv'. You'll need to use the parseStream method instead:
const s3Stream = s3.getObject(params).createReadStream();
require('fast-csv').parseStream(s3Stream)
  .on('data', (data) => {
    // use each parsed row here
  });
Solution 3
For me, the following is what solved the issue:
const csv = require('@fast-csv/parse');

const params = {
  Bucket: srcBucket,
  Key: srcKey,
};
const csvFile = s3.getObject(params).createReadStream();

const parserFcn = new Promise((resolve, reject) => {
  csv
    .parseStream(csvFile, { headers: true })
    .on('data', function (data) {
      console.log('Data parsed: ', data);
    })
    .on('end', function () {
      resolve('csv parse process finished');
    })
    .on('error', function () {
      reject('csv parse process failed');
    });
});

try {
  await parserFcn;
} catch (error) {
  console.log('Get Error: ', error);
}
changingrainbows
Updated on January 14, 2021

Comments
-
changingrainbows over 3 years
Using node.js, with the intention of running this module as an AWS Lambda function.
Using s3.getObject() from aws-sdk, I am able to successfully pick up a very large CSV file from Amazon S3. The intention is to read each line in the file and emit an event with the body of each line. In all examples I could find, it looks like the entire CSV file in S3 has to be buffered or streamed, converted to a string, and then read line by line.
s3.getObject(params, function(err, data) { var body = data.Body.toString('utf-8'); });
This operation takes a very long time, given the size of the source CSV file. Also, the CSV rows are of varying length, and I'm not certain if I can use the buffer size as an option.
Question
Is there a way to pick up the S3 file in node.js and read/transform it line by line, which avoids stringifying the entire file in memory first?
Ideally, I'd prefer to use the better capabilities of fast-csv and/or node-csv, instead of looping manually.
-
Deepak G M almost 5 yearsThis works pretty well. Just to add to it, if you are interested to know when the parsing ends, add
.on('end' () => { //Your handling })
-
ChristoKiwi almost 5 years@DeepakGM you forgot a comma
.on('end', () => { })
-
Hoon over 3 yearsThanks for adding this. I was looking for it. :)
-
Kai Durai over 3 yearsThis method is deprecated, see my answer below for using parseStream method instead
-
pta over 2 years Which version of the lib contains fromStream? parseStream throws an Error...