Convert large CSV files to JSON


Solution 1

You mentioned the csvtojson module above; that is an open-source project which I am maintaining.

I am sorry it did not work out for you; the problem was caused by a bug that was solved several months ago. I have also added some extra lines to the README for your scenario. Please check out "Process Big CSV File in Command Line".

Please make sure you have the latest csvtojson release (currently 0.2.2).

You can update it by running:

npm install -g csvtojson

After you've installed the latest csvtojson, you just need to run:

csvtojson [path to bigcsvdata] > converted.json

This streams data from the CSV file. Or, if you want to pipe data in from another application:

cat [path to bigcsvdata] | csvtojson > converted.json

Both commands produce the same output.

I have manually tested it with a CSV file of over 3 million records and it works without issue.
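
If you would rather call it from Node instead of the command line, a minimal sketch along these lines should work with a recent csvtojson release (this uses the newer fromStream/subscribe API, which differs from the 0.x Converter API discussed in this answer, so check the README for the version you actually have; the file names here are just placeholders):

const csvtojson = require('csvtojson');
const fs = require('fs');

const out = fs.createWriteStream('converted.json');
let first = true;

out.write('[');

csvtojson()
  .fromStream(fs.createReadStream('bigcsvdata.csv'))   // rows are parsed one at a time
  .subscribe(
    (row) => {
      // Each parsed row arrives as a plain object; write it out immediately
      // so the whole file never sits in memory at once.
      out.write((first ? '' : ',\n') + JSON.stringify(row));
      first = false;
    },
    (err) => console.error(err),
    () => {
      // All rows parsed: close the JSON array and the file.
      out.write(']');
      out.end();
    }
  );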

I believe you just need a simple tool. The purpose of the lib is to relieve stress like this. Please do let me know if you run into any problems next time so I can fix them promptly.

Solution 2

The npm csv package is able to process a CSV stream without having to store the full file in memory. You'll need to install Node.js and csv (npm install csv). Here is a sample application which writes JSON objects to a file:

var csv = require('csv');
var fs = require('fs');

var f = fs.createReadStream('Fielding.csv');
var w = fs.createWriteStream('out.txt');

// Open the JSON array by hand; the CSV stream only emits the objects.
w.write('[');

csv()
  .from.stream(f, {columns: true})              // read rows as objects keyed by column name
  .transform(function(row, index) {
    // Prefix every object after the first with ",\n" so the output is a valid JSON array.
    return (index === 0 ? '' : ',\n') + JSON.stringify(row);
  })
  .to.stream(w, {columns: true, end: false})    // keep the write stream open after the CSV ends
  .on('end', function() {
    w.write(']');                               // close the JSON array ourselves
    w.end();
  });

Please note the columns option, which is needed to keep the column names in the JSON objects (otherwise you'll get a simple array), and the end option set to false, which tells Node not to close the file stream when the CSV stream closes: this is what allows us to add the final ']'. The transform callback provides a way for your program to hook into the data stream and transform the data before it is written to the next stream.

Solution 3

When you work with such a large dataset, you need to write streamed processing rather than load > convert > save, because loading something this big simply would not fit in memory.

A CSV file itself is very simple and has only small differences between formats, so you can write a simple parser yourself. JSON is usually simple too, and can easily be produced line by line without loading the whole thing.

  1. createReadStream from the CSV file.
  2. createWriteStream for the new JSON file.
  3. In on('data', ...), process the data that was read: append it to a buffer string and extract any complete lines that are available.
  4. Whenever complete lines are available from the read stream, convert them to JSON objects and push them into the write stream of the new JSON file.

This is easily doable with pipe and your own transform in the middle that converts lines into objects to be written into the new file.

This approach avoids loading the whole file into memory; instead it processes the data gradually: load a part, process it, write it out, and move forward. A sketch of this approach is shown below.
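
Here is a minimal sketch of that idea using only Node's built-in modules. It assumes the first CSV line is a header row and that fields contain no quoted commas; the file names are just placeholders:

const fs = require('fs');
const readline = require('readline');

const input = fs.createReadStream('big.csv');      // 1. read stream from the CSV file
const output = fs.createWriteStream('big.json');   // 2. write stream for the new JSON file
const lines = readline.createInterface({ input, crlfDelay: Infinity });

let header = null;
let first = true;

output.write('[\n');

lines.on('line', (line) => {                       // 3. called once per complete line
  if (header === null) {
    header = line.split(',');                      // first line holds the column names
    return;
  }
  const values = line.split(',');                  // naive split: no quoted commas
  const record = {};
  header.forEach((name, i) => { record[name] = values[i]; });
  output.write((first ? '' : ',\n') + JSON.stringify(record));  // 4. push JSON into the write stream
  first = false;
});

lines.on('close', () => {
  output.write('\n]\n');
  output.end();
});

For real-world CSV with quoted fields or embedded commas you would swap the naive split for a proper parser, but the streaming structure stays the same.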

Solution 4

This should do the job.

npm i --save csv2json fs-extra    # install the modules

const csv2json = require('csv2json');
const fs = require('fs-extra');

const source = fs.createReadStream(__dirname + '/data.csv');
const output = fs.createWriteStream(__dirname + '/result.json');

// Stream the CSV through csv2json straight into the output file.
source
  .pipe(csv2json())
  .pipe(output);

Comments

  • JVG
    JVG over 1 year

    I don't mind if this is done with a separate program, with Excel, in NodeJS or in a web app.

    It's exactly the same problem as described here:

    Large CSV to JSON/Object in Node.js

    It seems that the OP didn't get that answer to work (yet accepted it anyway?). I've tried working with it but can't seem to get it to work either.

In short: I'm working with a ~50,000 row CSV and I want to convert it to JSON. I've tried just about every online "csv to json" web app out there, and they all crash with a dataset this large.

    I've tried many Node CSV to JSON modules but, again, they all crash. The csvtojson module seemed promising, but I got this error: FATAL ERROR: JS Allocation failed - process out of memory.

What on earth can I do to get this data into a usable format? As above, I don't mind if it's an application, something that works within Excel, a web app or a Node module, so long as I either get a .JSON file or an object that I can work with within Node.

    Any ideas?

  • JVG
    JVG over 10 years
Sorry I'm a little bit late replying here. This is close, except the out.txt file that is created is not properly formatted JSON; rather, it's just a file with rows of objects (it needs an [ at the start and a ] at the end, as well as commas at the end of each line). If you edit to correct this I'll accept it as the answer.
  • Amit Dalal
    Amit Dalal almost 9 years
I am trying out csvtojson for a huge csv file (~5GB / 11 million rows). I've split the file into multiple files (each around 20MB / 40k rows). Even though I am processing these files sequentially, the process keeps running but stops writing any more data to the JSON file after processing about 50k rows. Any clues?
  • Keyang
    Keyang almost 9 years
    Could you paste some code on how you use it to process the CSV file? It should be ok even if you use the 5GB csv directly.
  • Amit Dalal
    Amit Dalal almost 9 years
    I am using the cli csvtojson --delimiter=## x.csv > y.json
  • Keyang
    Keyang almost 9 years
What version of csvtojson are you using? Update to the latest one if you can, using npm install -g csvtojson
  • Amit Dalal
    Amit Dalal almost 9 years
    from package.json: "version": "0.3.21"
  • Keyang
    Keyang almost 9 years
It should be ok. This looks more like there is something odd in the CSV data that is causing the parsing to stop. It would be helpful if you could share a sample CSV file that causes the issue through Google Drive.
  • Amit Dalal
    Amit Dalal almost 9 years
  • Suresh
    Suresh over 4 years
I have 80 lakh (8 million) records in a file. This code helped me convert them in seconds. Thanks @Bogadan
  • Suresh
    Suresh over 4 years
You've added code that stores the output in one file. Can you help me get it into a variable to use in code?