Parse large JSON file in Nodejs and handle each object independently


There is a nice module named 'stream-json' that does exactly what you want. According to its documentation, it can parse JSON files far exceeding available memory. The documentation also notes:

StreamArray handles a frequent use case: a huge array of relatively small objects similar to Django-produced database dumps. It streams array components individually taking care of assembling them automatically.

Here is a very basic example:

const StreamArray = require('stream-json/streamers/StreamArray');
const path = require('path');
const fs = require('fs');

const jsonStream = StreamArray.withParser();

// Each 'data' event delivers one array element as {key, value}:
// key is the element's index in the array, value is the parsed object
jsonStream.on('data', ({key, value}) => {
    console.log(key, value);
});

jsonStream.on('end', () => {
    console.log('All done');
});

const filename = path.join(__dirname, 'sample.json');
fs.createReadStream(filename).pipe(jsonStream.input);
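The stream returned by StreamArray.withParser() should behave like any other object-mode Node.js stream, so on a recent Node.js (where readable streams are async iterable) you can also consume it with for await...of instead of a 'data' listener. A minimal sketch; forEachRecord and the handleRecord callback are hypothetical names, not part of stream-json:

```javascript
// Hypothetical helper: walk an object-mode stream of {key, value} records
// (such as the one produced by StreamArray.withParser()) with for await...of.
// `handleRecord(key, value)` is a caller-supplied async function.
async function forEachRecord(readable, handleRecord) {
  let count = 0;
  for await (const { key, value } of readable) {
    // The next record is not pulled until this await settles, so records
    // are handled one at a time, in array order.
    await handleRecord(key, value);
    count += 1;
  }
  // If the stream emits 'error', the loop throws and this promise rejects.
  return count; // number of records handled
}
```

For example, call forEachRecord(jsonStream, async (key, value) => { /* work with value */ }) in place of the 'data' listener, after piping the file into jsonStream.input as above.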

If you'd like to do something more complex, e.g. process the objects one after another in order and apply some async operation to each of them, you can write a custom Writable stream like this:

const StreamArray = require('stream-json/streamers/StreamArray');
const {Writable} = require('stream');
const path = require('path');
const fs = require('fs');

const fileStream = fs.createReadStream(path.join(__dirname, 'sample.json'));
const jsonStream = StreamArray.withParser();

const processingStream = new Writable({
    write({key, value}, encoding, callback) {
        // Save to MongoDB or perform any other async action here;
        // setTimeout just simulates a slow async operation

        setTimeout(() => {
            console.log(value);
            // The next record is passed to write() only after callback()
            // is called, i.e. once the current one is fully processed
            callback();
        }, 1000);
    },
    // Don't skip this: we receive objects, not buffers
    objectMode: true
});

// Connect the streams: file -> JSON parser -> processing stream
fileStream.pipe(jsonStream.input);
jsonStream.pipe(processingStream);

// 'finish' fires once every record has been written and processed
processingStream.on('finish', () => console.log('All done'));
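The Writable above makes one async call per record. For a bulk load such as the MongoDB import in the question, sending records in batches usually cuts the number of round trips. A minimal sketch, assuming a recent Node.js with the stream/promises module; insertMany is a hypothetical async function you supply (for instance a thin wrapper around your collection's bulk insert), and createBatchWriter/importRecords are illustrative names:

```javascript
const { Writable } = require('stream');
const { pipeline } = require('stream/promises');

// Build an object-mode Writable that groups the `value` of each {key, value}
// record into arrays of `batchSize` and passes each array to `insertMany`,
// a hypothetical async function supplied by the caller.
function createBatchWriter(insertMany, batchSize = 1000) {
  let batch = [];
  // Send the current batch and reset it; report success or failure to the stream.
  const flush = (callback) => {
    const docs = batch;
    batch = [];
    insertMany(docs).then(() => callback(), callback);
  };
  return new Writable({
    objectMode: true,
    write({ value }, _encoding, callback) {
      batch.push(value);
      if (batch.length >= batchSize) {
        flush(callback); // the next record is processed only after this batch is stored
      } else {
        callback(); // keep buffering
      }
    },
    // Runs once after the last record: store the final, partial batch.
    final(callback) {
      if (batch.length > 0) {
        flush(callback);
      } else {
        callback();
      }
    },
  });
}

// Connect all given streams (e.g. a file stream and the StreamArray chain)
// to the batch writer. pipeline() destroys every stage and rejects if any of
// them fails, which plain .pipe() does not do.
function importRecords(insertMany, batchSize, ...streams) {
  return pipeline(...streams, createBatchWriter(insertMany, batchSize));
}
```

For example, importRecords(insertMany, 1000, fs.createReadStream(filename), StreamArray.withParser()) should return a promise that resolves once every batch has been stored. The batch size of 1000 is an arbitrary starting point; tune it for your data and database.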

Please note: the examples above were tested with stream-json 1.x. For some older versions (presumably prior to 1.0.0) you might have to use:

const StreamArray = require('stream-json/utils/StreamArray');

and then

const jsonStream = StreamArray.make();


Author: Lixing Liang

Updated on July 09, 2022

Comments

  • Lixing Liang, almost 2 years ago

    I need to read a large JSON file (around 630MB) in Node.js and insert each object into MongoDB.

    I've read the answer here: Parse large JSON file in Nodejs.

    However, the answers there handle the JSON file line by line instead of object by object, so I still don't know how to get an object from this file and operate on it.

    I have about 100,000 objects of this kind in my JSON file.

    Data Format:

    [
      {
        "id": "0000000",
        "name": "Donna Blak",
        "livingSuburb": "Tingalpa",
        "age": 53,
        "nearestHospital": "Royal Children's Hospital",
        "treatments": {
            "19890803": {
                "medicine": "Stomach flu B",
                "disease": "Stomach flu"
            },
            "19740112": {
                "medicine": "Progeria C",
                "disease": "Progeria"
            },
            "19830206": {
                "medicine": "Poliomyelitis B",
                "disease": "Poliomyelitis"
            }
        },
        "class": "patient"
      },
     ...
    ]
    

    Cheers,

    Alex

  • Lixing Liang, about 7 years ago
    It works, thanks a lot. However, it isn't very fast: it took 130s to read the whole JSON file with 100,000 objects. Do you think there is a more efficient way to do it?
  • Levarne Sobotker, almost 7 years ago
    Stream-json has to be the best JSON stream reader by far. It saved me from writing my own stream and picking out each object. Thanks for this answer. I had the same issue: I was running out of memory, and the only solution was to stream one object at a time.
  • Kannaiyan, over 6 years ago
    @LixingLiang Split the single file into multiple files and process them in parallel; that will make it more efficient. The bottleneck could be I/O while reading the file.
  • PrestonDocks, over 5 years ago
    These examples seem to be out of date. I was getting errors with the require and had to change it to const StreamArray = require('stream-json/streamers/StreamArray'), but then I get the error TypeError: Cannot read property 'on' of undefined.
  • Eugene Lazutkin, over 5 years ago
    @PrestonDocks The package was updated in 2018 with a major version jump (better perf, more utility). Either use a previous major version or read the new doc and update your code accordingly.
  • Antonio Narkevich, over 5 years ago
    @PrestonDocks, I've updated the answer; would you mind taking a look?
  • Antonio Narkevich, over 5 years ago
    @EugeneLazutkin I've updated the answer; please check it out if you're still interested.
  • Rajeev Ranjan, over 4 years ago
    How do I increase the depth level while streaming? With the example above, objects only come through down to depth level 3; beyond that I only get the type (array or object), like this: "mentions: [Array]"
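The [Array] placeholder in the last comment most likely comes from console.log rather than from stream-json: console.log formats objects with util.inspect, whose default depth is 2, so more deeply nested arrays and objects are printed as [Array] or [Object] even though the parsed value is complete. A small sketch of two ways to print the whole value; formatFull and formatAsJson are illustrative names:

```javascript
const util = require('util');

// console.log(obj) formats objects via util.inspect with a default depth
// of 2, so deeper arrays/objects appear as [Array] / [Object]. The parsed
// value is still complete; only the printed text is shortened.

// Print every nesting level (depth: null means "no limit").
function formatFull(value) {
  return util.inspect(value, { depth: null });
}

// Alternative: JSON text indented by two spaces.
function formatAsJson(value) {
  return JSON.stringify(value, null, 2);
}
```

For example, use console.log(formatFull(value)) inside the 'data' handler; console.dir(value, { depth: null }) achieves the same in one call.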