How to return a large number of rows from MongoDB using a Node.js HTTP server?
Solution 1
The cursor.streamRecords() method of the native MongoDB driver is deprecated; the stream() method is faster. I have parsed a 40,000,000-row catalog document without problems using MongoDB + stream() + process.nextTick().
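The role process.nextTick() plays here can be shown without a database. This is a minimal, self-contained sketch (the rows array is a stand-in for documents coming off a cursor stream; processNext is a hypothetical helper, not part of the driver): deferring each row to the next tick keeps a huge result set from monopolizing the event loop.

```javascript
// Self-contained sketch: process rows one per tick, the way you
// would inside a cursor stream's 'data' handler.
var rows = [{ id: 1 }, { id: 2 }, { id: 3 }];
var processed = [];

function processNext(i, done) {
  if (i >= rows.length) return done();
  // Simulate per-row work (e.g. JSON serialization)...
  processed.push(JSON.stringify(rows[i]));
  // ...then yield to the event loop before taking the next row.
  process.nextTick(function () {
    processNext(i + 1, done);
  });
}

processNext(0, function () {
  console.log('done, processed ' + processed.length + ' rows');
});
```

With a real cursor stream the driver delivers rows via 'data' events, so you rarely need to schedule them yourself; the pattern above matters when you do heavy per-row work between reads.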
Solution 2
I found out that the node-mongodb-native Cursor object has a streaming option for the records too (used with collection.find().streamRecords()), even though it's not mentioned on the GitHub page of the driver. See the Cursor source code and search for "streamRecords".
In the end the code ended up like this:
db.collection('users', function(err, collection) {
  var first = true;
  response.setHeader("Content-Type", "application/json");
  response.write('{"users" : [');
  var stream = collection.find().streamRecords();
  stream.on('data', function(item) {
    var prefix = first ? '' : ', ';
    response.write(prefix + JSON.stringify(item));
    first = false;
  });
  stream.on('end', function() {
    response.write(']}');
    response.end();
  });
});
Solution 3
Something like that should work. If it doesn't, you should probably open an issue in the mongodb-native bug tracker.
http.createServer(function (request, response) {
  db.collection('users', function(err, collection) {
    collection.find({}, function(err, cursor) {
      response.setHeader("Content-Type", "application/json");
      cursor.each(function(err, item) {
        if (item) {
          response.write(JSON.stringify(item));
        } else {
          response.end();
        }
      });
    });
  });
}).listen(8008);
PS: it's just a stub; I don't remember the exact syntax, but it's the each function you're looking for.
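As danmactough's comment in this thread points out, the stub above writes bare objects back to back, which is not valid JSON; you need to surround them with brackets and commas. A corrected sketch, with a stubbed cursor and response so it runs without a database (makeCursor is a hypothetical stand-in; the real cursor.each has the same (err, item) callback shape, calling back with item === null at the end):

```javascript
// Stub cursor: real code would use collection.find({}) instead.
function makeCursor(items) {
  var i = 0;
  return {
    each: function (cb) {
      // cursor.each calls back once per item, then with item === null.
      while (i < items.length) cb(null, items[i++]);
      cb(null, null);
    }
  };
}

var cursor = makeCursor([{ a: 1 }, { a: 2 }]);
var chunks = [];
var response = {                        // stand-in for the HTTP response
  write: function (s) { chunks.push(s); },
  end: function () { console.log(chunks.join('')); }
};

var first = true;
response.write('{"users" : [');
cursor.each(function (err, item) {
  if (item) {
    response.write((first ? '' : ', ') + JSON.stringify(item));
    first = false;
  } else {
    response.write(']}');
    response.end();
  }
});
```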
Solution 4
Well, I no longer use the native MongoDB JavaScript driver, but Mongoose has a pretty good implementation of streams. The syntax of the two drivers is pretty similar. You can do this with Mongoose:
response.setHeader("Content-Type", "application/json");
var stream = collection.find().stream();
stream.on('data', function(doc) {
  // the stream emits documents as objects, so serialize before writing
  response.write(JSON.stringify(doc));
});
stream.on('close', function() {
  response.end();
});
Solution 5
A little module to do that using Node's stream.Transform class:
var stream = require('stream');

function createCursorStream() {
  var cursorStream = new stream.Transform({ objectMode: true });
  cursorStream._transform = function(chunk, encoding, done) {
    if (cursorStream.started) {
      cursorStream.push(', ' + JSON.stringify(chunk));
    } else {
      cursorStream.push('[' + JSON.stringify(chunk));
      cursorStream.started = true;
    }
    done();
  };
  cursorStream._flush = function(done) {
    cursorStream.push(']');
    done();
  };
  return cursorStream;
}

module.exports.streamCursorToResponse = function(cursor, response) {
  cursor.stream().pipe(createCursorStream()).pipe(response);
};
You can alter the JSON.stringify parts to do any other kind of "on the fly" transforms on the objects coming from the MongoDB cursor, and save some memory.
Timo Albert
Updated on June 12, 2022

Comments
-
Timo Albert almost 2 years
I have a user database in MongoDB which I would like to export via a REST interface in JSON. The problem is that in the worst-case scenario the number of returned rows is well over 2 million.
First I tried this
var mongo = require('mongodb'),
    Server = mongo.Server,
    Db = mongo.Db;
var server = new Server('localhost', 27017, {auto_reconnect: true});
var db = new Db('tracking', server);
var http = require('http');

http.createServer(function (request, response) {
  db.collection('users', function(err, collection) {
    collection.find({}, function(err, cursor) {
      cursor.toArray(function(err, items) {
        var output = '{"users" : ' + JSON.stringify(items) + '}';
        response.setHeader("Content-Type", "application/json");
        response.end(output);
      });
    });
  });
}).listen(8008);

console.log('Server running at localhost:8008');
which fails when running out of memory. The example uses node-mongodb-native driver and the basic http package.
FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory
(note that in real scenario I use parameters which limit the results as needed, but this example queries them all which is the worst case scenario regardless)
The data itself is simple, like
{
  "_id" : ObjectId("4f993d1c5656d3320851aadb"),
  "userid" : "80ec39f7-37e2-4b13-b442-6bea57472537",
  "user-agent" : "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322)",
  "ip" : "127.0.0.1",
  "lastupdate" : 1335442716
}
I also tried something like
while (cursor != null) {
  cursor.nextObject(function(err, item) {
    response.write(JSON.stringify(item));
  });
}
but that ran out of memory too.
How should I proceed? There should be a way to stream the data row by row, but I haven't been able to find a suitable example of it. Paging the data is out of the question because of external application requirements. I thought of writing the data to a file and then posting it, but that leads to unwanted I/O.
-
Timo Albert almost 12 years
Actually I tried that too, but it seems that the toArray function in my original question actually wraps/uses that each function, so it failed when the script ran out of memory too.
-
Timo Albert almost 12 years
Mongoose would be a better way to address the data storage thing altogether. Your answer led me in the right direction when using just this driver, and I found out that node-mongodb-native has a streaming option in Cursor called streamResults too. I'll post a complete answer to my problem using just node-mongodb-native later on.
-
danmactough almost 12 years
Yes, toArray needs to buffer the entire array, so that won't help, but cursor.each will work. You just need to surround it with brackets.
-
Timo Albert almost 12 years
Now that I tried this again, it works too. For some reason it failed before, and I have to go back and check what I did wrong.
-
asuciu about 10 years
Thanks Timo for sharing your solution!
-
Meekohi about 10 years
I found that cursor.stream() performs exactly the same as cursor.each().
-
alexishacks almost 10 years
Be sure to specify a value for batchSize for thousands or millions of rows.
-
Giovanni Bitliner over 7 years
@sha0Coder how long did the whole parsing take?
-
Chitrank Dixit about 5 years
Hey @Timo, is there any way we can process streamed data in batches of 1000 from a dataset of over 100000?
-
Anand Sunderraman about 4 years
@sha0Coder is there a gist you can post about how it was done?