In MongoDB is it practical to keep all comments for a post in one document?


Solution 1

Posting this answer after some of the others, so I will repeat some of the things already mentioned. Please accept the first suitable answer rather than this one.

That said, there are a few things to take into account. Consider these three questions:

  1. Will you always require all comments every time you query for a post?
  2. Will you want to query on comments directly (e.g. query comments for a specific user)?
  3. Will your system have relatively low usage?

If all three questions can be answered with yes, then you can embed the comments array. In all other scenarios you will probably need a separate collection to store your comments.

First of all, you can actually update and remove comments atomically in a concurrency-safe way (see updates with positional operators), but there are some things you cannot do, such as index-based inserts.

The main concern with using embedded arrays for any sort of large collection is the move-on-update issue. MongoDB reserves a certain amount of padding (see db.col.stats().paddingFactor) per document to allow it to grow as needed. If it runs out of this padding (and it often will in your use case) it has to move that ever-growing document around on disk. This makes updates an order of magnitude slower and is therefore a serious concern on high-throughput servers.

A related but slightly less vital issue is bandwidth. If you have no choice but to query the entire post with all its comments, even though you're only displaying the first 10, you're going to waste quite a bit of bandwidth, which can be an issue especially in cloud environments (you can use $slice to avoid some of this).
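To make the $slice behavior concrete, here is a small in-memory sketch of what the projection returns. The trimming really happens server-side before results cross the wire; the helper below only mirrors the documented semantics for illustration:

```javascript
// In-memory sketch of $slice projection semantics -- illustrative only,
// the real work is done by the server before data hits the network.
function sliceProjection(comments, spec) {
  if (Array.isArray(spec)) {
    const [skip, limit] = spec;        // {$slice: [skip, limit]}
    return comments.slice(skip, skip + limit);
  }
  // positive n: first n elements; negative n: last |n| elements
  return spec >= 0 ? comments.slice(0, spec) : comments.slice(spec);
}

const comments = Array.from({ length: 50 }, (_, i) => "comment " + i);

// db.posts.find({_id: id}, {comments: {$slice: 10}}) would return only:
const firstPage = sliceProjection(comments, 10);        // comments 0..9
// db.posts.find({_id: id}, {comments: {$slice: [10, 10]}}) would return:
const secondPage = sliceProjection(comments, [10, 10]); // comments 10..19
```

Either form keeps the response to one page of comments instead of the whole array.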

If you do want to go embedded, here are your basic ops:

Add comment:

db.posts.update({_id:[POST ID]}, {$push:{comments:{commentId:"remon-923982", author:"Remon", text:"Hi!"}}})

Update comment:

db.posts.update({_id:[POST ID], 'comments.commentId':"remon-923982"}, {$set:{'comments.$.text':"Hello!"}})

Remove comment:

db.posts.update({_id:[POST ID], 'comments.commentId':"remon-923982"}, {$pull:{comments:{commentId:"remon-923982"}}})

All these methods are concurrency-safe because the update criteria are evaluated as part of the (process-wide) write lock.

With all that said, you probably want a dedicated collection for your comments, but that comes with a second choice. You can either store each comment in a dedicated document or use comment buckets of, say, 20-30 comments each (described in detail at http://www.10gen.com/presentations/mongosf2011/schemascale). Each approach has advantages and disadvantages, so it's up to you to see which fits best for what you want to do. I would go for buckets if the comments per post can exceed a couple of hundred, due to the O(n) performance of the skip(n) cursor method you'll need for paging them. In all other cases just go with a comment-per-document approach; that is also the most flexible for querying on comments in other use cases.
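The bucket arithmetic is simple enough to sketch. The collection name, key shape, and bucket size below are illustrative assumptions, not details from the linked talk:

```javascript
// Hypothetical bucket layout: BUCKET_SIZE comments per bucket document,
// keyed by {postId, seq}. All names here are illustrative.
const BUCKET_SIZE = 25;

function bucketFor(commentIndex) {
  return {
    seq: Math.floor(commentIndex / BUCKET_SIZE), // which bucket document
    offset: commentIndex % BUCKET_SIZE,          // position inside that bucket
  };
}

// Comment #137 lives at offset 12 of bucket 5, so the page around it can be
// fetched with one indexed lookup instead of an O(n) skip():
//   db.commentBuckets.find({postId: id, seq: 5})
const where = bucketFor(137);
```

Because the bucket number is computed, paging deep into a thread becomes an indexed point query rather than a cursor skip.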

Solution 2

It greatly depends on the operations you want to allow, but a separate collection is usually better.

For instance, if you want to allow users to edit or delete comments, it is a very good idea to store comments in a separate collection, because these operations are hard or impossible to express with atomic modifiers alone, and state management becomes painful. The documentation also covers this.

A key issue with embedding comments is that you will have different writers. Normally a blog post can be modified only by its authors; with embedded comments, a reader also gets write access to the object, so to speak.

Code like this will be dangerous:

// fetch the whole post document
post = db.posts.findOne({ "_id" : 2332 });
post.text = "foo";
// at this moment, someone else does a $push on the post's comments
db.posts.update({ "_id" : 2332 }, post);
// the whole-document replace just overwrote that comment
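The lost update can be shown without a database at all. Here is an in-memory sketch of why targeted modifiers avoid the race while a whole-document replace does not (plain JavaScript, no driver involved):

```javascript
// Simulate the two update styles on plain objects.
function applySet(doc, field, value) {
  return { ...doc, [field]: value };                          // like {$set: ...}
}
function applyPush(doc, field, value) {
  return { ...doc, [field]: [...(doc[field] || []), value] }; // like {$push: ...}
}

const post = { _id: 2332, text: "old", comments: ["first!"] };

// Targeted modifiers commute: whichever write lands second, nothing is lost.
const ab = applyPush(applySet(post, "text", "foo"), "comments", "second");
const ba = applySet(applyPush(post, "comments", "second"), "text", "foo");
// ab and ba are identical: the text is updated AND both comments survive.

// A whole-document replace does not commute: writing back the stale copy
// overwrites the concurrent push entirely.
const afterPush = applyPush(post, "comments", "second"); // concurrent $push
const replaced = { ...post, text: "foo" };               // stale full write wins
// replaced.comments is back to ["first!"] -- the new comment is gone.
```

This is exactly why an edit to the post body should be a $set on that one field rather than a read-modify-write of the whole document.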

Solution 3

For performance reasons it is best to avoid documents that can grow in size over time:

Padding Factors:

"When you update a document in MongoDB, the update occurs in-place if the document has not grown in size. If the document did grow in size, however, then it might need to be relocated on disk to find a new disk location with enough contiguous space to fit the new larger document. This can lead to problems for write performance if the collection has many indexes since a move will require updating all the indexes for the document."

http://www.mongodb.org/display/DOCS/Padding+Factor

Solution 4

If you always retrieve a post with all its comments, why not?

If you don't, or you retrieve comments with a query other than by post (i.e. viewing all of a user's comments on the user's page), then probably not, since those queries would become much more complicated.

Updated on September 15, 2022

Comments

  • Roman over 1 year

    I've read in descriptions of document-based DBs that you can, for example, embed all comments under a post in the same document as the post, like so:

    {
       _id: "sdfdsfdfdsf",
       title: "post title",
       body: "post body",
       comments: [
          "comment 1 ......................................... end of comment",
          // ...
          "comment n"
       ]
    }
    
    

    I'm in a similar situation, where each comment could be as large as 8KB and there could be as many as 30 of them per post.

    Even though it's convenient to embed comments in the same document, I wonder if having large documents impacts performance, especially when the MongoDB server and the HTTP server run on separate machines and must communicate through a LAN?

  • Mark Hillick almost 12 years
    +1 for the separate collection. The common answer to this question is to store comments within the same collection, with most folks citing the size of "War and Peace" in relation to the amount of content you can easily store. See the MongoDB user group for many discussions on this - groups.google.com/forum/?fromgroups#!searchin/mongodb-user/…. BTW, the current document size limit is 16MB.
  • mnemosyn almost 12 years
    Yes, the 16MB limit is rather theoretical anyhow. If you really have that many comments, you need a full-blown comment system or a forum - nobody can find their way through 13,000 comments. But schema design has implications for code structure, and people tend to disregard that.
  • Remon van Vliet almost 12 years
    I'm not sure I agree with the arguments for this answer. The code example is not how you'd update a comment in an embedded post (you can do so safely with a positional-operator-based update). However, I do think a separate comments collection (or, more typical for actual real-world solutions, a comment bucket approach) is the way to go. The main reason people suggest embedded comment arrays for this sort of use case is that you only have to do one query to get the blog post and the comments. However, in most real-world scenarios that's not actually a performance benefit.
  • Remon van Vliet almost 12 years
    Glad someone brings up the bandwidth issue. This is a real world problem that is often ignored by people theorycrafting solutions for this sort of problem ;). The only exception is if you always have to retrieve all comments for each blog post you're retrieving from the system.
  • mnemosyn almost 12 years
    Very true. What I wanted to show with the code (but didn't explain) is that the update of the article itself (if the author changes the blog text) needs a bit more caution, because a naive replace would not be atomic. I also keep forgetting the positional operator, which renders my first point moot.