Choosing MongoDB/CouchDB/RavenDB - performance and scalability advice


Solution 1

if "20,000 concurrent writes" means inserts then I would go for CouchDB and use "_changes" api for triggers. But with 20.000 writes you would need a stable sharding aswell. Then you would better take a look at bigcouch

And if "20.000" concurrent writes consist "mostly" updates I would go for MongoDB for sure, since Its "update in place" is pretty awesome. But then you should handle triggers manually, but using another collection to update in place a general document can be a handy solution. Again be careful about sharding.

Finally, I think you cannot select a database on concurrency numbers alone; you need to plan the API (how you will retrieve the data) and then look at the options at hand.

Solution 2

I would recommend MongoDB. My requirements weren't nearly as high as yours, but they were reasonably close. Assuming you'll be using C#, I recommend the official MongoDB C# driver and the InsertBatch method with SafeMode turned on (a sketch of the equivalent pattern follows the caveats below). It will write data about as fast as your file system can handle. A few caveats:

  1. MongoDB does not support triggers (at least the last time I checked).
  2. MongoDB initially buffers data in RAM before syncing to disk. If you have real-time durability needs, you might want to lower the sync interval; this will have a significant performance hit.
  3. The C# driver is a little wonky. I don't know if it's just me, but I get odd errors whenever I try to run any long-running operations with it. The C++ driver is much better and actually faster than the C# driver (or any other driver, for that matter).
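
To make the batching recommendation concrete, here is roughly the same pattern in Python with pymongo, where insert_many plays the role of InsertBatch and an acknowledged write concern (w=1) stands in for SafeMode; database and collection names are illustrative:

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017")
# w=1 makes the server acknowledge each batch, roughly what the old
# C# driver's SafeMode.True gave you.
coll = client["mydb"].get_collection("docs", write_concern=WriteConcern(w=1))

batch = [{"n": i, "payload": "x" * 2048} for i in range(10000)]  # ~2KB docs
# ordered=False lets the server continue past individual failures,
# which is usually what you want for bulk loads.
result = coll.insert_many(batch, ordered=False)
print(len(result.inserted_ids), "documents inserted")
```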

That said, I'd recommend looking into RavenDB as well. It supports everything you're looking for, but for the life of me I couldn't get it to perform anywhere close to Mongo.

The only other database that came close to MongoDB was Riak. Its default Bitcask backend is ridiculously fast as long as you have enough memory to store the keyspace, but as I recall it doesn't support triggers.

Solution 3

Membase (and the soon-to-be-released Couchbase Server) will easily handle your needs and provides dynamic scalability (on-the-fly addition or removal of nodes) and replication with failover. The memcached caching layer on top will easily handle 200k ops/sec, and you can scale out linearly with multiple nodes to keep up with persisting the data to disk.
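
Since Membase speaks the plain memcached protocol, any stock memcached client can exercise that caching layer. A minimal sketch in Python with pymemcache; host, port and keys are made up:

```python
from pymemcache.client.base import Client

# Membase exposes the memcached protocol, so a stock client suffices.
client = Client(("localhost", 11211))

client.set("user:42", b'{"name": "amazedsaint"}')  # write to the cache layer
value = client.get("user:42")                      # RAM-speed read
print(value)
```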

We've got some recent benchmarks showing extremely low latency (which roughly equates to high throughput): http://10gigabitethernet.typepad.com/network_stack/2011/09/couchbase-goes-faster-with-openonload.html

I don't know how important it is for you to have a supported enterprise-class product with engineering and QA resources behind it, but that's available too.

Edit: I forgot to mention that there is already a built-in trigger interface, and we're extending it even further to track when data hits disk (is persisted) or is replicated.

Perry

Solution 4

  • We are looking at a document db storage solution with failover clustering, for a read/write-intensive application

Riak with Google's LevelDB backend [here is an awesome benchmark from Google] is very fast, given enough cache and solid disks. Depending on the structure of the documents and their size (you mentioned 2KB), you would of course need to benchmark it. Keep in mind that if you are able to shard your data (business-wise), you do not have to sustain 40K/s throughput on a single node.

Another advantage of LevelDB is data compression, which saves storage. If storage is not an issue, you can disable the compression, in which case LevelDB will literally fly.
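
As a sketch of what that toggle looked like in a Riak 1.x-era app.config (option names vary by eleveldb version, so treat this as an assumption to check against your release's docs):

```erlang
%% app.config -- eleveldb backend section (Riak 1.x era; verify the
%% exact option names against your release).
{eleveldb, [
    {data_root, "/var/lib/riak/leveldb"},
    %% Trade disk for CPU: set to false to skip compression entirely
    %% when storage space is not a concern.
    {compression, false}
]}
```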

Riak with secondary indices allows you to keep your data structures as document-like as you want: you index only those fields that you care about searching by.

Successful and painless failover is Riak's middle name. It really shines here.

  • We also need a mechanism for the db to notify us about newly written records (some kind of trigger at db level)

You can rely on pre-commit and post-commit hooks in Riak to achieve that behavior, but, as with any triggers, it comes at a price in performance and maintainability.
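
For example, with Riak's 1.x HTTP API you could attach a post-commit hook by updating bucket properties; the bucket name and the Erlang module/function are hypothetical placeholders for a hook you would deploy on the nodes yourself:

```python
import requests

# Attach a (hypothetical) Erlang post-commit hook to the "documents"
# bucket by updating its properties over Riak's HTTP API.
props = {"props": {"postcommit": [{"mod": "my_hooks", "fun": "notify_new_record"}]}}
resp = requests.put("http://localhost:8098/riak/documents", json=props)
resp.raise_for_status()
```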

  • The inserts should be readable right away - almost realtime

Riak writes to disk (no async MongoDB surprises), so data is reliably readable right away. If you need stronger consistency, you can configure Riak's write quorum for inserts: i.e., how many nodes must come back before the insert is treated as successful.
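
A minimal sketch of such a quorum write over the HTTP API; bucket, key and quorum values are illustrative:

```python
import requests

# Store a document and require 3 replicas to acknowledge (w=3), two of
# them durably on disk (dw=2), before the write counts as successful.
resp = requests.put(
    "http://localhost:8098/riak/documents/doc-123",
    params={"w": 3, "dw": 2},
    data=b'{"title": "hello"}',
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()  # a 2xx status means the quorum was met
```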

In general, if fault tolerance / concurrency / failover / scalability are important to you, I would go with data stores written in Erlang, since Erlang has been solving these problems successfully for many years.



Comments

  • amazedsaint
    amazedsaint almost 4 years

    We are looking at a document db storage solution with failover clustering, for a read/write-intensive application.

    We will be having an average of 40K concurrent writes per second to the db (with peaks that can go up to 70,000) - and may have a similar number of reads happening.

    We also need a mechanism for the db to notify us about newly written records (some kind of trigger at db level).

    What would be a good choice of document db, and what should the related capacity planning look like?

    Updated

    More details on our expectations:

    • On average, we are expecting 40,000 (40K) inserts (new documents) per second across 3-4 databases/document collections.
    • The peak may go up to 120,000 (120K) inserts.
    • The inserts should be readable right away - almost realtime.
    • Along with this, we expect around 5,000 updates or deletes per second.
    • We also expect 500-600 concurrent queries accessing data. These queries and execution plans are somewhat known, though they might have to be updated, say, once a week or so.
    • The system should support failover clustering on the storage side
    • Admin
      Admin about 13 years
      Some more details might be helpful. Do the writes need to be readable right away, or is it OK if there's a delay there? How big are the reads and writes? How are the reads and writes distributed across the data (like, 20,000 new documents vs. 20,000 edits to the same document)?
    • amazedsaint
      amazedsaint about 13 years
      They need to be readable right away. The record size will be around 2K/record, with 20,000 fresh inserts per second - updates are far fewer by comparison. Also, please note that the peak is around 70,000.
    • amazedsaint
      amazedsaint about 13 years
      Updated the baselines, please see above
    • MSTdev
      MSTdev over 7 years
      Check this, it will help you: db-engines.com/en/system/MongoDB%3BRavenDB
    • Danielle
      Danielle about 5 years
      Check out the MongoDB vs. RavenDB whitepaper! ravendb.net/whitepapers/… RavenDB is by far a better option.
  • amazedsaint
    amazedsaint about 13 years
    +1 for BigCouch and the _changes API. We are dealing with mostly inserts.
  • frail
    frail about 13 years
    I don't know much about RavenDB, so I am not going to comment on it.
  • Steve Casey
    Steve Casey almost 12 years
    1. MongoDB does not support triggers, but you can quite easily create a process to read the oplog.
  • Steve Casey
    Steve Casey almost 12 years
    2. fsync is used to flush pending writes to disk, primarily for backups etc.; you would not use it for this scenario. I would use either 'journalCommitInterval' with journaling on and a small ms setting, or (a slower option, maybe not suitable) db.runCommand({getlasterror:1, j:true}).
  • Scott
    Scott about 11 years
    The language the db is written in has little to do with fault tolerance, concurrency, etc. It may make some of those easier to write, but at the end of the day it's the same bytes hitting the CPU instruction set. It may be harder, but I can rewrite any of these in assembly given the time.
  • tolitius
    tolitius about 11 years
    I doubt you can, but even if you could, I would go with the one you would write in Erlang. It's about your smarts vs. "Joe Armstrong's and X years of research built in"
  • Scott
    Scott about 11 years
    The 'I' is the colloquial 'one'. The point is that language implementation doesn't matter as much as architecture. It would be easy to write a terrible database in Erlang and a decent one in VB. It may be more difficult to do that, but it doesn't mean that just because it's written in a certain language it's inherently better. That's a red herring argument.
  • tolitius
    tolitius about 11 years
    Since a comment must be a minimum of 15 characters, I had to use this prelude to my actual answer, which is: "ok".