How to improve MongoDB insert performance


Solution 1

You can try modifying the write concern level. There is obviously a risk in this, as you would no longer be able to catch write errors, but you should still be able to capture network errors. Since MongoDB groups bulk insert operations into batches of 1,000, this should speed up the process.

w defaults to 1, so the primary acknowledges each write:

[Diagram: write concern w: 1, acknowledged writes]

When you change it to 0, writes are sent without waiting for acknowledgement:

[Diagram: write concern w: 0, unacknowledged writes]
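With the C# driver you can get a collection handle that uses this write concern. The following is a minimal sketch, assuming a local server and placeholder database/collection names (they are not from the question):

    using MongoDB.Bson;
    using MongoDB.Driver;

    var client = new MongoClient("mongodb://localhost:27017");
    var database = client.GetDatabase("test");

    // WithWriteConcern returns a new collection handle; writes through it are
    // sent with w: 0 and are not acknowledged by the server.
    var m_Collection = database
        .GetCollection<BsonDocument>("myCollection")
        .WithWriteConcern(WriteConcern.Unacknowledged);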

If you are not concerned about the order of the operations, you can gain some speed by calling the bulk operation as unordered:

await m_Collection.BulkWriteAsync(updates, new BulkWriteOptions() { IsOrdered = false });

With an unordered operations list, MongoDB can execute the write operations in parallel and in any order (see the MongoDB documentation on bulk write operations).
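For completeness, here is a rough sketch of how an updates list matching the question's update shape might be built before being passed to the BulkWriteAsync call above; the batch variable and its Id/Payload properties are assumptions made for illustration:

    var updates = new List<WriteModel<BsonDocument>>();
    foreach (var item in batch)
    {
        // Match on _id and $set the two fields, mirroring the question's update.
        var filter = Builders<BsonDocument>.Filter.Eq("_id", item.Id);
        var update = Builders<BsonDocument>.Update
            .Set("b", item.Payload)
            .Set("s", true);
        updates.Add(new UpdateOneModel<BsonDocument>(filter, update));
    }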

Solution 2

"There is not a substantial read rate on the database so Sharding would not improve matters, although perhaps I am wrong."

An update involves a read, i.e. finding that forsaken _id, so perhaps sharding might be somewhat helpful, even if not very helpful.

Solution 3

The accepted answer here is good. I want to add some code to help others who use InsertMany instead of BulkWriteAsync take advantage of IsOrdered = false more quickly:

    m_Collection.InsertMany(listOfDocument, new InsertManyOptions() { IsOrdered = false });
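
If you prefer the non-blocking call, the async variant accepts the same options:

    await m_Collection.InsertManyAsync(listOfDocument, new InsertManyOptions() { IsOrdered = false });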

Comments

  • James
    James over 1 year

    The result:

    If you are operating on a dataset that is fault tolerant, or doing a one-time process you can verify, changing WriteAcknowledge to Unacknowledged can help.

    Also, bulk operations are ordered (IsOrdered = true) by default, which I was not aware of. Setting this to false actually makes the operation perform in bulk; otherwise it operates as a single thread of updates.


    MongoDB 3.0 / WiredTiger / C# Driver

    I have a collection with 147,000,000 documents, against which I am (hopefully) performing updates of approx. 3,000 documents each second.

    Here is an example update:

    "query" : {
        "_id" : BinData(0,"UKnZwG54kOpT4q9CVWbf4zvdU223lrE5w/uIzXZcObQiAAAA")
    },
    "updateobj" : {
        "$set" : {
            "b" : BinData(0,"D8u1Sk/fDES4IkipZzme7j2qJ4oWjlT3hvLiAilcIhU="),
            "s" : true
        }
    }
    

    This is a typical update; my requirement is for these to be applied at a rate of 3,000 per second.

    Unfortunately these are taking twice as long; for instance, the last update covered 1,723 documents and took 1,061 ms.

    The collection only has an index on _id and no other indexes; the average document size is 244 bytes, and the collection is uncapped.

    The server has 64 GB of memory and 12 threads. Insert performance is excellent with lower collection sizes, say around 50 million, but after about 80 million it really starts to drop off.

    Could it be because the entire set does not sit in memory? The database is backed by RAID 0 SSDs, so I/O performance should not be a bottleneck, and if it were, shouldn't it have shown up from the beginning?

    Would appreciate some guidance as I'm confident MongoDB can fulfill my rather meager requirements compared to some applications it is used in. There is not a substantial read rate on the database so Sharding would not improve matters, although perhaps I am wrong.

    Either way, the current insert rate is not good enough.

    Update: Here is the explain() of just the query...

    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "Collection",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "_id" : {
                "$eq" : { "$binary" : "SxHHwTMEaOmSc9dD4ng/7ILty0Zu0qX38V81osVqWkAAAAAA", "$type" : "00" }
            }
        },
        "winningPlan" : {
            "stage" : "IDHACK"
        },
        "rejectedPlans" : []
    },
    "executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 1,
        "executionTimeMillis" : 1,
        "totalKeysExamined" : 1,
        "totalDocsExamined" : 1,
        "executionStages" : {
            "stage" : "IDHACK",
            "nReturned" : 1,
            "executionTimeMillisEstimate" : 0,
            "works" : 2,
            "advanced" : 1,
            "needTime" : 0,
            "needFetch" : 0,
            "saveState" : 0,
            "restoreState" : 0,
            "isEOF" : 1,
            "invalidates" : 0,
            "keysExamined" : 1,
            "docsExamined" : 1
        },
        "allPlansExecution" : []
    },
    

    The query itself is very fast, and the update operation takes about 25 milliseconds; they are being pushed to Mongo via the bulk writer: await m_Collection.BulkWriteAsync(updates);

  • James
    James almost 9 years
    Thank you for your input; you have highlighted two items I thought were set by default! I had looked at WriteConcern and read the documentation as defaulting to the second type. I will of course make this change; I am happy with the data being in memory as it is fault tolerant. As for IsOrdered, I assumed it was always false and had to be set to true. Big revelation, as I want my inserts in parallel; that's why I'm using BulkWrite! I will report back with my findings, thank you so much.
  • James
    James almost 9 years
    Thanks for your input. I've seen Mongo instances handle many hundreds of millions of keys with no issues, so I am sure it is me doing something wrong. Mongo has performed very well with this workload in the past; now that it is scaling up I am encountering issues I'm sure I will find solutions for.
  • James
    James almost 9 years
    Looking at the BulkWriteOptions in ILSpy, you are correct, the IsOrdered flag is true by default. This is very non-intuitive but very helpful to know.
  • D.Rosado
    D.Rosado almost 9 years
    Please keep us updated on the increase in inserts per second by editing the question.
  • James
    James almost 9 years
    Unfortunately this did not improve performance: WriteConcern was set to Unacknowledged and the updates were set to IsOrdered = false. I'm starting to think maybe the nested documents are slowing down the process; with approx. 30 nested documents in each parent document and a hash as an index, this would kill the indexer, inserting 30 times per one 'logical' document.
  • James
    James almost 9 years
    Apologies, this is the version of the test where there are no nested documents, which is faster than the nested one. But still no perf improvement on either version.
  • James
    James almost 9 years
    My testing did not bring up the correct results. Rerunning the import from scratch with the WriteConcern Unacknowledged and IsOrdered set to false was able to maintain an approx. 35k per sec insert rate along with an approx. 11k update per sec rate. Superb result, thank you.