Handling migrations with MongoDB


Solution 1

There are basically two approaches:

  1. Make sure that your application code can handle both "versions" of the data structure and, when saving, updates documents to the new structure
  2. Write a migration script

I would probably go for option 1, as it allows you to update gradually, whereas with option 2 you basically need to take down your application so that you can update the code (fast) and the data (possibly slower) in one go.

Then later, or if you find it necessary, do option 2 as well to migrate your data over. That doesn't have to take down your site and can happily run asynchronously in the background.
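The dual-version approach in option 1 is often called "lazy migration": accept both shapes on read, and always write the new shape on save. Since the question is explicitly language-agnostic, here is an illustrative Python sketch operating on plain documents; the field names mirror the `BadgeNo`/`EmployeeNo` example from the question, and the `schemaVersion` stamp is an assumption for illustration:

```python
CURRENT_VERSION = 2

def normalize(doc):
    """Upgrade a document to the current shape, in memory.

    Version 1 documents store the field as 'BadgeNo';
    version 2 documents call it 'EmployeeNo'.
    """
    doc = dict(doc)  # don't mutate the caller's copy
    if "BadgeNo" in doc and "EmployeeNo" not in doc:
        doc["EmployeeNo"] = doc.pop("BadgeNo")
    doc["schemaVersion"] = CURRENT_VERSION
    return doc

# Reading: both shapes are accepted.
old = {"_id": 1, "Name": "Alice", "BadgeNo": "B-42"}
new = {"_id": 2, "Name": "Bob", "EmployeeNo": "E-7", "schemaVersion": 2}

assert normalize(old)["EmployeeNo"] == "B-42"
assert normalize(new)["EmployeeNo"] == "E-7"
```

On save you always persist the normalized form, so the stored data drifts toward the new structure as documents are touched, with no downtime.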

Solution 2

It seems like I currently want to go with the migration option rather than a phasing-out approach, so with this in mind, can anyone recommend any tools for helping in this area?

For those who are still looking for a solution, take a look at MongoMigrations. This tool exposes `MongoDatabase` (from the MongoDB C# driver) for manipulating the database, so you can use all the features of the driver.

Solution 3

Strategies can differ, and they depend on the particular application. For sites like Facebook you would certainly go with option #1 proposed by Derick so as not to affect your users at all, but if you have a site that 'sells pizza' you certainly don't want to make the effort of supporting both versions (current and new), writing more complex code, etc.

For such apps, simple patching may be the better option:

  1. The build server puts the application into 'read mode', so anyone can read but nothing can be inserted into the database.
  2. While production is in read mode, take the database and apply the patch.
  3. Once patching is done, make a backup of the database, stop the web server, and deploy the new database and the new application.

Putting the application into read mode helps decrease downtime, but again, for sites that 'sell pizza' you don't need read mode.
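For a simple field rename, the patch in step 2 can be a single bulk update using MongoDB's `$rename` update operator. The illustrative Python below applies the same transformation to an in-memory stand-in for the collection (the field names come from the question's example; the `people` collection name in the comment is hypothetical):

```python
def apply_patch(docs):
    """Rename 'BadgeNo' to 'EmployeeNo' in every document.

    Against a live database, the equivalent one-liner would be:
        db.people.update_many({}, {"$rename": {"BadgeNo": "EmployeeNo"}})
    """
    for doc in docs:
        if "BadgeNo" in doc:
            doc["EmployeeNo"] = doc.pop("BadgeNo")
    return docs

collection = [
    {"_id": 1, "Name": "Alice", "BadgeNo": "B-42"},
    {"_id": 2, "Name": "Bob", "EmployeeNo": "E-7"},  # already in new shape
]
apply_patch(collection)
assert all("BadgeNo" not in d for d in collection)
```

Because `$rename` skips documents that don't have the old field, the patch is safe to re-run if it is interrupted partway through.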



Author: Grofit

Updated on June 03, 2022

Comments

  • Grofit, almost 2 years ago

    Just to give a little more context to the question: I have a web application (ASP.NET MVC) which basically wraps CRUD operations to a MongoDB instance. It carries out validation and certain business logic before the model is verified and sent over to be stored, retrieved, etc.

    Now one problem we have here is that in the new version the models have changed but the existing data has not. Here is an example (it is C#-specific, but the question really is language-agnostic):

    public class Person
    {
        public Guid Id { get; set; }
        public string Name { get; set; }
        public int Age { get; set; }
        public string BadgeNo { get; set; }
    }
    
    public class Person
    {
        public Guid Id { get; set; }
        public string Name { get; set; }
        public int Age { get; set; }
        public string EmployeeNo { get; set; } // Still contains the same data as BadgeNo, just named differently
    }
    

    As you can see, the structure of the objects has changed, but in Mongo land the data is still coming out with a BadgeNo, not an EmployeeNo. In SQL land we would usually have a migration script, run as part of the build, which would change the schema and update/insert/delete any additional data for that delta.

    So how is it best to manage these sorts of migrations with Mongo? Should I also have a script which I use to update all instances within Mongo, or is there some other preferred practice for doing this sort of thing?

    Any advice on the subject would be great.

    === Edit ===

    It seems like I currently want to go with the migration option rather than a phasing-out approach, so with this in mind, can anyone recommend any tools for helping in this area? Otherwise each migration (assuming a roll-in, roll-out) would have to be a pre-compiled assembly of some kind with all the logic in it. I was thinking of something along the lines of FluentMigrator, but working with Mongo instead of SQL. Currently my build scripts use NAnt; I have seen some Ruby tools, but I am not sure if there are any .NET equivalents.

  • Sean Reilly, about 12 years ago
    I would choose option 1, and then use that code to create a migration utility (a command-line app, for example) that performs the equivalent of option 2 later, by loading and saving all documents in a collection that are still in the old version.
  • Grofit, about 12 years ago
    The problem is that you are then writing a lot of excess code that you will have to maintain just to support multiple versions; imagine 20 versions down the line, and you have 20 files for each model that has changed. For me the second option seems better, as it's easier to move between versions (i.e. rollbacks), and I have no problem with small amounts of downtime. Thanks for the answer; I will leave this open a little longer, as I was expecting lots of people to say option 2 but was hoping for a little more info on how best to automate the second approach, i.e. within a build script.
  • Derick, about 12 years ago
    Grofit, that's why a combination is probably better. You can handle two versions for a while, then migrate your data a bit later, and when you go to the next version of your data structure, drop the first version (and keep the second and third versions working).
  • Grofit, about 12 years ago
    I see where you are coming from, but you are assuming that all data will be up to date by a certain point. Imagine a person uses the system and is saved under version 1.0, then doesn't touch the system again until version 1.8 (let's say two years later); assuming eight model changes have been released in between, the system would fall over when trying to use the 1.0 model in a 1.8 environment, as it only handles 1.7 -> 1.8. It feels like the system is running on faith of constant use, or adding overhead to every request to check if it needs updating.
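    The 1.0-to-1.8 concern is usually handled by chaining per-version migrations keyed on a version stamp stored in each document, so a document last saved under any old version is walked forward step by step to the current one. A minimal Python sketch (the `schemaVersion` field and the individual steps are assumptions for illustration, reusing the BadgeNo/EmployeeNo rename from the question):

    ```python
    # One upgrade function per schema step; each takes and returns a dict.
    MIGRATIONS = {
        1: lambda d: {**d, "EmployeeNo": d.get("BadgeNo"), "schemaVersion": 2},
        2: lambda d: {**{k: v for k, v in d.items() if k != "BadgeNo"},
                      "schemaVersion": 3},
    }
    LATEST = 3

    def upgrade(doc):
        """Apply every pending migration, in order, until the doc is current."""
        while doc.get("schemaVersion", 1) < LATEST:
            doc = MIGRATIONS[doc.get("schemaVersion", 1)](doc)
        return doc

    v1_doc = {"_id": 1, "Name": "Alice", "BadgeNo": "B-42"}  # saved under 1.0
    assert upgrade(v1_doc) == {"_id": 1, "Name": "Alice",
                               "EmployeeNo": "B-42", "schemaVersion": 3}
    ```

    With this shape, only one small function per released change has to be kept around, rather than full dual-version support for every pair of versions.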
  • Derick, about 12 years ago
    Grofit, right, that's why you also run the migration script.