Multiple Databases Vs Single Database with logically partitioned data

29,338

Solution 1

You'll wish you had used separate databases:

  • If you ever want to grant permissions to the databases themselves to clients or superusers.
  • If you ever want to restore just one client's database without affecting the data of the others.
  • If there are regulatory concerns governing your data and data breaches, and you belatedly discover that these regulations can only be met by having separate databases. (Update: a little over 4 years after the writing of this answer, GDPR went into effect)
  • If you ever want to easily move your customer data to multiple database servers or otherwise scale out, or move larger/more important customers to different hardware. In a different part of the world.
  • If you ever want to easily archive and decommission old customer data.
  • If your customers care about their data being siloed, and they find out that you did otherwise.
  • If your data is subpoenaed and it's hard to extract just one customer's data, or the subpoena is overly broad and you have to produce the entire database instead of just the data for the one client.
  • When you forget to maintain vigilance and just one query slips through that didn't include AND CustomerID = @CustomerID. Hint: use a scripted permissions tool, or schemas, or wrap all tables with views that include WHERE CustomerID = SomeUserReturningFunction(), or some combination of these.
  • When you get permissions wrong at the application level and customer data is exposed to the wrong customer.
  • When you want to have different levels of backup and recovery protection for different clients.
  • Once you realize that building an infrastructure to create, provision, configure, deploy, and otherwise spin up/down new databases is worth the investment because it forces you to get good at it.
  • When you didn't allow for the possibility of some class of people needing access to multiple customers' data, and you need a layer of abstraction on top of Customer because WHERE CustomerID = @CustomerID won't cut it now.
  • When hackers target your sites or systems, and you made it easy for them to get all the data of all your customers in one fell swoop after getting admin credentials in just one database.
  • When your database backup takes 5 hours to run and then fails.
  • When you have to get the Enterprise edition of your DBMS so you can make compressed backups so that copying the backup file over the network takes less than 5 hours more.
  • When you have to restore the entire database every day to a test server which takes 5 hours, and run validation scripts that take 2 hours to complete.
  • When only a few of your customers need replication and you have to apply it to all of your customers instead of just those few.
  • When you want to take on a government customer and find out that they require you to use a separate server and database, but your ecosystem was built around a single server and database and it's just too hard or will take too long to change.

You'll be glad you used separate databases:

  • When a pilot rollout to one customer completely explodes and the other 999 customers are completely unaffected. And you can restore from backup to fix the problem.
  • When one of your database backups fails and you can fix just that one in 25 minutes instead of starting the entire 10-hour process over again.

You'll wish you had used a single database:

  • When you discover a bug that affects all 1000 clients and deploying the fix to 1000 databases is hard.
  • When you get permissions wrong at the database level and customer data is exposed to the wrong customer.
  • When you didn't allow for the possibility of some class of people needing access to a subset of all the databases (perhaps two customers merge).
  • When you didn't think how hard it would be to merge two different databases of data.
  • When you've merged two different databases of data and realize one was the wrong one, and you didn't plan for recovering from this scenario.
  • When you try to grow past 32,767 customers/databases on a single server and find out that this is the maximum in SQL Server 2012.
  • When you realize that managing 1,000+ databases is a bigger nightmare than you ever imagined.
  • When you realize that you can't onboard a new customer just by adding some data in a table, and you have to run a bunch of scary and complicated scripts to create, populate, and set permissions on a new database.
  • When you have to run 1000 database backups every day, make sure they all succeed, copy them over the network, restore them all to a test database, and run validation scripts on each single one, reporting any failures in a way that will guaranteed to be seen, and which are easily and quickly actionable. And then 150 of these fail in various places and have to be fixed one at a time.
  • When you find out you have to set up replication for 1000 databases.

Just because I listed more reasons for one doesn't mean it is better.

Some readers may get value from MSDN: Multi-Tenant Data Architecture. Or perhaps SaaS Tenancy App Design Patterns. Or even Developing Multi-tenant Applications for the Cloud, 3rd Edition

Solution 2

If you are refering your architecural as "multi tenant", Microsoft has a good article which is worth to read here. It shows some comparison between "isolated" (multiple db) and "shared" (single db). Generally, shared wins when the # of tenant (client) is big, but when the size of each tenant is big, an isolated approach is recommended.

Those consideration however can only be calculated by experienced developers though.

Still if you managed to use isolated (multiple db) architecture, you still won't get direct benefit in performance when they are still run at same instance. And if you use shared (single db) architecture, consider using int instead of guid, or sequential guid if you still need to use it.

Share:
29,338
Ahsan
Author by

Ahsan

Hi, I'm Ahsan, welcome to my profile! I'm a Developer living and working in Melbourne. Love writing code for the Microsoft Platform. Primarily focused on web development technologies, I also fiddle with Linux and other things in between.

Updated on September 04, 2021

Comments

  • Ahsan
    Ahsan over 2 years

    I am pondering over a database design issue. Any help would be highly appreciated.

    We are designing an application which has 20 tables (which may grow to about 30 maximum during new feature development)

    The technology stack

    MVC4,.NET 4.X, Entity Framework 5, SQL Server 2012, ASP.NET membership framework

    No of users

    We intend to cater to about 1000 clients who would have on average 20 users.

    The Question

    Should we design the database and the application in such a way that the tables are logically partitioned, i.e all clients use the same tables with a partition guid to separate the data.

    OR

    Go for multiple databases which could prove to be difficult during new feature launch and bug fixing. BUT could potentially allow for scaling?

    Caveats: one of the tables has a binary column which stores files (maximum 5MB per record)

    In addition to this we need to consider the Membership framework tables, which we will be extending to another custom table and logically mapping users to a partition guid.

  • ilans
    ilans over 8 years
    Is it really that matter to used int over guids? why does microsoft themselves use guids all over the place in their sharepoint and other products?
  • Fendy
    Fendy over 8 years
    @ilanS Well you can read it here: sqlskills.com/blogs/kimberly/…. Usually someone use 2 fields, 1 int/bigint as primary keys and 1 guid as nonclustered index. For row selection, they use the guid. For join/etc they use the PK. It is for security matter.
  • jcolebrand
    jcolebrand about 7 years
    Letting people have access to multiple sets of data sounds like a job for a data warehouse implementation ...
  • ErikE
    ErikE about 7 years
    @jcolebrand That sounds fantastic... unless the access has to be read/write. In that case, a more difficult challenge is presented!
  • Panagiotis Kanavos
    Panagiotis Kanavos about 5 years
    On one hand, things like contained databases, global filters in EF Core 2.2 and row-level permissions alleviate some of the pain points of each scenario. On the other hand, privacy laws and GDPR mean that you have to be extra-extra careful with shared databases even if customers aren't
  • ErikE
    ErikE about 5 years
    @PanagiotisKanavos When I made my points about regulations affecting how one stores data, GDPR didn’t exist! I was a little bit prescient...