Is it better to use multiple databases with one schema each, or one database with multiple schemas?


Solution 1

A PostgreSQL "schema" is roughly the same as a MySQL "database". Having many databases on a PostgreSQL installation can get problematic; having many schemas will work with no trouble. So you definitely want to go with one database and multiple schemas within that database.
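
To make this concrete, here is a minimal sketch of the one-database-many-schemas layout, using the same PyGreSQL pg module as the script in Solution 2 below (the customer schema names are made up for illustration):

import pg  # PyGreSQL

# One connection to the single shared database
con = pg.connect(dbname='db_name', host='localhost', user='db_user', passwd='db_pass')

# One schema per customer, all inside the same database
con.query('CREATE SCHEMA customer_foo')
con.query('CREATE SCHEMA customer_bar')

# Unqualified table names are resolved through the search_path, so the
# same application code can serve any customer by switching schemas
con.query('SET search_path TO customer_foo')

con.close()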

Solution 2

Definitely, I'll go for the one-db-many-schemas approach. It allows me to dump the whole database but restore just one schema very easily, in several ways:

  1. Dump the database (all the schemas), load the dump into a new database, dump just the schema I need, and restore it back into the main database (sketched below).
  2. Dump the schemas separately, one by one (but I think the machine will suffer more this way - and I'm expecting something like 500 schemas!)
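
A rough sketch of option 1 using Python's standard subprocess module (the database, schema and path names are placeholders, and in real use you would drop the old schema from the main database before restoring):

import os
import subprocess

env = {**os.environ, 'PGPASSWORD': 'db_pass'}

# 1. Dump the whole database (all schemas at once)
subprocess.run(['pg_dump', '-U', 'db_user', '-f', '/tmp/full_dump.sql', 'db_name'],
               env=env, check=True)

# 2. Load the full dump into a scratch database
subprocess.run(['createdb', '-U', 'db_user', 'scratch_db'], env=env, check=True)
with open('/tmp/full_dump.sql') as f:
    subprocess.run(['psql', '-U', 'db_user', '-d', 'scratch_db'], stdin=f, env=env, check=True)

# 3. Dump just the schema we need from the scratch database
subprocess.run(['pg_dump', '-U', 'db_user', '-n', 'customer_foo_schema',
                '-f', '/tmp/one_schema.sql', 'scratch_db'], env=env, check=True)

# 4. Restore that single schema back into the main database
with open('/tmp/one_schema.sql') as f:
    subprocess.run(['psql', '-U', 'db_user', '-d', 'db_name'], stdin=f, env=env, check=True)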

Otherwise, googling around I've seen that there is no built-in procedure to duplicate a schema (using one as a template), but many suggest this approach:

  1. Create a template schema
  2. When you need to duplicate it, rename it with the new name
  3. Dump it
  4. Rename it back
  5. Restore the dump
  6. The magic is done.

I've written a few lines of Python to do that; I hope they help someone (written-in-two-seconds code, don't use it in production):

import os
import sys
import pg  # PyGreSQL

# Take the new schema name from the first command-line argument
# (sys.argv[0] is the script name)
newSchema = sys.argv[1]

# Temporary folder for the dumps
dumpFile = '/test/dumps/' + newSchema + '.sql'

# Settings
db_name = 'db_name'
db_user = 'db_user'
db_pass = 'db_pass'
schema_as_template = 'schema_name'

# Connection
pgConnect = pg.connect(dbname=db_name, host='localhost', user=db_user, passwd=db_pass)

# Rename the template schema to the new name
pgConnect.query('ALTER SCHEMA "' + schema_as_template + '" RENAME TO "' + newSchema + '"')

# Dump it
command = 'export PGPASSWORD="' + db_pass + '" && pg_dump -U ' + db_user + ' -n ' + newSchema + ' ' + db_name + ' > ' + dumpFile
os.system(command)

# Rename the template schema back to its original name
pgConnect.query('ALTER SCHEMA "' + newSchema + '" RENAME TO "' + schema_as_template + '"')

# Restore the dump to create the new schema
restore = 'export PGPASSWORD="' + db_pass + '" && psql -U ' + db_user + ' -d ' + db_name + ' < ' + dumpFile
os.system(restore)

# Delete the dump file
os.remove(dumpFile)

# Close the connection
pgConnect.close()
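
Assuming the script is saved as, say, clone_schema.py, you would run it as python clone_schema.py customer_new. Note that the template schema is renamed - and therefore unavailable to concurrent sessions - for the duration of the dump, which is one more reason not to use this as-is in production.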

Solution 3

Contrary to the accepted answer, I would recommend multiple databases instead of multiple schemas, for this set of reasons:

  1. If you are running microservices, you want to enforce the inability to join between your "schemas", so the data stays disentangled and developers won't end up joining another microservice's schema and wondering why, when the other team makes a change, their stuff no longer works (see the sketch after this list).
  2. You can later migrate a service to a separate database machine with ease, should its load require it.
  3. If you need a high-availability and/or replication setup, it's better to have completely independent databases. You cannot replicate just one schema, only the whole database.
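
A minimal sketch of that isolation, again with PyGreSQL (the service names, hosts and credentials below are invented for illustration):

import pg  # PyGreSQL

# One database per microservice: a connection is bound to exactly one
# database, so accidental cross-service joins are impossible by design
users_db = pg.connect(dbname='users_service', host='db1.internal', user='users_svc', passwd='secret')
orders_db = pg.connect(dbname='orders_service', host='db1.internal', user='orders_svc', passwd='secret')

# Moving one service to its own machine later is just a configuration change:
# orders_db = pg.connect(dbname='orders_service', host='db2.internal', ...)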

Solution 4

I would say, go with multiple databases AND multiple schemas :)

Schemas in PostgreSQL are a lot like packages in Oracle, in case you are familiar with those. Databases are meant to differentiate between entire sets of data, while schemas are more like data entities.

For instance, you could have one database for an entire application with the schemas "UserManagement", "LongTermStorage" and so on. "UserManagement" would then contain the "User" table, as well as all stored procedures, triggers, sequences, etc. that are needed for the user management.
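
A sketch of that layout (the table and sequence definitions are invented for illustration; the double-quoted identifiers preserve the mixed case):

import pg  # PyGreSQL

con = pg.connect(dbname='app_db', host='localhost', user='db_user', passwd='db_pass')

# Group everything user-related under one schema, much like an Oracle package
con.query('CREATE SCHEMA "UserManagement"')
con.query('''
    CREATE TABLE "UserManagement"."User" (
        id   serial PRIMARY KEY,
        name text NOT NULL
    )
''')
con.query('CREATE SEQUENCE "UserManagement".user_audit_seq')

con.close()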

Databases are entire programs, schemas are components.

Solution 5

In a PostgreSQL context I recommend using one database with multiple schemas, as you can (for example) UNION ALL across schemas, but not across databases. A database is completely isolated from every other database, while schemas are not isolated from other schemas within the same database.
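
For example (PyGreSQL again; the schema and table names are placeholders):

import pg  # PyGreSQL

con = pg.connect(dbname='db_name', host='localhost', user='db_user', passwd='db_pass')

# One query consolidates the same table across customer schemas; with
# separate databases this would need one connection per database plus
# a manual merge in application code
rows = con.query('''
    SELECT 'foo' AS customer, count(*) FROM customer_foo_schema.orders
    UNION ALL
    SELECT 'bar' AS customer, count(*) FROM customer_bar_schema.orders
''').getresult()

con.close()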

If you have to consolidate data across schemas in the future, for whatever reason, it will be easy to do over multiple schemas. With multiple databases you would need multiple connections, then collect and merge the data from each database "manually" in application logic.

The latter has advantages in some cases, but for the most part I think the one-database-multiple-schemas approach is more useful.



Comments

  • Strae
    Strae over 3 years

    After this comment on one of my questions, I'm wondering whether it is better to use one database with X schemas, or vice versa.

    I'm developing a web application where, when people register, I currently create a database (no, it's not a social network: everyone must have access to their own data and never see other users' data). That's the approach I used for the previous version of my application (which is still running on MySQL): through the Plesk API, for every registration, I:

    1. Create a database user with limited privileges;
    2. Create a database that can be accessed just by the previously created user and the superuser (for maintenance);
    3. Populate the database.

    Now I'll need to do the same with PostgreSQL (the project is maturing and MySQL doesn't fulfil all the needs). I need all the database/schema backups to be independent: pg_dump works perfectly either way, and the same goes for the users, which can be configured to access just one schema or one database.
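
    To make it concrete, the schema-per-customer variant of those three steps would look roughly like this (the role, schema and table names are placeholders):

    import pg  # PyGreSQL

    # Superuser/maintenance connection
    con = pg.connect(dbname='app_db', host='localhost', user='postgres', passwd='admin_pass')

    customer = 'customer_foo'

    # 1. Create a database user with limited privileges
    con.query("CREATE ROLE " + customer + " LOGIN PASSWORD 'some_password'")

    # 2. Create a schema that only this user (and the superuser) can use
    con.query("CREATE SCHEMA " + customer + " AUTHORIZATION " + customer)
    con.query("ALTER ROLE " + customer + " SET search_path = " + customer)

    # 3. Populate the schema
    con.query("CREATE TABLE " + customer + ".documents (id serial PRIMARY KEY, body text)")

    con.close()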

    So, assuming you are more experienced PostgreSQL users than me, what do you think is the best solution for my situation, and why? Will there be performance differences using $x databases instead of $x schemas? And which solution will be better to maintain in the future (reliability)? All of my databases/schemas will always have the same structure!

    For the backups issue (using pg_dump), it may be better to use one database and many schemas, dumping all the schemas at once: recovery is then quite simple, loading the main dump onto a development machine and then dumping and restoring just the schema needed. There is one additional step, but dumping all the schemas seems faster than dumping them one by one.

    UPDATE 2012

    Well, the application structure and design changed a lot during those last two years. I'm still using the "one db with many schemas" approach, but now I have one database for each version of my application:

    Db myapp_01
        \_ my_customer_foo_schema
        \_ my_customer_bar_schema
    Db myapp_02
        \_ my_customer_foo_schema
        \_ my_customer_bar_schema
    

    For backups, I dump each database regularly and then move the backups to the development server. I'm also using PITR/WAL backup but, as I said before, it's unlikely I'll ever have to restore the whole database at once, so I'll probably drop it this year (in my situation it's not the best approach).

    The one-db-many-schemas approach has worked very well for me so far, even though the application structure totally changed. I almost forgot: I said all of my databases/schemas would always have the same structure! That's no longer true: now every schema has its own structure, which changes dynamically in reaction to the users' data flow.

    • Osama Al-Maadeed
      Osama Al-Maadeed almost 15 years
      "all of my databases/schemas will ever have the same structure!" do you mean they all have the same structure? Or never?
    • Strae
      Strae almost 15 years
      Sorry, yes, they all have the same structure forever: if I change one, I'll change all of them ;)
    • Joshua Partogi
      Joshua Partogi about 14 years
      If you have 1000 customers, does that mean you have to update 1000 schemas?
    • Strae
      Strae about 14 years
      @jpartogi: yes, but I have to update just the table structure, not the data.
    • Kapil
      Kapil over 12 years
      So, what did you go for in the end? One question, though: given that query performance etc. can be controlled by tablespaces, making multi-db and multi-schema roughly equivalent in performance, is there any impact on the WAL logs?
    • Strae
      Strae over 12 years
      @Kapil: well, the design of the application has changed radically over time... let me update my question with a few details.
    • deepg
      deepg about 5 years
      How do you ensure the security between two schemas? One DB connection can update multiple schemas; this is both a boon and a bane.
    • Vahid Alvandi
      Vahid Alvandi about 3 years
      I have the same issue with my SaaS application! I have one MySQL db for each customer, but I wonder how I'll be able to change the schema for all users in the future, when each user has their own database. Please help!
  • Strae
    Strae almost 15 years
    ... and so I'd have 1 database, with these schemas inside it: $customer1_user_schema, $customer2_user_schema, $customer3_user_schema, $customer1_documents_schema, $customer2_documents_schema, $customer3_documents_schema? Mh... that doesn't seem a reliable way... and what about performance? And what about the code of my application (it will be PHP and Python)? So many schemas...
  • matt b
    matt b almost 15 years
    This. Postgres doesn't allow you to query across databases, which can be pretty annoying.
  • akaihola
    akaihola over 14 years
    "Having many databases on a PostgreSQL installation can get problematic" -- please clarify; is it problematic generally or in this specific case, and why?
  • Neil McGuigan
    Neil McGuigan over 10 years
    "The most common use case for using multiple schemas in a database is building a software-as-a-service application wherein each customer has their own schema. While this technique seems compelling, we strongly recommend against it as it has caused numerous cases of operational problems. For instance, even a moderate number of schemas (> 50) can severely impact the performance of Heroku’s database snapshots tool" devcenter.heroku.com/articles/heroku-postgresql
  • carbocation
    carbocation over 9 years
    @NeilMcGuigan: Interestingly, that seems to be the opposite conclusion from kquinn's (accepted) answer.
  • Kamil Gosciminski
    Kamil Gosciminski over 8 years
    For those reading this at the end of 2015: there is now a dblink Postgres extension for querying across databases (that's a reply to @mattb's comment).
  • Thalis K.
    Thalis K. over 8 years
    @mattb For those reading this after 2014: Pg has foreign data wrappers starting at v9.3, and in particular postgres_fdw allows querying across Pg databases (IMO better than dblink).
  • frankhommers
    frankhommers over 7 years
    @Strae: I'm reading this as: each customer has its own database (customer1_database, customer2_database) and within those databases you have user_schema and documents_schema.
  • Machisuji
    Machisuji over 7 years
    Having one database with many schemas will make it virtually impossible to dump a single schema of those, though. I'm running a single Postgres database with more than 3000 schemas, and pg_dump just fails with an out-of-memory error if you try to dump a single schema. I wonder whether this would be any different if I had 3000 databases instead.
  • mirabilos
    mirabilos over 5 years
    Why can it get problematic? I can see how multiple PostgreSQL clusters on one server can get problematic, but multiple databases (with one schema each, usually just putting stuff into public) in one cluster should be fine.
  • Paul
    Paul over 5 years
    Digging into this, this is a rather interesting article about the matter influitive.io/… and rob.conery.io/2014/05/28/a-better-id-generator-for-postgresql also addresses some of the issues you may run into. The first article also has a comment about many-schemas-many-tables issues from Josh Berkus (medium.com/@jberkus/…)
  • Strae
    Strae over 3 years
    Totally depends on the service. Please note this is a pretty old question. But the service ended up needing queries between two "microservices" (that wasn't in the initial project). Using schemas made it kinda easy; if I remember correctly, it was just a matter of configuring the database users' permissions better. Had we chosen the "N databases" way, that would have been a bit harder (but definitely possible).
  • Strae
    Strae over 3 years
    Nowadays the approach would be different though, probably exposing some kind of API and keeping the databases/schemas totally separated.
  • Alan Sereb
    Alan Sereb over 3 years
    @Strae, you are right, it's an old question, however, I just want to bring it back up and was hoping to get some insight into the same question. I did some research and decided to put in my 10 cents.
  • Strae
    Strae over 3 years
    Yep, and you're welcome to do so! My experience is that (well, for my situation) the difference wasn't much; using 1 db with multiple schemas helped with backups and cross-schema queries.
  • Ronnie
    Ronnie about 2 years
    My favourite answer. We shouldn't assume that allowing cross-schema queries is a good thing, in fact we should begin with the opposite assumption!