Postgresql one db with multiple schemas vs multiple db with one schema

13,535

Solution 1

If performance is important, there is no substitute for benchmarking your specific use case. If it isn't that important, then go with easier to administer! Hardware is cheap compared to programmer/DBA time, and compared to the expected higher accident rate on a more complex setup. Not to mention computers follow Moore's law, but the day stubbornly refuses to get longer.

If I had to guess, having not benchmarked your specific use case (since I can't), I'd guess one database with multiple schemas will be better performing because:

  1. Each connection is to one database, AFAIK. Connection setup/teardown is expensive in PostgreSQL.

  2. Many schemas is closer to many tables than many databases is, and I'd expect many tables to be optimized for more than I'd expect many databases to be.

However, I can see a counterexample which may apply. Each database is stored in a single directory. This makes splitting databases across filesystems—and thus disk arrays—really easy using normal filesystem tools (e.g., mount points and/or symbolic links). Splitting databases across multiple arrays is very unlikely to outperform one, larger RAID10 array with the same number of disks, but will provide better isolation (database A doing a huge query will not affect database B as much). Do, however, check your OS's IO scheduler documentation; remember that each PostgreSQL connection gets its own backend process, so per-process fairness queuing may accomplish this better.

Note that you can also segment the data across filesystems using PostgreSQL's CREATE TABLESPACE and friends, so the above can actually be done with schemas as well.

Solution 2

Performance-wise, it's going to depend completely on your application.

For example, multiple databases requires connection-pooling per database. If you have hundreds or thousands of databases, that pretty much means you can't do connection pooling. That'll cost you performance for anything except say a client application with a single persistent connection to the database.

However, if you only access "one database at a time" (and not within seconds of each other either), keeping things in separate databases will only need to load the system table cache for the databases that are actively being used, leaving more memory to cache user tables (since the system tables in each database will be significantly smaller).

In most cases, schemas in one database will win out. A fairly common solution is a hybrid, of X databases and Y schemas.

Share:
13,535

Related videos on Youtube

snahor
Author by

snahor

Updated on April 15, 2022

Comments

  • snahor
    snahor about 2 years

    I've been reading this question, but it doesn't help me. Considering db administration, I think one db with multiple schemas is easier to maintain than the other option, but in terms of performance, which one is better?, is there any advantage from one over the other?

    TIA.

  • Jeff Davis
    Jeff Davis almost 15 years
    "CREATE TABLESPACE and friends, so the above can actually be done with schemas as well." Yeah, I don't see any performance advantage to using many databases. Incidentally, I don't understand how RAID has anything to do with this question. You can put whatever you want wherever you want with tablespaces regardless of the DB vs. schema question.
  • derobert
    derobert almost 15 years
    @Jeff Davis: Yes, you can, its just a little easier to do with standard Unix tools — not SQL commands — with multiple databases. The point about RAID was that often you get better performance than carefully splitting across disks by just adding all those disks to a RAID10.