PostgreSQL: improving pg_dump, pg_restore performance
Solution 1
First check that you are getting reasonable I/O performance from your disk setup. Then check that your PostgreSQL installation is appropriately tuned. In particular:
- shared_buffers should be set correctly
- maintenance_work_mem should be increased during the restore
- full_page_writes should be off during the restore
- wal_buffers should be increased to 16MB during the restore
- checkpoint_segments should be increased to something like 16 during the restore
- you shouldn't have any unreasonable logging on (like logging every statement executed)
- autovacuum should be disabled during the restore
If you are on 8.4, also experiment with parallel restore via the --jobs option of pg_restore. A sketch of these settings and the parallel restore follows below.
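For reference, here is a rough sketch of what those temporary settings could look like during the restore window. The specific values and file names below are my own illustrative assumptions, not part of this answer, so adjust them to your hardware and revert them once the restore finishes.

# postgresql.conf -- temporary values for the restore window (illustrative assumptions)
shared_buffers = 2GB            # a common starting point is roughly 25% of RAM on a dedicated box
maintenance_work_mem = 1GB      # speeds up index and foreign-key rebuilds during the restore
full_page_writes = off          # only acceptable if you can simply redo the restore after a crash
wal_buffers = 16MB
checkpoint_segments = 16        # pre-9.5 setting; 9.5 and later use max_wal_size instead
autovacuum = off
log_statement = 'none'          # avoid logging every restored statement

# on 8.4 or later, restore in parallel from a custom-format (-Fc) archive
pg_restore --jobs=4 -d dbname dumpfile.custom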
Solution 2
Improving pg_dump and pg_restore
PG_DUMP | always use the directory format (-Fd) and the -j option:
time pg_dump -j 8 -Fd -f /tmp/newout.dir fsdcm_external
PG_RESTORE | always tune postgresql.conf, and use the directory format and the -j option:
work_mem = 32MB
shared_buffers = 4GB
maintenance_work_mem = 2GB
full_page_writes = off
autovacuum = off
wal_buffers = -1
time pg_restore -j 8 --format=d -C -d postgres /tmp/newout.dir/
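A hedged follow-up that is not part of this answer: once the load completes, turn the risky settings back on, reload, and collect planner statistics, since nothing has been analyzed while autovacuum was off. The data directory path below is an assumption.

# postgresql.conf -- restore safe values after the load
full_page_writes = on
autovacuum = on

pg_ctl reload -D /path/to/data        # pick up the changed settings
vacuumdb --all --analyze-only         # 9.0+; refresh statistics without a full vacuum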
Solution 3
Two issues/ideas:
By specifying -Fc, the pg_dump output is already compressed. The compression is not maximal, so you may find some space savings by using "gzip -9", but I would wager it's not enough to warrant the extra time (and I/O) used compressing and uncompressing the -Fc version of the backup.
If you are using PostgreSQL 8.4.x you can potentially speed up the restore from a -Fc backup with the new pg_restore command-line option "-j n" where n=number of parallel connections to use for the restore. This will allow pg_restore to load more than one table's data or generate more than one index at the same time.
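To make that concrete, a minimal sketch of both commands; the archive name, database name, and the job count of 4 are placeholders chosen for illustration, so match -j to your CPU cores and I/O capacity.

pg_dump -Fc -f dumpfile.custom dbname        # custom format is already compressed internally
pg_restore -j 4 -d dbname dumpfile.custom    # 8.4+; parallel restore needs a seekable archive file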
Solution 4
I assume you need backups, not a major database upgrade.
For backups of large databases you should set up continuous archiving instead of pg_dump.
Make your base backups, for example every day, by using:
psql template1 -c "select pg_start_backup('`date +%F-%T`')"
A restore would be as simple as restoring the database and the WAL logs no older than the pg_start_backup time from the backup location, then starting Postgres. And it will be much faster.
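A sketch of the matching archiving setup, assuming an 8.3-or-later primary; the /backup paths and the rsync source directory are assumptions, and on servers older than version 12 the restore side is driven by a recovery.conf file.

# postgresql.conf on the primary
archive_mode = on
archive_command = 'cp %p /backup/wal/%f'

# daily base backup: start the backup, copy the data directory, then stop
psql template1 -c "select pg_start_backup('`date +%F-%T`')"
rsync -a /var/lib/pgsql/data/ /backup/base/`date +%F`/    # data directory path is an assumption
psql template1 -c "select pg_stop_backup()"

# recovery.conf in the restored data directory (pre-12 servers)
restore_command = 'cp /backup/wal/%f %p'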
Solution 5
zcat dumpfile.gz | pg_restore -d db_name
This removes the full write of the uncompressed data to disk, which is currently your bottleneck.
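One caveat worth adding (my note, not the answer's): pg_restore can read a custom-format archive from a pipe like this, but parallel restore with -j needs a seekable file, so if you want -j you have to work from an uncompressed file instead.

zcat dumpfile.gz > dumpfile.custom            # writes one uncompressed copy to disk, but...
pg_restore -j 4 -d db_name dumpfile.custom    # ...allows parallel restore, which a pipe cannot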
Joe Creighton
Updated on July 23, 2022
Comments
-
Joe Creighton almost 2 years
When I began, I used pg_dump with the default plain format. I was unenlightened. Research revealed to me time and file size improvements with pg_dump -Fc | gzip -9 -c > dumpfile.gz. I was enlightened. When it came time to create the database anew:
# create tablespace dbname location '/SAN/dbname';
# create database dbname tablespace dbname;
# alter database dbname set temp_tablespaces = dbname;
% gunzip dumpfile.gz              # to evaluate restore time without a piped uncompression
% pg_restore -d dbname dumpfile   # into a new, empty database defined above
I felt unenlightened: the restore took 12 hours to create a database that's only a fraction of what it will become:
# select pg_size_pretty(pg_database_size('dbname'));
47 GB
Because there are predictions this database will be a few terabytes, I need to look at improving performance now.
Please, enlighten me.
-
Joe Creighton over 14 years
We didn't look at PITR (WAL archiving) because the system is not very transaction heavy but will retain many historical records instead. However, now that I think about it, a more "incremental" backup may help matters. I shall investigate. Thanks.
-
Joe Creighton over 14 years
We are currently at 8.3; new reason to upgrade.
-
Magnus Hagander over 14 years
You can use the 8.4 version of pg_restore with an 8.3 version of the server. Just make sure you use pg_dump from 8.3.
-
Joe Creighton over 14 years
Bah. We are stuck at 8.3 because we use the Solaris10 package install of Postgres and, "there is no plan to integrate PG8.4 into S10 at this moment." [Ref. mail-archive.com/[email protected]/msg136829.html] I would have to take on the task of installing and maintaining the open-source postgres. Unsure if we can do that here... Feh.
-
StartupGuy over 10 years
If you have a slave connected, and the load on the master is already considerable, then you may want to just do the backup on the slave instead. Especially since the slave is read-only, I imagine that may also help to some degree. In a large cluster, it may help to have one or more slaves dedicated to staggered backups if the backups take a long time. So that you don't miss anything, you would want these standbys connected via streaming replication so they get written to from the WAL on the master.
-
Asclepius over 9 years
If optimizing just the pg_dump time, with parallel dump as of v9.3, compression >0 can hurt a lot! This is because pg_dump and postmaster processes already hog the CPU enough that the addition of compression >=1 makes the overall task significantly CPU-bound instead of I/O-bound. Basically, the older assumption that the CPUs are idle without compression is invalid with parallel dump.
-
Juan Carlos Oropeza over 7 years
What does "shared_buffers should be set correctly" mean?
-
ramnar over 5 years
The configuration parameters used here improved performance significantly.
-
Hamed about 5 years
The link is broken
-
stasdeep about 4 years
Wow! This helped me so much! Thank you!
-
Darragh Enright about 4 years
@JuanCarlosOropeza I came across the following document about shared_buffers that might be helpful.