PostgreSQL temporary tables

Solution 1

Please note that, in Postgres, the default behaviour for temporary tables is that they are not automatically dropped on commit, and their data persists across transactions within the session. See the ON COMMIT clause of CREATE TABLE.

Temporary tables are, however, dropped at the end of a database session:

Temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction.

There are multiple considerations you have to take into account:

  • If you do want to explicitly DROP a temporary table at the end of a transaction, create it with the CREATE TEMPORARY TABLE ... ON COMMIT DROP syntax.
  • In the presence of connection pooling, a single database session may span multiple client sessions; to avoid clashes in CREATE, you should drop your temporary tables -- either prior to returning a connection to the pool (e.g. by doing everything inside a transaction and using the ON COMMIT DROP creation syntax), or on an as-needed basis (by preceding any CREATE TEMPORARY TABLE statement with a corresponding DROP TABLE IF EXISTS, which has the advantage of also working outside transactions, e.g. if the connection is used in auto-commit mode). Both patterns are sketched after this list.
  • While the temporary table is in use, how much of it will fit in memory before overflowing onto disk? See the temp_buffers option in postgresql.conf.
  • Anything else I should worry about when working often with temp tables? A VACUUM is recommended after you have DROPped temporary tables, to clean up any dead tuples from the catalog. With the default settings, Postgres will run autovacuum for you automatically every few minutes (see autovacuum).
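
A minimal sketch of the two creation patterns above (the table and column names are illustrative, not from the original question):

    -- temp_buffers can only be changed before the first use of
    -- temporary tables within the session
    SET temp_buffers = '64MB';

    -- Pattern 1: transaction-scoped; the table vanishes at COMMIT,
    -- so the connection can be returned to the pool safely
    BEGIN;
    CREATE TEMPORARY TABLE tmp_results (val numeric) ON COMMIT DROP;
    -- ... populate and query tmp_results here ...
    COMMIT;

    -- Pattern 2: drop-as-needed; also works in auto-commit mode
    DROP TABLE IF EXISTS tmp_results;
    CREATE TEMPORARY TABLE tmp_results (val numeric);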

Also, unrelated to your question (but possibly related to your project): keep in mind that, if you have to run queries against a temp table after you have populated it, then it is a good idea to create appropriate indices and issue an ANALYZE on the temp table in question after you're done inserting into it. By default, the cost-based optimizer will assume that a newly created temp table has ~1000 rows, and this may result in poor performance should the temp table actually contain millions of rows.
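
For example (a sketch with hypothetical names; index whatever columns your subsequent queries filter or join on):

    CREATE TEMPORARY TABLE tmp_results (id int, val numeric);
    -- ... bulk INSERT into tmp_results here ...
    CREATE INDEX tmp_results_val_idx ON tmp_results (val);
    ANALYZE tmp_results;  -- give the planner real row counts and statistics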

Solution 2

Temporary tables provide only one guarantee - they are dropped at the end of the session. For a small table, most of your data will likely stay cached in memory (see temp_buffers). For a large table, I guarantee that data will be flushed to disk periodically as the database engine needs more working space for other requests.

EDIT: If you absolutely need RAM-only temporary tables, you can create a tablespace for your database on a RAM disk (/dev/shm works); a sketch follows below. This reduces the amount of disk IO, but beware that it is currently not possible to do this without any physical disk write; the DB engine will flush the table metadata to stable storage when you create the temporary table.
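
A sketch of that setup, assuming a tmpfs mount at /dev/shm and superuser rights (the tablespace name and path are illustrative):

    -- the directory must exist and be owned by the postgres OS user;
    -- anything stored here is lost on reboot, so keep only disposable data
    CREATE TABLESPACE ramspace LOCATION '/dev/shm/pg_ramspace';

    -- route this session's temporary objects to the RAM-disk tablespace
    SET temp_tablespaces = 'ramspace';
    CREATE TEMPORARY TABLE tmp_results (val numeric);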

Comments

  • Nicholas Leonard
    Nicholas Leonard over 3 years

    I need to perform a query 2.5 million times. This query generates some rows which I need to AVG(column) and then use this AVG to filter the table from all values below average. I then need to INSERT these filtered results into a table.

    The only way to do such a thing with reasonable efficiency, seems to be by creating a TEMPORARY TABLE for each query-postmaster python-thread. I am just hoping these TEMPORARY TABLEs will not be persisted to hard drive (at all) and will remain in memory (RAM), unless they are out of working memory, of course.

    I would like to know if a TEMPORARY TABLE will incur disk writes (which would interfere with the INSERTs, i.e. slow the whole process down).
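
    A sketch of one such iteration, with hypothetical table and column names (each thread would substitute its own source query):

        -- materialize one batch's rows in a per-session temp table
        CREATE TEMPORARY TABLE tmp_batch AS
            SELECT id, score FROM source_data WHERE batch_id = 42;

        -- keep only rows at or above the batch average, then persist them
        INSERT INTO filtered_results (id, score)
        SELECT id, score
        FROM tmp_batch
        WHERE score >= (SELECT AVG(score) FROM tmp_batch);

        DROP TABLE tmp_batch;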

  • Nicholas Leonard
    Nicholas Leonard about 15 years
    Good stuff. Thx. I actually only used a temp table since I needed to execute two different SELECTs on it (so an Analyse would not be worth it, I fancy). I provided the operations with lots of temp_buffers, yet since TEMP tables were being created and dropped by many python threads, ...
  • Nicholas Leonard
    Nicholas Leonard about 15 years
    postgres was eating up more and more RAM as the script did its job. I found that limiting the amount of python threads (running on a client computer) to a little more than the amount of cpu-cores gave the best (most efficient and effective) execution times. Thx again for your wisdom Vlad.
  • vladr
    vladr about 15 years
    Even if you only SELECT on the temp table twice, investing a few milliseconds in an index creation + ANALYZE each time you create the temp table could save you tons when/if joining other tables with the temp table - put the queries manually in PgAdminIII and use the "Query/Explain(F7)" function.
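
    For instance, in psql (a hypothetical join against the temp table):

        EXPLAIN ANALYZE
        SELECT t.id, s.payload
        FROM tmp_batch t
        JOIN some_table s ON s.id = t.id;
        -- compare the plan and timings with and without the
        -- CREATE INDEX + ANALYZE step on tmp_batch
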
  • Nicholas Leonard
    Nicholas Leonard about 15 years
    Really? Ok, I guess I needed to have someone tell me to try it, since it seems counter-intuitive (setup costs do not seem to be worth it). Anyway, I thank you and I will try to analyse the ANALYZE next time. I am already seeing the value of TEMP INDEXes, though. Yet I wonder if an ANALYZE is really...
  • Nicholas Leonard
    Nicholas Leonard about 15 years
    worth it when the query optimizer has been configured in such a way to "strongly encourage it" to use the INDEX? Thx again Vlad.
  • vladr
    vladr about 15 years
    The ANALYZE overhead is on average 100ms, and you can configure it per-table/column. You absolutely need an ANALYZE in order for the optimizer not to make stupid assumptions, e.g. assuming that a million-row table only contains 100 rows and table-scanning it 10 times... :)
  • vladr
    vladr about 15 years
    In other words, without doing ANALYZE you cannot say you have encouraged the optimizer in one way or another, as the optimizer unconditionally uses the ANALYZE data to make decisions.
  • Nicholas Leonard
    Nicholas Leonard about 15 years
    What about the postgresql.conf variables like enable_seqscan = false?
  • vladr
    vladr about 15 years
    Only use enable_seqscan to debug the CBO. Trying to convince Postgres to use an index by setting this is like fixing a TV with a sledgehammer - or a headache with a lobotomy. Plus, there's more than "to use an index or not to use an index" in the life of a database...
  • vladr
    vladr about 15 years
    ...such as "use a nested loop with index lookup" vs. "use a hash aggregate" (and a hash aggregate might be much faster than index lookup!), questions which can only be answered by analyze statistics, not by some "don't do tablescans" configuration variable
  • Erwin Brandstetter
    Erwin Brandstetter about 12 years
    I corrected a fundamental error in the answer. Temporary tables are (and were) never persisted past the end of a session.
  • vladr
    vladr almost 12 years
    @ErwinBrandstetter , thanks for the update, however it needs additional warnings regarding connection pooling (i.e. what a "session" means to the back-end vs. what it means to the end-user in the presence of connection pooling.) The original (misworded) answer was actually connection-pooling-safe; the updated one (although technically correct) may confer a false sense of security that can eventually backfire in connection-pooled environments.
  • shusson
    shusson over 4 years
    Temporary tables are also not WAL-logged: rhaas.blogspot.com/2010/05/…