Maintenance of tables: do I need to REINDEX a table after truncating and repopulating?


No, you don't generally need to REINDEX after TRUNCATE, and if you do need the indexes rebuilt, you're much better off dropping them, loading the data, then re-creating them at the end.

It's somewhat similar to this answer about CLUSTER: PostgreSQL automatically empties the indexes along with the table during TRUNCATE and then incrementally rebuilds them as you insert data, so there's no index bloat held over from before the TRUNCATE.

You may get somewhat more compact and efficient indexes if you drop the indexes, truncate, insert the data, and recreate the indexes. They'll certainly be faster to build. The difference in index performance once built is unlikely to be enough to warrant the extra effort for most applications using just b-tree indexes, but the difference in the time required to populate tables can be well worth it. If you're using GiST or (especially) GIN it's really best to drop the index and re-create at the end.

If it's convenient to do so, drop the indexes and add them back at the end, just don't worry too much if this isn't practical for you.
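If it is practical, the reload cycle with dropped indexes can be sketched like this (the table and index names match the demo session below; adjust for your own schema and load method):

```sql
BEGIN;
DROP INDEX composite_idx;
TRUNCATE TABLE demo;
-- Bulk-load the new data with the index out of the way,
-- e.g. via COPY or a set-based INSERT ... SELECT:
INSERT INTO demo (someint, sometext)
SELECT x, (x % 100)::text
FROM generate_series(1, 100000) x;
-- Re-create the index in a single pass over the loaded data:
CREATE INDEX composite_idx ON demo (sometext, someint);
COMMIT;
```

Wrapping the whole cycle in one transaction means other sessions never see the table without its index, at the cost of holding locks for the duration of the load.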

For a regular b-tree in my test, an incrementally maintained composite index was 3720 kB vs 2208 kB for one created in a single pass after loading. Build time was 164 ms (inserts) + 347 ms (index) vs 742 ms (inserts with the index in place). That difference is significant, but not enough to be a huge concern unless you're doing large-scale data warehousing. A REINDEX took a further 342 ms after the inserts-with-index run. See the demo session below.

So, @TomTom is right (unsurprisingly) in that it can be worth dropping and re-creating indexes if it's convenient to do so, like if you're bulk-populating tables for OLAP work.

However, reindexing is likely to be the wrong answer since it means you do a whole bunch of expensive work to create an index you then throw away. Drop the index and re-create it instead of reindexing.

Demo session:

regress=# -- Create, populate, then create indexes:
regress=# CREATE TABLE demo (someint integer, sometext text);
CREATE TABLE
regress=# \timing on
regress=# INSERT INTO demo (someint, sometext)
SELECT x, (x%100)::text
FROM generate_series(1,100000) x;
INSERT 0 100000
Time: 164.678 ms
regress=# CREATE INDEX composite_idx ON demo(sometext, someint);
CREATE INDEX
Time: 347.958 ms
regress=# SELECT pg_size_pretty(pg_indexes_size('demo'::regclass));
 pg_size_pretty 
----------------
 2208 kB
(1 row)
regress=# -- Total time: 347.958+164.678=512.636ms, index size 2208kB

regress=# -- Now, with truncate and insert:
regress=# TRUNCATE TABLE demo;
TRUNCATE TABLE
regress=# INSERT INTO demo (someint, sometext)
SELECT x, (x%100)::text
FROM generate_series(1,100000) x;
INSERT 0 100000
Time: 742.813 ms
regress=# SELECT pg_size_pretty(pg_indexes_size('demo'::regclass));
 pg_size_pretty 
----------------
 3720 kB
(1 row)
regress=# -- Total time 742ms, index size 3720kB
regress=# -- Difference: about 44% time increase, about 68% index size increase.
regress=# -- Big-ish, but whether you care depends on your application. Now:

regress=# REINDEX INDEX composite_idx ;
REINDEX
Time: 342.283 ms
regress=# SELECT pg_size_pretty(pg_indexes_size('demo'::regclass));
 pg_size_pretty 
----------------
 2208 kB
(1 row)

regress=# -- Index is back to the same size, but the total time for the insert with
regress=# -- progressive index maintenance plus a reindex at the end is about 1085 ms,
regress=# -- twice as long as dropping the indexes, inserting the data, and
regress=# -- re-creating the indexes took.

So:

  • For OLAP, drop indexes, insert, re-create indexes.

  • For OLTP you'll probably just want to stick with progressive index builds. Consider a non-100% fillfactor on indexes to reduce insert costs.

  • Avoid inserting with progressive index builds then re-indexing, it's the worst of both worlds.
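As a sketch of the fillfactor suggestion above (80 is an illustrative value, not a recommendation; PostgreSQL b-tree indexes already default to fillfactor 90):

```sql
-- Leave 20% free space in each leaf page so that ordinary OLTP
-- inserts cause fewer page splits:
CREATE INDEX composite_idx ON demo (sometext, someint)
WITH (fillfactor = 80);
```

A lower fillfactor trades a somewhat larger index on disk for cheaper inserts into already-populated pages; it only helps if rows are inserted into existing key ranges rather than appended at the end.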

Of course, the sizes used in this test are toy table sizes, so you should repeat this testing on a sample of your real-world data and indexes to get a solid idea of how much difference it makes for you. I repeated these tests at a scale factor 100 times greater than the above and consistently found the index was almost exactly twice the size if built incrementally, though the relative build-time difference actually fell for this particular test.

So: Test with your data and schema.

Author: JamesF
Updated on September 18, 2022

Comments

  • JamesF
    JamesF over 1 year

    I have a table with about 2 million rows in it of transactional data that we use for analytics. Every week we reload this with new data, so we've been using TRUNCATE to clear it out and then inserting new rows.

    There are a couple of indexes on the table. If I don't drop and recreate the indexes, will I need to reindex after every truncation and repopulation, or is that unnecessary? Should I be running VACUUM after TRUNCATE, or is that also unnecessary?

  • TomTom
    TomTom about 11 years
Actually it may make a lot of difference, depending on how the data is loaded. Dropping and recreating indices that aren't needed for the operation is standard practice in data warehouse ETL, and even in update scenarios when a lot of data changes. It depends a lot on whether you load in one SQL statement / batch or piecemeal in processing.
  • Craig Ringer
    Craig Ringer about 11 years
    @TomTom I should've been more specific - it's less bad for b-tree indexes, and a truly huge difference for some GiST or GIN indexes.
  • JamesF
    JamesF about 11 years
    Thanks. @TomTom , I assume you mean dropping and recreating indexes is going to be more efficient if you're doing loads in batches - or is my inference incorrect?
  • TomTom
    TomTom about 11 years
    Yes. Because in a DWH you have a lot of indices for reading only (analysis) and it is more efficient sometimes to drop them when you reload the tables and just recreate them at the end.
  • Craig Ringer
    Craig Ringer about 11 years
    @JamesF There are two kinds of efficiency to be considered here: Efficiency of loading, and efficiency of the resulting index. Answer extended with examples.
  • Craig Ringer
    Craig Ringer about 11 years
    @TomTom Updated answer should be clearer. Missed the OLAP focus 1st time around, and I failed to distinguish clearly enough between REINDEXing vs dropping indexes and re-creating them.
  • JamesF
    JamesF about 11 years
    Thanks to you both, that's made things a lot clearer to me. Apologies for not responding earlier, I've been travelling a lot...