How do I know if the statistics of a Postgres table are up to date?

Solution 1

Check the system catalogs.

test=# SELECT schemaname, relname, last_analyze FROM pg_stat_all_tables WHERE relname = 'city';
 schemaname | relname |         last_analyze          
------------+---------+-------------------------------
 pagila     | city    | 2011-07-26 19:30:59.357898-07
 world      | city    | 2011-07-26 19:30:53.119366-07
(2 rows)

All kinds of useful information in there:

test=# \d pg_stat_all_tables
          View "pg_catalog.pg_stat_all_tables"
      Column       |           Type           | Modifiers 
-------------------+--------------------------+-----------
 relid             | oid                      | 
 schemaname        | name                     | 
 relname           | name                     | 
 seq_scan          | bigint                   | 
 seq_tup_read      | bigint                   | 
 idx_scan          | bigint                   | 
 idx_tup_fetch     | bigint                   | 
 n_tup_ins         | bigint                   | 
 n_tup_upd         | bigint                   | 
 n_tup_del         | bigint                   | 
 n_tup_hot_upd     | bigint                   | 
 n_live_tup        | bigint                   | 
 n_dead_tup        | bigint                   | 
 last_vacuum       | timestamp with time zone | 
 last_autovacuum   | timestamp with time zone | 
 last_analyze      | timestamp with time zone | 
 last_autoanalyze  | timestamp with time zone | 
 vacuum_count      | bigint                   | 
 autovacuum_count  | bigint                   | 
 analyze_count     | bigint                   | 
 autoanalyze_count | bigint                   |
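
For example, last_autoanalyze covers the case where the autovacuum daemon (rather than a manual ANALYZE) refreshed the statistics, and n_live_tup / n_dead_tup give a feel for how much the table has churned. A minimal sketch that pulls those columns together (the table name 'city' is just the example from above; adjust to taste):

-- Combine the manual and automatic ANALYZE timestamps with the tuple counters.
-- GREATEST ignores NULLs, so it returns whichever refresh happened last.
SELECT schemaname,
       relname,
       greatest(last_analyze, last_autoanalyze) AS last_stats_refresh,
       n_live_tup,
       n_dead_tup,
       n_tup_ins + n_tup_upd + n_tup_del        AS total_row_changes
FROM pg_stat_all_tables
WHERE relname = 'city';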

Solution 2

You should not have to worry about vacuuming in your application. Instead, you should have the autovacuum process configured on your server (in postgresql.conf), and the server takes care of VACUUM and ANALYZE based on its own internal statistics. You can configure how often it runs and what the threshold variables are for triggering it.
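
If you just want to see what autovacuum is currently configured to do, here is a read-only sketch against pg_settings (the autovacuum_* parameters are the thresholds and scale factors mentioned above; defaults vary by Postgres version):

-- List the autovacuum-related settings the server is running with.
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE 'autovacuum%'
ORDER BY name;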

Comments

  • Beibei
    Beibei almost 4 years

    In pgAdmin, whenever a table's statistics are out-of-date, it prompts:

    Running VACUUM recommended

    The estimated rowcount on the table schema.table deviates significantly from the actual rowcount. You should run VACUUM ANALYZE on this table.

    I've tested it using pgAdmin 3 and Postgres 8.4.4, with autovacuum=off. The prompt shows up immediately whenever I click a table that has been changed.

    Let's say I'm making a web-based system in Java, how do I detect if a table is out-of-date, so that I can show a prompt like the one in pgAdmin?

    Because of the nature of my application, here are a few rules I have to follow:

    1. I want to know if the statistics of a certain table in pg_stats and pg_statistic are up to date.

    2. I can't set the autovacuum flag in postgresql.conf. (In other words, the autovacuum flag can be on or off. I have no control over it. I need to tell if the stats are up-to-date whether the autovacuum flag is on or off.)

    3. I can't run vacuum/analyze every time to make it up-to-date.

    4. When a user selects a table, I need to show the prompt that the table is outdated when there are any updates to this table (such as drop, insert, and update) that are not reflected in pg_stats and pg_statistic.

    It seems that it's not feasible by analyzing timestamps in pg_catalog.pg_stat_all_tables. Of course, if a table hasn't been analyzed before, I can check if it has a timestamp in last_analyze to find out whether the table is up-to-date. Using this method, however, I can't detect if the table is up-to-date when there's already a timestamp. In other words, no matter how many rows I add to the table, its last_analyze timestamp in pg_stat_all_tables is always for the first analyze (assuming the autovacuum flag is off). Therefore, I can only show the "Running VACUUM recommended" prompt for the first time.

    It's also not feasible by comparing the last_analyze timestamp to the current timestamp. There might not be any updates to the table for days. And there might be tons of updates in one hour.

    Given this scenario, how can I always tell if the statistics of a table are up-to-date?

  • Beibei
    Beibei almost 13 years
    Thank you for the answer, Sean. I did try out pg_stat_all_tables. I was able to tell that the table was outdated the first time, before the analyze. But I don't know how to tell when there are further changes to the same table. Please see my updated question.
  • Beibei
    Beibei almost 13 years
    Ok, I've found out how to tell if the statistics of a table are up to date when rows are added to (or removed from) the table. The trick is to compare n_tup_ins (or n_live_tup) in the view "pg_catalog.pg_stat_all_tables" with reltuples in the table "pg_catalog.pg_class" (see the sketch after these comments). Although this method can't detect updates when the row count stays the same, it satisfies my problem.
  • atrain
    atrain almost 13 years
    Can you get in contact with your DBA? Even if it's a hosted application, the autovacuum daemon should be running, as Postgres can get very fragmented without it, especially if you are doing lots of deletes.
  • Sean
    Sean almost 13 years
    Just log all of the queries on the backend and see what happens when you connect with pgAdmin. Please note the above comment from a different poster regarding enabling autovacuum. You should run with autovacuum unless you have some exceptionally exotic needs and know precisely what you're trying to avoid by using autovacuum (chances are you don't, and you shouldn't be avoiding autovacuum). If this isn't your decision, please make the case to whoever makes that decision.
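
Following up on the comparison described in the comments above, here is a minimal sketch of the reltuples-vs-n_live_tup check. The 20% / 50-row thresholds are arbitrary illustration values, not the ones pgAdmin uses:

-- Flag tables whose current live-row estimate has drifted from the row count
-- recorded at the last ANALYZE (pg_class.reltuples).
SELECT s.schemaname,
       s.relname,
       c.reltuples::bigint AS rows_at_last_analyze,
       s.n_live_tup        AS estimated_live_rows
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid
WHERE s.relname = 'city'
  AND abs(s.n_live_tup - c.reltuples) > greatest(50, 0.2 * c.reltuples);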