why does my view in postgresql not use the index?

11,067

As far as I can see it's a limitation in postgres - it's hard to make it avoid scanning the whole table on a union in this way.

See:

https://www.postgresql-archive.org/Poor-plan-when-joining-against-a-union-containing-a-join-td5747690.html

and

https://www.postgresql-archive.org/Pushing-IN-subquery-down-through-UNION-ALL-td3398684.html

and also maybe related

https://dba.stackexchange.com/questions/47572/in-postgresql-9-3-union-view-with-where-clause-not-taken-into-account

Basically - I guess you need to revisit your view definition! Sorry for no definitive solution.

Share:
11,067
Greg Hennessy
Author by

Greg Hennessy

Astronomer, computer programmer, geek.

Updated on July 26, 2022

Comments

  • Greg Hennessy
    Greg Hennessy almost 2 years

    I have a large table (star catalog), of which I have a subset. I implement the subset as a union of two tables, where I make use of a cross index.

    The issue is that a query from the view doesn't seem to be using the index, the time takes the same as a scan through the table.

    A query against the large table goes quickly:

    select count(*) from ucac4 where rnm in (select ucac4_rnm from grid_catalog limit 5);
    count 
    -------
         5
    (1 row)
    
    Time: 12.132 ms
    

    A query against the view does not go quickly, even though I would expect it to.

    select count(*) from grid_catalog_view where ident in (select ucac4_rnm from grid_catalog limit 5);
    count 
    -------
         5
    (1 row)
    
    Time: 1056237.045 ms
    

    An explain of this query yeilds:

     Aggregate  (cost=23175810.51..23175810.52 rows=1 width=0)
       ->  Hash Join  (cost=23081888.41..23172893.67 rows=1166734 width=0)
             Hash Cond: (ucac4.rnm = public.grid_catalog.ucac4_rnm)
             ->  Unique  (cost=23081888.17..23140224.87 rows=2333468 width=44)
                   ->  Sort  (cost=23081888.17..23087721.84 rows=2333468 width=44)
                         Sort Key: ucac4.ra, ucac4."dec", ucac4.pmrac, ucac4.pmdc, ucac4.rnm, ucac4.nest4, ucac4.nest6, ucac4.nest7, public.grid_catalog.subset
                         ->  Append  (cost=63349.87..22763295.24 rows=2333468 width=44)
                               ->  Hash Join  (cost=63349.87..22738772.75 rows=2333467 width=44)
                                     Hash Cond: (ucac4.rnm = public.grid_catalog.ucac4_rnm)
                                     ->  Seq Scan on ucac4  (cost=0.00..16394129.04 rows=455124304 width=40)
                                     ->  Hash  (cost=34048.69..34048.69 rows=2344094 width=8)
                                           ->  Seq Scan on grid_catalog  (cost=0.00..34048.69 rows=2344094 width=8)
                                                 Filter: (petrov_prikey IS NULL)
                               ->  Hash Join  (cost=415.51..1187.80 rows=1 width=36)
                                     Hash Cond: (petrov.prikey = public.grid_catalog.petrov_prikey)
                                     ->  Seq Scan on petrov  (cost=0.00..709.15 rows=7215 width=32)
                                     ->  Hash  (cost=282.08..282.08 rows=10675 width=8)
                                           ->  Index Scan using grid_catalog_petrov_prikey_idx on grid_catalog  (cost=0.00..282.08 row
    s=10675 width=8)
             ->  Hash  (cost=0.18..0.18 rows=5 width=4)
                   ->  HashAggregate  (cost=0.13..0.18 rows=5 width=4)
                         ->  Limit  (cost=0.00..0.07 rows=5 width=4)
                               ->  Seq Scan on grid_catalog  (cost=0.00..34048.69 rows=2354769 width=4)
    (22 rows)
    

    The explain analyze (request in a comment) is:

     Aggregate  (cost=23175810.51..23175810.52 rows=1 width=0) (actual time=1625067.627..1625067.628 rows=1 loops=1)
       ->  Hash Join  (cost=23081888.41..23172893.67 rows=1166734 width=0) (actual time=1621395.200..1625067.618 rows=5 loops=1)
             Hash Cond: (ucac4.rnm = public.grid_catalog.ucac4_rnm)
             ->  Unique  (cost=23081888.17..23140224.87 rows=2333468 width=44) (actual time=1620897.932..1624102.849 rows=1597359 loops
    =1)
                   ->  Sort  (cost=23081888.17..23087721.84 rows=2333468 width=44) (actual time=1620897.928..1622191.358 rows=1597359 l
    oops=1)
                         Sort Key: ucac4.ra, ucac4."dec", ucac4.pmrac, ucac4.pmdc, ucac4.rnm, ucac4.nest4, ucac4.nest6, ucac4.nest7, pu
    blic.grid_catalog.subset
                         Sort Method:  external merge  Disk: 87536kB
                         ->  Append  (cost=63349.87..22763295.24 rows=2333468 width=44) (actual time=890293.619..1613769.160 rows=15973
    59 loops=1)
                               ->  Hash Join  (cost=63349.87..22738772.75 rows=2333467 width=44) (actual time=890293.617..1611550.313 r
    ows=1590144 loops=1)
                                     Hash Cond: (ucac4.rnm = public.grid_catalog.ucac4_rnm)
                                     ->  Seq Scan on ucac4  (cost=0.00..16394129.04 rows=455124304 width=40) (actual time=886086.630..1
    359934.589 rows=113780093 loops=1)
                                     ->  Hash  (cost=34048.69..34048.69 rows=2344094 width=8) (actual time=4203.785..4203.785 rows=1590
    144 loops=1)
                                           ->  Seq Scan on grid_catalog  (cost=0.00..34048.69 rows=2344094 width=8) (actual time=0.014.
    .2813.031 rows=1590144 loops=1)
                                                 Filter: (petrov_prikey IS NULL)
                               ->  Hash Join  (cost=415.51..1187.80 rows=1 width=36) (actual time=101.604..165.749 rows=7215 loops=1)
                                     Hash Cond: (petrov.prikey = public.grid_catalog.petrov_prikey)
                                     ->  Seq Scan on petrov  (cost=0.00..709.15 rows=7215 width=32) (actual time=58.280..108.043 rows=7
    215 loops=1)
                                     ->  Hash  (cost=282.08..282.08 rows=10675 width=8) (actual time=43.276..43.276 rows=7215 loops=1)
                                           ->  Index Scan using grid_catalog_petrov_prikey_idx on grid_catalog  (cost=0.00..282.08 rows
    =10675 width=8) (actual time=19.387..37.533 rows=7215 loops=1)
             ->  Hash  (cost=0.18..0.18 rows=5 width=4) (actual time=0.035..0.035 rows=5 loops=1)
                   ->  HashAggregate  (cost=0.13..0.18 rows=5 width=4) (actual time=0.026..0.030 rows=5 loops=1)
                         ->  Limit  (cost=0.00..0.07 rows=5 width=4) (actual time=0.009..0.017 rows=5 loops=1)
                               ->  Seq Scan on grid_catalog  (cost=0.00..34048.69 rows=2354769 width=4) (actual time=0.007..0.009 rows=
    5 loops=1)
     Total runtime: 1625108.504 ms
    (24 rows)
    
    Time: 1625466.830 ms
    

    To see the time to scan through the view:

    select count(*) from grid_catalog_view;
      count  
    ---------
     1597359
    (1 row)
    
    Time: 1033732.786 ms
    

    My view is defined as:

    PS1=# \d grid_catalog_view
        View "public.grid_catalog_view"
     Column |       Type       | Modifiers 
    --------+------------------+-----------
     ra     | double precision | 
     dec    | double precision | 
     pmrac  | integer          | 
     pmdc   | integer          | 
     ident  | integer          | 
     nest4  | integer          | 
     nest6  | integer          | 
     nest7  | integer          | 
     subset | integer          | 
    View definition:
             SELECT ucac4.ra, ucac4."dec", ucac4.pmrac, ucac4.pmdc, ucac4.rnm AS ident, ucac4.nest4, ucac4.nest6, ucac4.nest7, grid_catalog.subset
               FROM ucac4, grid_catalog
              WHERE ucac4.rnm = grid_catalog.ucac4_rnm AND grid_catalog.petrov_prikey IS NULL
    UNION 
             SELECT petrov.ra, petrov."dec", 0 AS pmrac, 0 AS pmdc, grid_catalog.petrov_prikey AS ident, petrov.nest4, petrov.nest6, petrov.nest7, grid_catalog.subset
               FROM petrov, grid_catalog
              WHERE petrov.prikey = grid_catalog.petrov_prikey AND grid_catalog.ucac4_rnm IS NULL;
    

    The large table is defined as:

    PS1=# \d ucac4
              Table "public.ucac4"
      Column  |       Type       | Modifiers 
    ----------+------------------+-----------
     radi     | bigint           | 
     spdi     | bigint           | 
     magm     | smallint         | 
     maga     | smallint         | 
     sigmag   | smallint         | 
     objt     | smallint         | 
     cdf      | smallint         |
    ... deleted entries not of relavance ...
     ra       | double precision | 
     dec      | double precision | 
     x        | double precision | 
     y        | double precision | 
     z        | double precision | 
     nest4    | integer          | 
     nest6    | integer          | 
     nest7    | integer          | 
    Indexes:
        "ucac4_pkey" PRIMARY KEY, btree (rnm)
        "q3c_ucac4_idx" btree (q3c_ang2ipix(ra, "dec")) CLUSTER
        "ucac4_nest4_idx" btree (nest4)
        "ucac4_nest6_idx" btree (nest6)
        "ucac4_nest7_idx" btree (nest7)
    Referenced by:
        TABLE "grid_catalog" CONSTRAINT "grid_catalog_ucac4_rnm_fkey" FOREIGN KEY (ucac4_rnm) REFERENCES ucac4(rnm)
    

    Any idea why my index doesn't seem to be used?

  • bubakazouba
    bubakazouba over 6 years
    first 2 links are broken
  • sHtev
    sHtev almost 5 years
    Fixed the links - that site had replaced the generic nabble domain name with a custom one, as in the updated links.