PostgreSQL slow JOIN with CASE statement
Try rewriting the CASE as a plain AND/OR condition:
select *
from
    some_table as t1
    join
    some_table as t2 on
        t1.type = t2.type
        and (
            (t1.type = 'ab' and t1.first = t2.first)
            or
            (t1.type = 'cd' and t1.second = t2.second)
        );
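The rewrite depends on the two conditions being logically equivalent. A quick sanity check is to run both against a handful of rows. The sketch below uses Python's built-in sqlite3 module with SQLite as a stand-in for PostgreSQL. This is an assumption: it compares results only, not plans or timings. The sample rows are invented.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE some_table (id INTEGER PRIMARY KEY, type TEXT, first INTEGER, second INTEGER)"
)
conn.executemany(
    "INSERT INTO some_table (type, first, second) VALUES (?, ?, ?)",
    [("ab", 1, 10), ("ab", 1, 20), ("ab", 2, 10),
     ("cd", 1, 10), ("cd", 2, 10), ("cd", 3, 30)],
)

# The question's CASE-based join condition.
CASE_SQL = """
SELECT t1.id, t2.id FROM some_table AS t1 JOIN some_table AS t2
  ON t1.type = t2.type
 AND CASE WHEN t1.type = 'ab' THEN t1.first = t2.first
          WHEN t1.type = 'cd' THEN t1.second = t2.second END
"""

# The same condition written as a plain AND/OR expression.
OR_SQL = """
SELECT t1.id, t2.id FROM some_table AS t1 JOIN some_table AS t2
  ON t1.type = t2.type
 AND ((t1.type = 'ab' AND t1.first = t2.first)
   OR (t1.type = 'cd' AND t1.second = t2.second))
"""

case_rows = sorted(conn.execute(CASE_SQL).fetchall())
or_rows = sorted(conn.execute(OR_SQL).fetchall())
print(case_rows == or_rows, len(case_rows))
```

On this sample the script prints `True 10`: both queries return the same ten (t1.id, t2.id) pairs.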
For better performance, create an expression index based on a function:
create or replace function f (_type text, _first int, _second int)
returns integer as $$
    -- Assumes type holds only 'ab' or 'cd': every value other than 'ab' falls through to _second.
    select case _type when 'ab' then _first else _second end;
$$ language sql immutable;
create index i on some_table(f(type, first, second));
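The function collapses the two CASE branches into a single key per row, so the join condition becomes one equality. The minimal Python mirror below compares that key equality with the question's original CASE. The helper names `case_condition` and `f_condition`, the dict-based rows, and the extra `'xx'` row are hypothetical, added only for illustration.

```python
from typing import Optional


def f(_type: str, _first: int, _second: int) -> int:
    # Mirrors: select case _type when 'ab' then _first else _second end;
    return _first if _type == "ab" else _second


def case_condition(t1: dict, t2: dict) -> Optional[bool]:
    # Mirrors the question's CASE: no ELSE branch, so any other type yields NULL (None).
    if t1["type"] == "ab":
        return t1["first"] == t2["first"]
    if t1["type"] == "cd":
        return t1["second"] == t2["second"]
    return None


def f_condition(t1: dict, t2: dict) -> bool:
    # Equality of the per-row keys that the expression index would store.
    return f(t1["type"], t1["first"], t1["second"]) == f(t2["type"], t2["first"], t2["second"])


rows = [
    {"type": "ab", "first": 1, "second": 10},
    {"type": "ab", "first": 2, "second": 20},
    {"type": "cd", "first": 2, "second": 10},
    {"type": "cd", "first": 3, "second": 10},
]
# The query still requires t1.type = t2.type, so only same-type pairs are compared.
same_type_pairs = [(a, b) for a in rows for b in rows if a["type"] == b["type"]]
agree = all(case_condition(a, b) == f_condition(a, b) for a, b in same_type_pairs)
print(agree)

odd = {"type": "xx", "first": 5, "second": 7}
print(case_condition(odd, odd), f_condition(odd, odd))
```

For `'ab'` and `'cd'` rows the two conditions agree, so the first line prints `True`. The `'xx'` row prints `None True`: f silently treats any other type value like `'cd'`. The function-index approach therefore relies on the question's statement that type holds only those two values.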
Then use the same expression in the query, so the planner can match it against the index:
select *
from
    some_table as t1
    join
    some_table as t2 on
        t1.type = t2.type
        -- use t2's own columns on the right so the expression matches the index on t2
        and f(t1.type, t1.first, t1.second) = f(t2.type, t2.first, t2.second);
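Why a single equality helps: once every row has one key, matching rows can be found by grouping on that key rather than testing each (t1, t2) pair. The Python sketch below is only an analogy for what a hash join or an index lookup can do. It does not model PostgreSQL's executor. The row data and the helper names `nested_loop_join` and `keyed_join` are hypothetical.

```python
from collections import defaultdict


def f(_type, _first, _second):
    # Same logic as the SQL function f: 'ab' rows key on first, all others on second.
    return _first if _type == "ab" else _second


def nested_loop_join(rows):
    """Evaluate the join condition for every (t1, t2) pair; count evaluations."""
    pairs, evaluations = [], 0
    for i, (type1, first1, second1) in enumerate(rows):
        for j, (type2, first2, second2) in enumerate(rows):
            evaluations += 1
            if type1 != type2:
                continue
            if (type1 == "ab" and first1 == first2) or (type1 == "cd" and second1 == second2):
                pairs.append((i, j))
    return sorted(pairs), evaluations


def keyed_join(rows):
    """Bucket rows by (type, f(...)) once, then pair rows sharing a bucket; count key computations."""
    buckets = defaultdict(list)
    for i, (t, a, b) in enumerate(rows):
        buckets[(t, f(t, a, b))].append(i)
    pairs = [(i, j) for ids in buckets.values() for i in ids for j in ids]
    return sorted(pairs), len(rows)


rows = [("ab", k % 20, k) for k in range(200)] + [("cd", k, k % 20) for k in range(200)]
nl_pairs, nl_work = nested_loop_join(rows)
kj_pairs, kj_work = keyed_join(rows)
print(nl_pairs == kj_pairs, nl_work, kj_work)
```

Both joins return the same 4000 pairs. However, the nested loop performs 160000 condition evaluations, while the bucketed version computes only 400 keys before pairing rows within each bucket.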
legacy
Updated on June 04, 2022
Comments
-
legacy almost 2 years
In my database I have a table with ~3500 records, and as part of a more complicated query I tried to self-join it with an inner join using a CASE condition, as shown below.
SELECT *
FROM some_table AS t1
JOIN some_table AS t2
  ON t1.type = t2.type
 AND CASE
       WHEN t1.type = 'ab' THEN t1.first = t2.first
       WHEN t1.type = 'cd' THEN t1.second = t2.second
       -- Column type contains only one of 2 possible varchar values
     END;
The problem is that this query takes 3.2-4.5 seconds, while the following one runs in 40-50 milliseconds:
SELECT *
FROM some_table AS t1
JOIN some_table AS t2
  ON t1.type = t2.type
 AND (t1.first = t2.first OR t1.second = t2.second);
Also, according to the execution plan, in the first case the database processes ~5.8 million rows even though the table contains only ~3500. The table has the following indexes: (id), (type), (type, first), (type, second).
We are using the following version: PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
Any ideas why PostgreSQL behaves so strangely in this case?
-
legacy over 7 years
It works just fine, thanks a lot. Any idea why the query with CASE performs so slowly?
-
Pavel Stehule over 7 years
@legacy: a CASE in a JOIN condition is a pretty big trap for the optimizer. The Postgres planner probably selected a nested-loop plan, so your example is effectively a Cartesian product of 3500*3500 rows, and the CASE expression has to be evaluated for every combination. So 5 seconds is a pretty reasonable time for such an unhappy query.
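The 3500*3500 figure is the worst case. One plausible reading of the reported ~5.8 million rows is that the planner first pairs rows on t1.type = t2.type, the only plain equality in the CASE query, and then applies the CASE as a filter to every same-type pair. In that case the work is the sum of the squared per-type row counts. The question does not give the split between `'ab'` and `'cd'`, so the even split below is an assumption, and the helper name `same_type_pairs` is hypothetical.

```python
def same_type_pairs(counts):
    """Pairs produced by joining on t1.type = t2.type alone: sum of n_i^2 over type values."""
    return sum(n * n for n in counts.values())


full_cartesian = 3500 * 3500
# Assumed even split of the ~3500 rows between the two type values.
even_split = same_type_pairs({"ab": 1750, "cd": 1750})
print(full_cartesian, even_split)
```

This prints `12250000 6125000`. About 6.1 million same-type pairs is the same order as the ~5.8 million rows in the question's plan. Each of those pairs would still need the CASE evaluated.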