Performance difference: condition placed at INNER JOIN vs WHERE clause
You're seeing a difference because the planner builds a different execution plan for each query. Arguably it should optimise the two queries to the same plan, and the fact that it doesn't may be a bug. In other words, the planner has decided that each statement needs a different strategy to reach the result.
When the condition is in the JOIN, the planner will probably select from the table, filter on the processed = true part, and then join the result sets. I would imagine this is a large table, so there is a lot of data to look through, and the indexes can't be used as efficiently.
I suspect that with the condition in the WHERE clause, the planner is choosing a more efficient route (i.e. either index-based, or working from a pre-filtered data set).
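For reference, the WHERE variant being compared might look like the sketch below. It is reconstructed from the query in the question, not taken from the asker's real schema. The table name is double-quoted here on the assumption that it really is called order, which is a reserved word in PostgreSQL. For inner joins the two forms return the same rows, so any timing difference comes from the plan, not the result.

```sql
-- Same query as in the question, with the processed filter
-- moved from the ON clauses to a WHERE clause.
SELECT c1.clientid,
       c1.date, c1.type, c1.itemid, c1.amount,
       c2.date, c2.type, c2.itemid, c2.amount
FROM "order" c1
INNER JOIN "order" c2
        ON c1.itemid = c2.itemid
       AND c1.date = c2.date
       AND c1.clientid = c2.clientid
       AND c1.type <> c2.type
       AND c1.id < c2.id
INNER JOIN processed p1 ON p1.orderid = c1.id
INNER JOIN processed p2 ON p2.orderid = c2.id
WHERE p1.processed = true
  AND p2.processed = true;
```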
You could probably make the join version as fast (if not faster) by adding an index on the two columns used in the join condition (presumably orderid and processed). PostgreSQL supports multicolumn indexes, and from version 11 also covering indexes with included columns (INCLUDE).
In short, the planner is the problem: it is choosing two different routes to get to the result sets, and one of them is less efficient than the other. It's impossible for us to know the reasons without the full table definitions and the EXPLAIN ANALYZE output.
If you want specifics on why your query behaves this way, you'll need to provide more information. The underlying reason, however, is the planner choosing different routes.
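A minimal sketch of such an index, assuming the two columns meant are orderid and processed on the processed table. The index names are made up for illustration, and whether the planner actually uses either index depends on the real data.

```sql
-- Multicolumn index covering both parts of the join condition
-- (index name is hypothetical).
CREATE INDEX processed_orderid_processed_idx
    ON processed (orderid, processed);

-- Alternative: a partial index that only contains the rows
-- the query is interested in (index name is hypothetical).
CREATE INDEX processed_orderid_true_idx
    ON processed (orderid)
    WHERE processed = true;

-- Refresh planner statistics after creating the indexes.
ANALYZE processed;
```

The partial index is smaller, but it only helps queries whose condition implies processed = true.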
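The plans can be captured as sketched below, using the ON form of the query from the question (the table name is double-quoted on the assumption that it really is called order). Running the same statement with the filter moved to WHERE and comparing the two outputs should show where the plans diverge.

```sql
-- EXPLAIN ANALYZE actually executes the query and reports
-- the chosen plan with real row counts and timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT c1.clientid,
       c1.date, c1.type, c1.itemid, c1.amount,
       c2.date, c2.type, c2.itemid, c2.amount
FROM "order" c1
INNER JOIN "order" c2
        ON c1.itemid = c2.itemid
       AND c1.date = c2.date
       AND c1.clientid = c2.clientid
       AND c1.type <> c2.type
       AND c1.id < c2.id
INNER JOIN processed p1
        ON p1.orderid = c1.id AND p1.processed = true
INNER JOIN processed p2
        ON p2.orderid = c2.id AND p2.processed = true;
```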
Additional Reading Material:
http://www.postgresql.org/docs/current/static/explicit-joins.html
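Just skimmed it: by default the planner is free to re-order inner joins, but explicit JOIN syntax can pin the join order when join_collapse_limit is lowered (or when the query joins more tables than that limit). Try changing the order of the joins in your statement to see whether you then get the same performance... just a thought.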
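One way to check whether join order is a factor is the join_collapse_limit setting described on the linked page. The sketch below uses a shortened count(*) version of the question's query purely for illustration, with the table name double-quoted on the assumption that it really is called order.

```sql
-- Show the current limit (8 by default).
SHOW join_collapse_limit;

-- With a limit of 1 the planner keeps explicit JOINs
-- in the order they are written.
SET join_collapse_limit = 1;

EXPLAIN ANALYZE
SELECT count(*)
FROM "order" c1
INNER JOIN "order" c2
        ON c1.itemid = c2.itemid
       AND c1.date = c2.date
       AND c1.clientid = c2.clientid
       AND c1.type <> c2.type
       AND c1.id < c2.id
INNER JOIN processed p1
        ON p1.orderid = c1.id AND p1.processed = true
INNER JOIN processed p2
        ON p2.orderid = c2.id AND p2.processed = true;

-- Restore the default for the rest of the session.
RESET join_collapse_limit;
```

If the plan and timing change noticeably between the two settings, the written join order matters for this query.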
Insectatorious
Updated on June 02, 2020
Comments
-
Insectatorious almost 4 years
Say I have a table order as

id | clientid | type | amount | itemid | date
---|----------|------|--------|--------|-----------
23 | 258      | B    | 150    | 14     | 2012-04-03
24 | 258      | S    | 69     | 14     | 2012-04-03
25 | 301      | S    | 10     | 20     | 2012-04-03
26 | 327      | B    | 54     | 156    | 2012-04-04

- clientid is a foreign key back to the client table
- itemid is a foreign key back to an item table
- type is only B or S
- amount is an integer

and a table processed as

id | orderid | processed | date
---|---------|-----------|-----------
41 | 23      | true      | 2012-04-03
42 | 24      | true      | 2012-04-03
43 | 25      | false     | <NULL>
44 | 26      | true      | 2012-04-05

I need to get all the rows from order that for the same clientid on the same date have opposing type values. Keep in mind type can only have one of two values - B or S. In the example above this would be rows 23 and 24.

The other constraint is that the corresponding row in processed must be true for the orderid.

My query so far

    SELECT c1.clientid,
           c1.date, c1.type, c1.itemid, c1.amount,
           c2.date, c2.type, c2.itemid, c2.amount
    FROM order c1
    INNER JOIN order c2 ON c1.itemid = c2.itemid
                       AND c1.date = c2.date
                       AND c1.clientid = c2.clientid
                       AND c1.type <> c2.type
                       AND c1.id < c2.id
    INNER JOIN processed p1 ON p1.orderid = c1.id AND p1.processed = true
    INNER JOIN processed p2 ON p2.orderid = c2.id AND p2.processed = true

QUESTION: Keeping the processed = true as part of the join clause is slowing the query down. If I move it to the WHERE clause then the performance is much better. This has piqued my interest and I'd like to know why.

The primary keys and respective foreign key columns are indexed while the value columns (value, processed etc) aren't.

Disclaimer: I have inherited this DB structure and the performance difference is roughly 6 seconds.
-
Insectatorious almost 12 years
Right... makes sense... the trouble is I've simplified the tables and their respective structures to post this question. I'll try and get the explain analyse
-
ypercubeᵀᴹ almost 12 years
You do not force the query planner by putting conditions in the ON or the WHERE clause. A decent optimizer/query planner should be able to identify both versions as equivalent (when they are) and choose from various execution plans.
-
Cade Roux almost 12 years
@ypercube The optimizer would normally push them down as low as possible to reduce the cardinality as soon as possible, but obviously that is not good when it results in a table op instead of an index op. And then perhaps it's not smart enough to pull the condition back up and use it later when the working set is smaller. What's most interesting is that the optimizer doesn't push the clauses around in the WHERE version to end up with the same plan.
-
ypercubeᵀᴹ almost 12 years
@CadeRoux: Yeah, but I think Postgres is mature enough to do that. What may confuse the optimizer is that it has to join 4 tables (so quite a lot of plans there) with only a few indexes. If there were useful indexes, I think it would choose the same plan in both cases.
-
Martin almost 12 years
Maybe "Force" isn't the right word, however, the concept is correct. Maybe "Tell" is the word, but this is meant to be descriptive to people who are not familiar with planners. By doing what he's doing (JOIN vs WHERE) the planner is taking another path, and therefore there is a difference in performance.
-
ypercubeᵀᴹ almost 12 years
"Force" is not the right word and neither is "have to". It's not "Tell" either. An SQL statement/query does not tell the DBMS how to do something, it tells it what to do. If one wants to give hints or force the optimizer to do things a specific way, there are ways to do that too (they vary from DBMS to DBMS). But the query has no such hints.
-
ypercubeᵀᴹ almost 12 years
@Martin: What you are right about is that the optimizer is choosing different (execution) paths/plans.
-
Martin almost 12 years
We could go into technical detail about how planners work, however, that is way beyond the scope of the question. The answer gives the OP, and anyone not familiar with planners, the ability to understand why it is happening without providing meaningless information.
-
Martin almost 12 years
@ypercube: I've updated the answer to move away from those words. I think that covers the question that was asked, do you agree?