SQL Server IN vs. EXISTS Performance

152,547

Solution 1

EXISTS will be faster because once the engine has found a hit, it will quit looking as the condition has proved true.

With IN, it will collect all the results from the sub-query before further processing.

Solution 2

The accepted answer is shortsighted and the question a bit loose in that:

1) Neither explicitly mention whether a covering index is present in the left, right, or both sides.

2) Neither takes into account the size of input left side set and input right side set.
(The question just mentions an overall large result set).

I believe the optimizer is smart enough to convert between "in" vs "exists" when there is a significant cost difference due to (1) and (2), otherwise it may just be used as a hint (e.g. exists to encourage use of an a seekable index on the right side).

Both forms can be converted to join forms internally, have the join order reversed, and run as loop, hash or merge--based on the estimated row counts (left and right) and index existence in left, right, or both sides.

Solution 3

I've done some testing on SQL Server 2005 and 2008, and on both the EXISTS and the IN come back with the exact same actual execution plan, as other have stated. The Optimizer is optimal. :)

Something to be aware of though, EXISTS, IN, and JOIN can sometimes return different results if you don't phrase your query just right: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx

Solution 4

I'd go with EXISTS over IN, see below link:

SQL Server: JOIN vs IN vs EXISTS - the logical difference

There is a common misconception that IN behaves equally to EXISTS or JOIN in terms of returned results. This is simply not true.

IN: Returns true if a specified value matches any value in a subquery or a list.

Exists: Returns true if a subquery contains any rows.

Join: Joins 2 resultsets on the joining column.

Blog credit: https://stackoverflow.com/users/31345/mladen-prajdic

Solution 5

There are many misleading answers answers here, including the highly upvoted one (although I don't believe their ops meant harm). The short answer is: These are the same.

There are many keywords in the (T-)SQL language, but in the end, the only thing that really happens on the hardware is the operations as seen in the execution query plan.

The relational (maths theory) operation we do when we invoke [NOT] IN and [NOT] EXISTS is the semi join (anti-join when using NOT). It is not a coincidence that the corresponding sql-server operations have the same name. There is no operation that mentions IN or EXISTS anywhere - only (anti-)semi joins. Thus, there is no way that a logically-equivalent IN vs EXISTS choice could affect performance because there is one and only way, the (anti)semi join execution operation, to get their results.

An example:

Query 1 ( plan )

select * from dt where dt.customer in (select c.code from customer c where c.active=0)

Query 2 ( plan )

select * from dt where exists (select 1 from customer c where c.code=dt.customer and c.active=0)
Share:
152,547
Randy Minder
Author by

Randy Minder

Azure Data Architect / Business Intelligence / Tabular Modeling / Power BI Significant experience with data architecture (database architecture, data warehouses, data marts), SSIS and Azure Data Factory, Business Intelligence with Power BI. Check out my new course on Udemy titled "The DAX Workshop Part 1". The best way to learn DAX is by working through real-world scenarios. The course is filled with exercises (45) to help you learn how to use DAX. Our careers and hobbies are fun and important. But each of us has a soul which will live forever, after our bodies die. Do you know where you'll spend eternity? Jesus Christ said there is only one way to heaven, and it is through Him. You won't get a second chance after you die.

Updated on January 22, 2021

Comments

  • Randy Minder
    Randy Minder over 3 years

    I'm curious which of the following below would be more efficient?

    I've always been a bit cautious about using IN because I believe SQL Server turns the result set into a big IF statement. For a large result set, this could result in poor performance. For small result sets, I'm not sure either is preferable. For large result sets, wouldn't EXISTS be more efficient?

    WHERE EXISTS (SELECT * FROM Base WHERE bx.BoxID = Base.BoxID AND [Rank] = 2)
    

    vs.

    WHERE bx.BoxID IN (SELECT BoxID FROM Base WHERE [Rank = 2])
    
  • Adir D
    Adir D over 14 years
    This used to be true but in current versions (at least 2008) the optimizer is much smarter... it actually treats IN () just like an EXISTS ().
  • Scott Coates
    Scott Coates about 13 years
    @Aaron - yes, typically the optimzer will internally produce a better plan. However, relying on internal shortcuts could be detrimental in more complex scenarios.
  • Josh Lewis
    Josh Lewis about 10 years
    I'd always be extremely cautious in using LEFT JOIN + NULL coding, because it is very easy to get missed or skewed results if you aren't very careful in your NULL handling. I've very rarely found a situation where EXISTS or a CTE ( for finding duplication, or synthetic insertion for missing data) , doesn't both meet the same requirements and outperform the LEFT JOIN + NULL
  • techdude
    techdude over 9 years
    Strange. While the query plan is exactly the same for both for one of my queries, the explain shows a rows number of 972 for IN and 959 for EXIST. All else is the same. This is on Mysql 5.1.42 though, so it may be just old.
  • Karl Kieninger
    Karl Kieninger about 9 years
    TOP 1 should be complete extraneous (or event redundant) when used with EXISTS. EXISTS always returns as soon it find any matching row.
  • DaFi4
    DaFi4 about 8 years
    I did not see any performance benefit with this approach so far. Please show some screenshots of the Execution Plans
  • Magnus
    Magnus about 8 years
    This is just simply wrong. It was in 2010 and still is.
  • Nelssen
    Nelssen over 6 years
    IN and EXISTS have the exact same query plan, and IO. There is no reason for thinking they are different in performance. check your time statistics and comprove yourself
  • UnhandledExcepSean
    UnhandledExcepSean over 5 years
    Have you tested it? If so, can you share your SQL and your results?
  • George Menoutis
    George Menoutis over 5 years
    Tested it multiple times. I can create another test case, and I will, but a test case does not mean that the optimizer will do the exact same plan on tables with different statistics. This might lead someone to think the answer is partial - but the nonexistence of multiple semijoin operators is fact. Maybe I'll find a list somewhere and link it.
  • Martin Smith
    Martin Smith about 5 years
    The optimiser always gives the same plan for IN and EXISTS. Try and come up with any case where they don't get the same plan (though this does not apply to NOT IN and NOT EXISTS)
  • MarredCheese
    MarredCheese about 5 years
    @MartinSmith I assume you know what you are talking about, but do you have any proof that the plans are always the same? If so, it'd clear up the decade-long disagreement here.
  • Martin Smith
    Martin Smith about 5 years
    @MarredCheese - the onus is on the people that claim that it is different to produce a single example of this
  • akostadinov
    akostadinov over 2 years
    this, when I read how good is the optimizer, I'm always thinking that in exactly my use case it will fail ¯_(ツ)_/¯