Why is there a HUGE performance difference between a temp table and a subselect?

Solution 1

Why is it not recommended to use subqueries?

The database optimizer (regardless of which database you are using) cannot always properly optimize such a query (one with subqueries). Here, the optimizer's problem is choosing the right way to join the result sets. There are several algorithms for joining two result sets (for example nested loops, hash join, and merge join), and the choice between them depends on the number of records in each result set. If you join two physical tables (a subquery is not a physical table), the database can easily determine the amount of data in two result sets by the available statistics. If one of the result sets is a subquery, it is much harder to estimate how many records it returns. In that case the database can choose the wrong join plan, which can dramatically reduce the performance of the query. For example, a nested loops join is cheap when the optimizer expects a handful of rows, but it becomes very slow when the subquery actually returns many thousands.
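
The statistics point can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a runnable stand-in for SQL Server; SQLite's planner and statistics are much simpler, so this only illustrates the idea. The T1 schema, the `flag = 1` filter standing in for `condition1`, and the names `tmp_t1` and `idx_tmp_t1_k` are all hypothetical. Once the filtered rows are materialised and analysed, the statistics record their exact row count instead of leaving it to be guessed:

```python
import sqlite3

# Hypothetical stand-in for T1; 'flag = 1' plays the role of condition1.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T1 (id INTEGER PRIMARY KEY, k INTEGER, flag INTEGER)")
con.executemany(
    "INSERT INTO T1 (id, k, flag) VALUES (?, ?, ?)",
    [(i, i % 50, 1 if i % 10 == 0 else 0) for i in range(1000)],
)

# Materialise the former subquery into a temp table, index it, and gather statistics.
con.execute("CREATE TEMP TABLE tmp_t1 AS SELECT * FROM T1 WHERE flag = 1")
con.execute("CREATE INDEX idx_tmp_t1_k ON tmp_t1 (k)")
con.execute("ANALYZE temp.tmp_t1")

# The first integer of sqlite_stat1.stat is the row count recorded for the index.
(stat,) = con.execute(
    "SELECT stat FROM temp.sqlite_stat1 WHERE tbl = 'tmp_t1' AND idx = 'idx_tmp_t1_k'"
).fetchone()
recorded_rows = int(stat.split()[0])
actual_rows = con.execute("SELECT COUNT(*) FROM tmp_t1").fetchone()[0]
print(recorded_rows, actual_rows)
```

SQL Server likewise keeps statistics on `#temp` tables, but its estimates and plan choices work differently, so treat this only as an analogy.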

Rewriting the query with temporary tables is intended to make the optimizer's job simpler. In the rewritten query, all result sets participating in joins are physical tables, so the database can easily determine the size of each result set. This makes it much more likely that the database chooses an efficient query plan, whatever the filter conditions are. The temporary-table approach should work well on most databases, which is especially important when developing portable solutions. In addition, the rewritten query is easier to read, understand, and debug.
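
Here is a minimal sketch of the rewrite itself, again using SQLite through Python's sqlite3 module as a stand-in for SQL Server. SQLite spells the materialisation step `CREATE TEMP TABLE ... AS SELECT`, where T-SQL uses `SELECT ... INTO #tempTable`. The schemas of T1 and T2, the join column `k`, and the `flag = 1` filter are hypothetical. The sketch only checks that both query shapes return the same rows; it says nothing about speed:

```python
import sqlite3

# Hypothetical schemas; 'flag = 1' stands in for condition1 and k is the join column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T1 (id INTEGER PRIMARY KEY, k INTEGER, flag INTEGER)")
con.execute("CREATE TABLE T2 (k INTEGER, label TEXT)")
con.executemany(
    "INSERT INTO T1 (id, k, flag) VALUES (?, ?, ?)",
    [(i, i % 50, 1 if i % 10 == 0 else 0) for i in range(1000)],
)
con.executemany(
    "INSERT INTO T2 (k, label) VALUES (?, ?)",
    [(k, f"label-{k}") for k in range(50)],
)

# Q1 shape: the filtered rows come from a derived table (subquery) in FROM.
q1 = """
    SELECT s.id, t2.label
    FROM (SELECT * FROM T1 WHERE flag = 1) AS s
    JOIN T2 AS t2 ON t2.k = s.k
    ORDER BY s.id
"""

# Q2 shape: materialise the same rows first, then join against the temp table.
con.execute("CREATE TEMP TABLE tmp_t1 AS SELECT * FROM T1 WHERE flag = 1")
q2 = """
    SELECT s.id, t2.label
    FROM tmp_t1 AS s
    JOIN T2 AS t2 ON t2.k = s.k
    ORDER BY s.id
"""

rows_q1 = con.execute(q1).fetchall()
rows_q2 = con.execute(q2).fetchall()
print(len(rows_q1), rows_q1 == rows_q2)
```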

Admittedly, rewriting the query with temporary tables can cause some slowdown because of the extra work of creating and populating the temporary tables. If the database does not make a mistake when choosing the query plan, it will run the original query faster than the new one. However, this slowdown is usually negligible. Creating a temporary table typically takes a few milliseconds (more if many rows are copied into it), so the delay rarely has a significant impact on system performance and can usually be ignored.
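
If you want to measure that overhead yourself, here is a rough sketch using SQLite via Python's sqlite3 module. The table, the row counts, and the `flag = 1` filter are hypothetical, and timings on SQL Server and on your own data will differ. It times only the `CREATE TEMP TABLE ... AS SELECT` step:

```python
import sqlite3
import time

# Hypothetical T1 with 100,000 rows; 'flag = 1' (every 10th row) stands in for condition1.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T1 (id INTEGER PRIMARY KEY, k INTEGER, flag INTEGER)")
con.executemany(
    "INSERT INTO T1 (id, k, flag) VALUES (?, ?, ?)",
    [(i, i % 50, 1 if i % 10 == 0 else 0) for i in range(100_000)],
)

# Time only the extra step the rewrite introduces: materialising the filtered rows.
start = time.perf_counter()
con.execute("CREATE TEMP TABLE tmp_t1 AS SELECT * FROM T1 WHERE flag = 1")
elapsed_ms = (time.perf_counter() - start) * 1000

copied = con.execute("SELECT COUNT(*) FROM tmp_t1").fetchone()[0]
print(f"copied {copied} rows in {elapsed_ms:.2f} ms")
```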

Important! Do not forget to create indexes on temporary tables. The indexes should include all columns that are used in join conditions.
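
The sketch below indexes a temporary table on its join column, using SQLite via Python's sqlite3 module. In T-SQL this would be a `CREATE INDEX` on the `#tempTable`. The table name, the join column `k`, and the index name are hypothetical. The last query reads back which columns the index covers:

```python
import sqlite3

# Hypothetical T1; 'flag = 1' stands in for condition1 and k is the join column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T1 (id INTEGER PRIMARY KEY, k INTEGER, flag INTEGER)")
con.executemany(
    "INSERT INTO T1 (id, k, flag) VALUES (?, ?, ?)",
    [(i, i % 50, 1 if i % 10 == 0 else 0) for i in range(1000)],
)

con.execute("CREATE TEMP TABLE tmp_t1 AS SELECT * FROM T1 WHERE flag = 1")
# Index every temp-table column that later appears in a join condition (here only k).
# If a join used several columns, list them all in the index, e.g. (k, other_col).
con.execute("CREATE INDEX idx_tmp_t1_k ON tmp_t1 (k)")

# PRAGMA index_info returns one (seqno, cid, name) row per indexed column.
indexed_columns = [
    name for (_seqno, _cid, name) in con.execute("PRAGMA temp.index_info(idx_tmp_t1_k)")
]
print(indexed_columns)
```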

Solution 2

There are a lot of things to tackle here: indexes, execution plans, etc. Testing and comparing results is the way to go.

Start with the usual suspects: indexes. Look at the execution plans of both queries and compare them. Make sure the WHERE clause is using the correct indexes, and make sure your JOINs are using indexes as well.
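
Here is a sketch of comparing the plans of the two query shapes side by side, using SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module. The tables, columns, and index name are hypothetical, and no rows are needed just to see the plan. In SQL Server you would instead compare the actual execution plans, for example with `SET STATISTICS XML ON`:

```python
import sqlite3

# Hypothetical schemas; 'flag = 1' stands in for condition1 and k is the join column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T1 (id INTEGER PRIMARY KEY, k INTEGER, flag INTEGER)")
con.execute("CREATE TABLE T2 (k INTEGER, label TEXT)")
con.execute("CREATE TEMP TABLE tmp_t1 AS SELECT * FROM T1 WHERE flag = 1")
con.execute("CREATE INDEX idx_tmp_t1_k ON tmp_t1 (k)")

queries = {
    "subquery": "SELECT t2.label FROM (SELECT * FROM T1 WHERE flag = 1) AS s "
                "JOIN T2 AS t2 ON t2.k = s.k",
    "temp table": "SELECT t2.label FROM tmp_t1 AS s JOIN T2 AS t2 ON t2.k = s.k",
}

plans = {}
for name, sql in queries.items():
    # Each EXPLAIN QUERY PLAN row has four columns; the last one is the readable detail.
    plans[name] = [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]
    print(name)
    for step in plans[name]:
        print("   ", step)
```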

These answers will surely help you a lot.

Author: Ward

Updated on August 10, 2020

Comments

  • Ward, almost 4 years

    This is a question about SQL Server 2008 R2.

    I'm far from being a DBA. I'm a Java developer who has to write SQL from time to time (mostly embedded in code). I want to know whether I did something wrong here and, if so, what I can do to keep it from happening again.

    Q1:

    SELECT something FROM (SELECT * FROM T1 WHERE condition1) AS t JOIN ...
    

    Q1 features 14 joins.

    Q2 is the same as Q1, with one exception: (SELECT * FROM T1 WHERE condition1) is executed first and stored in a temp table.

    This is not a correlated sub-query.

    Q2:

    SELECT * INTO #tempTable FROM T1 WHERE condition1
    SELECT something FROM #tempTable JOIN ...
    

    Again, 14 joins.

    What puzzles me is that Q1 took more than 2 minutes (I tried it a few times, so that caching would not play a role), while Q2 (both queries combined) took 2 seconds!!! What gives?

    • Martin Smith, about 11 years
      My guess would be that the estimated number of rows for SELECT * FROM T1 WHERE condition1 is highly inaccurate. Materialising it into a #tempTable means that SQL Server knows exactly how many rows will be returned. Can you post the XML version of both actual execution plans?
  • AnandPhadke, about 11 years
    The SQL Server query engine internally creates temp tables, and the reason you provided above is not always true. It depends on many other factors, like indexes, fragmentation, statistics, etc.
  • nirupam, almost 8 years
    Creating indexes on temporary tables increases query performance.
  • Saber, over 7 years
    Your answer is quite misleading and wrong; creating a temp table should only be considered in certain cases: stackoverflow.com/questions/42772428/…
  • Gordon Linoff, over 7 years
    @Arvand . . . This isn't "wrong", although I disagree with the advice. If you read carefully, both Karthik and I are recommending the use of indexes on temporary tables to improve performance. In my experience, the problem is almost always nested loop joins, and these can be avoided with query hints. I find that query hints are much easier to maintain than lots of temporary tables.
  • Saber, over 7 years
    @GordonLinoff The conclusion from the first and second paragraphs is that subqueries should be rewritten with temp tables because "the database can easily determine the amount of data in two result sets by the available statistics", which is a wrong assumption and can lead to a wrong conclusion.