Is WITH the replacement for a #TEMP table?

10,325

Solution 1

No. CTEs--introduced by WITH--don't replace temp tables, although in some cases they can be used where one might have used a temp table in the past.

WITH is really nothing more than a derived table, with the difference that it is introduced before the query instead of inline, and is given an alias which can then be used as a table throughout the query multiple times.

A derived table is a complete query, inside of parentheses, that is used as if it were a real table. Views and table-valued functions are also considered derived tables, but we're focusing on the kind that is defined inline. Here is an example:

SELECT
   C.Name,
   S.SalesTotal
FROM
   dbo.Customer C
   INNER JOIN (
      SELECT
         O.CustomerID,
         SalesTotal = Sum(OrderTotal)
      FROM
         dbo.CustomerOrder O
      GROUP BY
         O.CustomerID
   ) S
      ON C.CustomerID = S.CustomerID;

We have a completely intact query that returns its own rowset (the GROUP BY query). By placing this inside of parentheses and assigning it an alias S, we can now use it like a table. We could join more tables to this one. But, we have only joined to this table once.

To convert this to a CTE, we make a very simple change:

WITH SalesTotals AS (
   SELECT
      O.CustomerID,
      SalesTotal = Sum(OrderTotal)
   FROM
      dbo.CustomerOrder O
   GROUP BY
      O.CustomerID
)
SELECT
   C.Name,
   S.SalesTotal
FROM
   dbo.Customer C
   INNER JOIN SalesTotals S
      ON C.CustomerID = S.CustomerID
   -- and for an example of using the CTE twice:
   INNER JOIN (
      SELECT Avg(SalesTotal)
      FROM SalesTotals
   ) A (AverageSalesTotal)
      ON S.SalesTotal >= A.AverageSalesTotal;

Now, a temp table is a completely different animal. It has very important differences from a CTE or derived table:

  • A temp table persists over many queries (for the lifetime of the client connection, or until explicitly dropped) but a CTE only "exists" for one query.
  • A CTE, while logically a "single" table, is likely to have its data generated multiple times if used multiple times in a query. A temp table's data would simply be read as any other "real" table. In the above example, the Avg(SalesTotal) calculation, in versions of SQL Server through at least 2012, will involve a completely separate operation of performing the SalesTotals aggregate a second time. While it is possible for the engine to materialize the results of the CTE, so far SQL Server has not done this. It is notable that other DBMSes such as Oracle may materialize the results of a CTE. In any case, you should be aware that this double-querying can have (of course!) serious performance implications.
  • A temp table has column statistics automatically generated for it and this can aid the query optimizer in choosing better execution plans. A CTE's "final" rowset has no statistics--the statistics of the underlying tables are used.
  • A temp table can be added to incrementally or have rows deleted from it by multiple or repeated statements. It can be updated.
  • A temp table can be modified to add or remove columns or change data types.
  • A temp table can have clustered and non-clustered indexes and constraints.
  • You cannot use a temp table in any way inside a user-defined function.
  • A CTE, while appearing to logically segregate parts of a query, does no such thing. CTEs are perfect candidates for predicate push-down, elimination if it is determined they do not affect the final rowset (or some of their tables or joins eliminated), or they may be subject to unexpected expression evaluation order. For example, in a CTE you might return only the numeric strings from a text column, and in the outer query try to convert these strings to a numeric data type, but to your surprise you get an error about attempting to convert non-number strings to a numeric data type. That is because the optimizer is free to reorganize your query in any way it pleases, and may do the conversion to numeric before the filter for number-containing strings. A temp table, while requiring two statements (one to insert the data, and a second to join to that data) would not have this problem as the queries are distinct and the data truly "materialized" as expected before using it.

Last, a CTE can do something a temp table cannot: it can be recursive. In Oracle this is expressed through CONNECT BY, and in SQL Server it is done with a UNION ALL SELECT inside the CTE that is allowed to refer to the CTE's own alias.

Be careful with CTEs--they are a great abstraction, but are nothing more than that, and you can run into serious trouble with them. Generating a million rows can be done with a recursive CTE one row at a time, but it's the WORST possible way by like a hundred times over or more.

There is another special kind of temp table in SQL Server 2005 and up called a "table variable" that is very much like a temp table (and kept in tempdb exactly the same), with a few notable exceptions:

  • It only lasts the duration of the batch, not the connection.
  • You can use a table variable inside a user-defined function. Some types of UDFs require one.
  • It can only have inline constraints declared (such as primary keys or uniqueness), and while it can have rows updated/inserted/deleted, its schema cannot be modified after declaration in any way, so no adding/removing columns, changing data types, or adding indexes.
  • It does not collect statistics.
  • It can be passed as a parameter (table-valued parameter) in SQL Server 2008 and up.

Solution 2

The SQL optimizer does a much better job today at choosing great execution plans, but when joining more than 10 tables, especially with some large tables and views and needing to use multiple filters, it often does not perform optimally. I still find there is nothing as fast as using #TEMP tables and breaking the queries down into much smaller subsets before joining them together. NOTE: it is rare that I find that adding indexes to the #TEMP tables improves performance.

Share:
10,325

Related videos on Youtube

Mike Perrenoud
Author by

Mike Perrenoud

Mike Perrenoud is a 19-year veteran developer, polyglot, mentor and all around nice guy. He enjoys helping people and making a difference in their lives.

Updated on June 21, 2022

Comments

  • Mike Perrenoud
    Mike Perrenoud about 2 years

    So based off the way I've seen WITH used, and the documentation at MSDN:

    Specifies a temporary named result set, known as a common table expression (CTE).

    it appears that the WITH is the replacement for #TEMP tables. Is that correct?

    • João Paladini
      João Paladini over 11 years
      CTEs (WITH..) are really more like temporary views than tables. But they are really just sub-queries that you can alias (i.e., assign a name to them)