How can I force a subquery to perform as well as a #temp table?


Solution 1

There are a few possible explanations as to why you see this behavior. Some common ones are:

  1. The subquery or CTE may be repeatedly re-evaluated.
  2. Materialising partial results into a #temp table may force a better join order for that part of the plan by removing some possible options from the equation.
  3. Materialising partial results into a #temp table may improve the rest of the plan by correcting poor cardinality estimates.

The most reliable method is simply to use a #temp table and materialize it yourself.
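
As a minimal sketch of that approach (the Orders and Customers tables and their columns below are hypothetical, not taken from the question), you materialise the partial result yourself, index it, and then join against it:

-- Materialise the problem subquery into a #temp table.
SELECT o.CustomerID, SUM(o.Amount) AS TotalAmount
INTO   #CustomerTotals
FROM   dbo.Orders AS o
GROUP  BY o.CustomerID;

-- Index it so the rest of the query has something to seek on,
-- and so the optimiser gets statistics on the materialised rows.
CREATE CLUSTERED INDEX IX_CustomerTotals ON #CustomerTotals (CustomerID);

SELECT c.CustomerName, t.TotalAmount
FROM   dbo.Customers AS c
JOIN   #CustomerTotals AS t ON t.CustomerID = c.CustomerID;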

Failing that, regarding point 1, see "Provide a hint to force intermediate materialization of CTEs or derived tables". The use of TOP(large_number) ... ORDER BY can often encourage the result to be spooled rather than repeatedly re-evaluated.
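
A sketch of that trick, again with hypothetical table names; the TOP value just needs to be larger than any possible row count:

SELECT c.CustomerName, t.TotalAmount
FROM   dbo.Customers AS c
JOIN  (SELECT TOP (2147483647)          -- large number to keep the ORDER BY legal
              o.CustomerID, SUM(o.Amount) AS TotalAmount
       FROM   dbo.Orders AS o
       GROUP  BY o.CustomerID
       ORDER  BY o.CustomerID) AS t
       ON t.CustomerID = c.CustomerID;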

Even if that works, however, there are no statistics on the spool.

For points 2 and 3 you would need to analyse why you weren't getting the desired plan. Possibly rewriting the query to use sargable predicates, or updating statistics, might get a better plan. Failing that, you could try using query hints to get the desired plan.
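
For example, a predicate that wraps the column in a function is not sargable, whereas a range rewrite lets the optimiser seek an index (the Orders table here is again just an illustration):

-- Non-sargable: the function on the column blocks an index seek.
SELECT OrderID FROM dbo.Orders WHERE YEAR(OrderDate) = 2013;

-- Sargable rewrite: the column is compared directly to a range.
SELECT OrderID
FROM   dbo.Orders
WHERE  OrderDate >= '20130101'
  AND  OrderDate <  '20140101';

-- Refreshing statistics can also correct poor cardinality estimates.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;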

Solution 2

I do not believe there is a query hint that instructs the engine to spool each subquery in turn.

There is the OPTION (FORCE ORDER) query hint, which forces the engine to perform the JOINs in the order specified and could potentially coax it into achieving that result in some instances. This hint will sometimes produce a more efficient plan for a complex query where the engine keeps insisting on a sub-optimal one. Of course, the optimizer should usually be trusted to determine the best plan.
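
A sketch of how the hint is applied (hypothetical tables once more); with the hint in place, the joins are performed in exactly the order written, so the derived table is not rearranged by the optimiser:

SELECT c.CustomerName, t.TotalAmount
FROM  (SELECT o.CustomerID, SUM(o.Amount) AS TotalAmount
       FROM   dbo.Orders AS o
       GROUP  BY o.CustomerID) AS t
JOIN   dbo.Customers AS c ON c.CustomerID = t.CustomerID
OPTION (FORCE ORDER);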

Ideally there would be a query hint that would allow you to designate a CTE or subquery as "materialized" or "anonymous temp table", but there is not.

Solution 3

Another option (for future readers of this article) is to use a user-defined function. Multi-statement functions (as described in How to Share Data between Stored Procedures) appear to force SQL Server to materialize the results of your subquery. In addition, they allow you to specify primary keys and indexes on the resulting table to help the query optimizer. This function can then be used in a select statement as part of your view. For example:

-- Multi-statement table-valued function: the rows are materialised into the
-- table variable @t, and the PRIMARY KEY on title gives the optimizer extra
-- information about the result.
CREATE FUNCTION SalesByStore (@storeid varchar(30))
   RETURNS @t TABLE (title varchar(80) NOT NULL PRIMARY KEY,
                     qty   smallint    NOT NULL)  AS
BEGIN
   INSERT @t (title, qty)
      SELECT t.title, s.qty
      FROM   sales s
      JOIN   titles t ON t.title_id = s.title_id
      WHERE  s.stor_id = @storeid
   RETURN
END

CREATE VIEW SalesData As
SELECT * FROM SalesByStore('6380')

Comments

  • Adamantish
    Adamantish almost 2 years

    I am re-iterating the question asked by Mongus Pong Why would using a temp table be faster than a nested query? which doesn't have an answer that works for me.

    Most of us at some point find that when a nested query reaches a certain complexity it needs to be broken into temp tables to keep it performant. It is absurd that this could ever be the most practical way forward, and it means these processes can no longer be made into a view. And often 3rd party BI apps will only play nicely with views, so this is crucial.

    I am convinced there must be a simple query plan setting to make the engine just spool each subquery in turn, working from the inside out. No second-guessing how it can make the subquery more selective (which it sometimes does very successfully) and no possibility of correlated subqueries. Just the stack of data the programmer intended to be returned by the self-contained code between the brackets.

    It is common for me to find that simply changing from a subquery to a #table takes the time from 120 seconds to 5. Essentially the optimiser is making a major mistake somewhere. Sure, there may be very time-consuming ways I could coax the optimiser to look at tables in the right order, but even this offers no guarantees. I'm not asking for the ideal 2-second execution time here, just the speed that temp tabling offers me within the flexibility of a view.

    I've never posted on here before but I have been writing SQL for years and have read the comments of other experienced people who've also just come to accept this problem and now I would just like the appropriate genius to step forward and say the special hint is X...

  • Adamantish
    Adamantish over 10 years
    That's the answer I feared. Is there some practical reason that the engine might have been built this way? My best guess is that it's a relic from the days when disk space really mattered. Maybe it does still really matter that views, by nature, not only be guaranteed to make no changes to the db but also leave a light footprint on temp processing disk space. It's just particularly annoying when you've already made the view central to other processes.
  • Adamantish
    Adamantish over 10 years
    Yes! That's the charm (the forced intermediate materialisation). Down to 4 seconds. Sure seems like a hacky way to get a very commonly useful thing done but maybe MS will make it a hint or setting later.
  • Martin Smith
    Martin Smith over 10 years
    @Adamantish - Well they haven't closed it as "won't fix" yet which is encouraging from that POV.
  • Adamantish
    Adamantish over 10 years
    And your point about no stats on the spool is worth bearing in mind for when I come up against reasons 2 and 3.
  • Adamantish
    Adamantish over 10 years
    I will probably rarely ever use a temp table again except where I need to reuse it.
  • CodeMonkey
    CodeMonkey over 10 years
    Force order did it for me! Thank you
  • Ed Avis
    Ed Avis over 9 years
    "In addition, they allow you to specify primary keys and indexes on the resulting table to help the query optimizer." Can you clarify this? The linked article doesn't appear to indicate it is possible for user-defined functions to make PKs or indexes.
  • Ed Avis
    Ed Avis over 9 years
    For a large query that uses 'with a as (...), b as (...)' I found that using 'top 1000000 ... order by' did not seem to give the same speedup as just converting the subqueries to temp tables. This on MSSQL 2008 R2.
  • Martin Smith
    Martin Smith over 9 years
    @EdAvis yes, creating temp tables explicitly is much more reliable and also has other potential benefits in terms of allowing you to add needed indexes and better cardinality estimates for the rest of the plan. Messing around with TOP to try and get this would very much be a plan B for me, e.g. if for some reason the query absolutely had to be in a view so that explicit materialisation was not an option.
  • Martin Smith
    Martin Smith over 9 years
    @EdAvis the result table of multi-statement TVFs is just a table variable, so the methods here apply: stackoverflow.com/a/17385085/73226
  • erdomke
    erdomke over 9 years
    Exactly. In my example, you can see a primary key is specified for the title column of the return table.
  • Ed Avis
    Ed Avis over 9 years
    Sorry, my mistake, I was only looking for 'create' statements :-(
  • erdomke
    erdomke over 9 years
    Not a problem; I have been there.
  • Adamantish
    Adamantish over 9 years
    A good suggestion that will be the answer in many cases. Just wanted to reiterate a warning @MartinSmith gave about this approach in a comment above: "One downside of that though is that it will also assume 1 row will be emitted. So it may mess up cardinality estimates for the rest of the plan."
  • Adamantish
    Adamantish over 9 years
    To me, Views are the fundamental particle of well modularised set-based programming. They hang onto your indexes and give the optimiser its best shot at putting together a good plan out of a bunch of nested objects. Still, we're talking about how to frustrate the optimiser here so the point may be moot.
  • lehiester
    lehiester about 4 years
    I've seen cases with user-defined functions where even the TOP trick doesn't work. Specifically, an expensive user-defined function in a joined subquery with one row was obviously being re-evaluated over and over, despite the fact that it only needed to be evaluated once. Storing the subquery in a temp table was the only thing that fixed it and made it run in a reasonable amount of time.