Inner Join with derived table using sub query

sql-server sql-server-2008 join subquery

129,196

Solution 1

The first thing to note is that your queries are not comparable, OUTER APPLY needs to be replaced with CROSS APPLY, or INNER JOIN with LEFT JOIN.

When they are made comparable though, you can see that the query plans for both queries are identical. I have just mocked up a sample DDL:

CREATE TABLE #Employees (EID INT NOT NULL);
INSERT #Employees VALUES (0);
CREATE TABLE #SalaryRate (EID INT NOT NULL, Rate MONEY NOT NULL);
CREATE TABLE #AttendanceDetails (EID INT NOT NULL, DaysAttended INT NOT NULL);

Running the following:

SELECT E.EID,DT.Salary FROM #Employees E
OUTER APPLY
(
    SELECT (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
    WHERE SR.EID=E.EID
) DT; --Derived Table for outer apply

SELECT E.EID,DT.Salary FROM #Employees E
LEFT JOIN
(
    SELECT SR.EID, (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID;

Gives the following plan:

enter image description here

And changing to INNER/CROSS:

SELECT E.EID,DT.Salary FROM #Employees E
CROSS APPLY
(
    SELECT (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
    WHERE SR.EID=E.EID
) DT; --Derived Table for outer apply


SELECT E.EID,DT.Salary FROM #Employees E
INNER JOIN
(
    SELECT SR.EID, (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID;

Gives the following plan:

enter image description here

These are the plans where there is no data in the outer tables, and only one row in employees, so not really realistic. In the case of the outer apply, SQL Server is able to determine that there is only one row in employees, so it would be beneficial to just do a nested loop join (i.e. row by row lookup) to the outer tables. After putting 1,000 rows in employees, using LEFT JOIN/OUTER APPLY yields the following plan:

enter image description here

You can see here that the join is now a hash match join, which means (in it's simplest terms) that SQL Server has determined that the best plan is to execute the outer query first, hash the results and then lookup from employees. This however does not mean that the subquery as a whole is executed and the results stored, for simplicity purposes you could consider this, but predicates from the outer query can still be still be used, for example, if the subquery were executed and stored internally, the following query would present massive overhead:

SELECT E.EID,DT.Salary FROM #Employees E
LEFT JOIN
(
    SELECT SR.EID, (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID
WHERE E.EID = 1;

What whould be the point in retrieving all employee rates, storing the results, only to actually look up one employee? Inspection of the execution plan shows that the EID = 1 predicate is passed to the table scan on #AttendanceDetails:

enter image description here

So the answer to the following points is:

If I re-write above query using OUTER APPLY, I know for sure the subquery will be executed for each row.
Inner Join will execute sub query only once.

It depends. Using APPLY SQL Server will attempt to rewrite the query as a JOIN if possible, as this will yield the optimal plan, so using OUTER APPLY does not guarantee that the query will be executed once for each row. Similarly using LEFT JOIN does not guarantee that the query is executed only once.

SQL is a declarative language, in that you tell it what you want it to do, not how to do it, so you shouldn't rely on specific commands to elicit specific behaviour, instead, if you find performance issues, check the execution plan, and IO statistics to find out how it is doing it, and identify how you can improve your query.

Further more, SQL Server does not matierialise subqueries, usually the definition is expanded out into the main query, so even though you have written:

SELECT E.EID,DT.Salary FROM #Employees E
INNER JOIN
(
    SELECT SR.EID, (SR.Rate * AD.DaysAttended) Salary
    FROM #SalaryRate SR
    INNER JOIN #AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID;

What is actually executed is more like:

SELECT  e.EID, sr.Rate * ad.DaysAttended AS Salary
FROM    #Employees e
        INNER JOIN #SalaryRate sr
            on e.EID = sr.EID
        INNER JOIN #AttendanceDetails ad
            ON ad.EID = sr.EID;

Solution 2

With INNER JOIN your Sub-Query will be execute only once and its records may gets stored internally in tempdb worktable on complex operations, then JOINed with the 1st table.

With APPLY clause, the Sub-Query will be executed for every row in the 1st table.

edit: use CTE

;with SalaryRateCTE as 
(
    SELECT EID, (SR.Rate * AD.DaysAttended) AS Salary
    FROM SalaryRate SR
    INNER JOIN AttendanceDetails AD on AD.EID=SR.EID
)
SELECT E.EID, DT.Salary 
FROM Employees E
INNER JOIN SalaryRateCTE DT --Derived Table for inner join
ON DT.EID = E.EID

Solution 3

The sub-query will be evaluated only once. To avoid confusion, we could simply think of the sub-query as a single table/view since the outer and inner inner queries are not co-related.

129,196

love kumar

Updated on February 14, 2020

Comments

love kumar over 4 years
Environment: SQL 2008 R2

I created a derived table using sub query and joined with main table. I just like to know if subquery is executed only once or will it be executed for each row in result set. Consider following example (fictional table names for reference only)
```
SELECT E.EID,DT.Salary FROM Employees E
INNER JOIN
(
    SELECT EID, (SR.Rate * AD.DaysAttended) Salary
    FROM SalaryRate SR
    INNER JOIN AttendanceDetails AD on AD.EID=SR.EID
) DT --Derived Table for inner join
ON DT.EID=E.EID
```
So, the subquery used for Inner Join will be executed only once or multiple times??

If I re-write above query using OUTER APPLY, I know for sure the subquery will be executed for each row. See Below.
```
SELECT E.EID,DT.Salary FROM Employees E
OUTER APPLY
(
    SELECT (SR.Rate * AD.DaysAttended) Salary
    FROM SalaryRate SR
    INNER JOIN AttendanceDetails AD on AD.EID=SR.EID
    WHERE SR.EID=E.EID
) DT --Derived Table for outer apply
```
So just want to ensure that Inner Join will execute sub query only once.
anonxen over 9 years

2 Points about the outer apply query: 1. It is not equivalent to Inner join. But in some ways similar to Outer join since it gives the extra rows not matching the condition. 2. It will be executed for each row in the outer query.
GarethD over 9 years

This is not true. Internally SQL will rewrite the apply in the question as a JOIN and it will be executed according to what is the best plan based on the statistics and indexes available. In addition SQL Server does not matierialise subqueries, so it will not be execute once and store the results.
GarethD over 9 years

SQL Server does not matierialise subqueries, so it will not be execute once and store the results. The subquery definition is expanded out into the main query to allow for further optimisation.
love kumar over 9 years

Thanks for detailed explanation. So basically you are saying, we cannot determine how the sub query will be handled. I was under impression that sub-queries will always materialize first.
Manoj Pandey over 9 years

I'm talking in general terms about the behavior of APPLY clause, and what actually should be used out of these two, and should not be left to SQL optimizer to guess. It will execute the Sub-Query multiple times in most of the cases as APPLY clause was intended to be designed in such a way, for example if you include TOP 5 with SELECT in Sub-Query. But yes in this case it would execute once, but APPLY should not be used here. Also regarding Sub-Query if its a complex query then SQL optimizer creates a WorkTable in tempDB internally to spool these records. Hope this helps and avoid confusion.
love kumar over 9 years

My subquery is actually a complex subquery which involves views and row number functions to skip duplicates. Reason for making it a subquery/APPLY is for better readability for future developers. But wasn't sure if entire subquery is executed for every row
GarethD over 9 years

With regard to not being able to determine how a subquery will be handled, yes, we can't know, we can have a pretty good guess, but ultimately it is up to the optimiser to decide the best way to execute the query, and 99.9% of the time (maybe more) it will be the best way. With regard to materialising subqueries, No, parts of a query may be materialised (whether in a subquery or not) and the results spooled, but again, it is up to the optimiser to decide whether this is the best option or not.
Manoj Pandey over 9 years

What is you main aim? If you want to just JOIN both the table & the Sub-Query then just use JOIN, not APPLY clause. Using APPLY will not make your code look beautiful, but may confuse other developers. TO make it look beautiful, use CTE to store the Sub-Query, see my response above.
GarethD over 9 years

For what it is worth there is a connect item open requesting a query hint that will force materialisation of subqueries/ctes.
GarethD over 9 years

Using a CTE is no different to a subquery, and will not force the materisiation. Since the subquery definition is expanded into the main query, the complexity of the subquery is irrelevant to spooling. You can get spooling with no subqueries at all. The point is SQL is declarative, and although you can have a pretty good idea whether a query plan will use a merge, hash or nested loop join, unless you specifically tell it to then you cannot rely on a certain command to yield a certain plan. As shown in my answer, exactly the same query yields two different plans when different data is queried.
Manoj Pandey over 9 years

@GarethD, I think you didn't get my point, or overlooked the response above. I mentioned use of CTE for better readability, not for optimization. And I very well understand how the SQL optimizer uses Hash/NestedLoop/Merge join internally. If you use "SET STATISTICS IO ON" statement it will show using WorkTable at times for a complex query thus may go for spooling/caching in tempdb. My main point is on usage of APPLY operator on what circumstances it should be used, and not to be used instead of JOINs.
anonxen over 9 years

Voted up your answer. I think this is the most accurate answer to the question.
anonxen over 9 years

Thanks for pointing this out ! I will always consider this while writing my future queries.