When should I use CROSS APPLY over INNER JOIN?
Solution 1
Can anyone give me a good example of when CROSS APPLY makes a difference in those cases where INNER JOIN will work as well?
See the article in my blog for detailed performance comparison:
CROSS APPLY
works better on things that have no simple JOIN
condition.
This one selects 3
last records from t2
for each record from t1
:
SELECT t1.*, t2o.*
FROM t1
CROSS APPLY
(
SELECT TOP 3 *
FROM t2
WHERE t2.t1_id = t1.id
ORDER BY
t2.rank DESC
) t2o
It cannot be easily formulated with an INNER JOIN
condition.
You could probably do something like that using CTE
's and window function:
WITH t2o AS
(
SELECT t2.*, ROW_NUMBER() OVER (PARTITION BY t1_id ORDER BY rank) AS rn
FROM t2
)
SELECT t1.*, t2o.*
FROM t1
INNER JOIN
t2o
ON t2o.t1_id = t1.id
AND t2o.rn <= 3
, but this is less readable and probably less efficient.
Update:
Just checked.
master
is a table of about 20,000,000
records with a PRIMARY KEY
on id
.
This query:
WITH q AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS rn
FROM master
),
t AS
(
SELECT 1 AS id
UNION ALL
SELECT 2
)
SELECT *
FROM t
JOIN q
ON q.rn <= t.id
runs for almost 30
seconds, while this one:
WITH t AS
(
SELECT 1 AS id
UNION ALL
SELECT 2
)
SELECT *
FROM t
CROSS APPLY
(
SELECT TOP (t.id) m.*
FROM master m
ORDER BY
id
) q
is instant.
Solution 2
Consider you have two tables.
MASTER TABLE
x------x--------------------x
| Id | Name |
x------x--------------------x
| 1 | A |
| 2 | B |
| 3 | C |
x------x--------------------x
DETAILS TABLE
x------x--------------------x-------x
| Id | PERIOD | QTY |
x------x--------------------x-------x
| 1 | 2014-01-13 | 10 |
| 1 | 2014-01-11 | 15 |
| 1 | 2014-01-12 | 20 |
| 2 | 2014-01-06 | 30 |
| 2 | 2014-01-08 | 40 |
x------x--------------------x-------x
There are many situations where we need to replace INNER JOIN
with CROSS APPLY
.
1. Join two tables based on TOP n
results
Consider if we need to select Id
and Name
from Master
and last two dates for each Id
from Details table
.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
INNER JOIN
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
ORDER BY CAST(PERIOD AS DATE)DESC
)D
ON M.ID=D.ID
The above query generates the following result.
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
x------x---------x--------------x-------x
See, it generated results for last two dates with last two date's Id
and then joined these records only in the outer query on Id
, which is wrong. This should be returning both Ids
1 and 2 but it returned only 1 because 1 has the last two dates. To accomplish this, we need to use CROSS APPLY
.
SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
CROSS APPLY
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
WHERE M.ID=D.ID
ORDER BY CAST(PERIOD AS DATE)DESC
)D
and forms the following result.
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-08 | 40 |
| 2 | B | 2014-01-06 | 30 |
x------x---------x--------------x-------x
Here's how it works. The query inside CROSS APPLY
can reference the outer table, where INNER JOIN
cannot do this (it throws compile error). When finding the last two dates, joining is done inside CROSS APPLY
i.e., WHERE M.ID=D.ID
.
2. When we need INNER JOIN
functionality using functions.
CROSS APPLY
can be used as a replacement with INNER JOIN
when we need to get result from Master
table and a function
.
SELECT M.ID,M.NAME,C.PERIOD,C.QTY
FROM MASTER M
CROSS APPLY dbo.FnGetQty(M.ID) C
And here is the function
CREATE FUNCTION FnGetQty
(
@Id INT
)
RETURNS TABLE
AS
RETURN
(
SELECT ID,PERIOD,QTY
FROM DETAILS
WHERE ID=@Id
)
which generated the following result
x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-11 | 15 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-06 | 30 |
| 2 | B | 2014-01-08 | 40 |
x------x---------x--------------x-------x
ADDITIONAL ADVANTAGE OF CROSS APPLY
APPLY
can be used as a replacement for UNPIVOT
. Either CROSS APPLY
or OUTER APPLY
can be used here, which are interchangeable.
Consider you have the below table(named MYTABLE
).
x------x-------------x--------------x
| Id | FROMDATE | TODATE |
x------x-------------x--------------x
| 1 | 2014-01-11 | 2014-01-13 |
| 1 | 2014-02-23 | 2014-02-27 |
| 2 | 2014-05-06 | 2014-05-30 |
| 3 | NULL | NULL |
x------x-------------x--------------x
The query is below.
SELECT DISTINCT ID,DATES
FROM MYTABLE
CROSS APPLY(VALUES (FROMDATE),(TODATE))
COLUMNNAMES(DATES)
which brings you the result
x------x-------------x
| Id | DATES |
x------x-------------x
| 1 | 2014-01-11 |
| 1 | 2014-01-13 |
| 1 | 2014-02-23 |
| 1 | 2014-02-27 |
| 2 | 2014-05-06 |
| 2 | 2014-05-30 |
| 3 | NULL |
x------x-------------x
Solution 3
cross apply
sometimes enables you to do things that you cannot do with inner join
.
Example (a syntax error):
select F.* from sys.objects O
inner join dbo.myTableFun(O.name) F
on F.schema_id= O.schema_id
This is a syntax error, because, when used with inner join
, table functions can only take variables or constants as parameters. (I.e., the table function parameter cannot depend on another table's column.)
However:
select F.* from sys.objects O
cross apply ( select * from dbo.myTableFun(O.name) ) F
where F.schema_id= O.schema_id
This is legal.
Edit: Or alternatively, shorter syntax: (by ErikE)
select F.* from sys.objects O
cross apply dbo.myTableFun(O.name) F
where F.schema_id= O.schema_id
Edit:
Note: Informix 12.10 xC2+ has Lateral Derived Tables and Postgresql (9.3+) has Lateral Subqueries which can be used to a similar effect.
Solution 4
It seems to me that CROSS APPLY can fill a certain gap when working with calculated fields in complex/nested queries, and make them simpler and more readable.
Simple example: you have a DoB and you want to present multiple age-related fields that will also rely on other data sources (such as employment), like Age, AgeGroup, AgeAtHiring, MinimumRetirementDate, etc. for use in your end-user application (Excel PivotTables, for example).
Options are limited and rarely elegant:
JOIN subqueries cannot introduce new values in the dataset based on data in the parent query (it must stand on its own).
UDFs are neat, but slow as they tend to prevent parallel operations. And being a separate entity can be a good (less code) or a bad (where is the code) thing.
Junction tables. Sometimes they can work, but soon enough you're joining subqueries with tons of UNIONs. Big mess.
Create yet another single-purpose view, assuming your calculations don't require data obtained mid-way through your main query.
Intermediary tables. Yes... that usually works, and often a good option as they can be indexed and fast, but performance can also drop due to to UPDATE statements not being parallel and not allowing to cascade formulas (reuse results) to update several fields within the same statement. And sometimes you'd just prefer to do things in one pass.
Nesting queries. Yes at any point you can put parenthesis on your entire query and use it as a subquery upon which you can manipulate source data and calculated fields alike. But you can only do this so much before it gets ugly. Very ugly.
Repeating code. What is the greatest value of 3 long (CASE...ELSE...END) statements? That's gonna be readable!
- Tell your clients to calculate the damn things themselves.
Did I miss something? Probably, so feel free to comment. But hey, CROSS APPLY is like a godsend in such situations: you just add a simple CROSS APPLY (select tbl.value + 1 as someFormula) as crossTbl
and voilà! Your new field is now ready for use practically like it had always been there in your source data.
Values introduced through CROSS APPLY can...
- be used to create one or multiple calculated fields without adding performance, complexity or readability issues to the mix
- like with JOINs, several subsequent CROSS APPLY statements can refer to themselves:
CROSS APPLY (select crossTbl.someFormula + 1 as someMoreFormula) as crossTbl2
- you can use values introduced by a CROSS APPLY in subsequent JOIN conditions
- As a bonus, there's the Table-valued function aspect
Dang, there's nothing they can't do!
Solution 5
This has already been answered very well technically, but let me give a concrete example of how it's extremely useful:
Lets say you have two tables, Customer and Order. Customers have many Orders.
I want to create a view that gives me details about customers, and the most recent order they've made. With just JOINS, this would require some self-joins and aggregation which isn't pretty. But with Cross Apply, its super easy:
SELECT *
FROM Customer
CROSS APPLY (
SELECT TOP 1 *
FROM Order
WHERE Order.CustomerId = Customer.CustomerId
ORDER BY OrderDate DESC
) T
Comments
-
Jeff Meatball Yang over 2 years
What is the main purpose of using CROSS APPLY?
I have read (vaguely, through posts on the Internet) that
cross apply
can be more efficient when selecting over large data sets if you are partitioning. (Paging comes to mind)I also know that
CROSS APPLY
doesn't require a UDF as the right-table.In most
INNER JOIN
queries (one-to-many relationships), I could rewrite them to useCROSS APPLY
, but they always give me equivalent execution plans.Can anyone give me a good example of when
CROSS APPLY
makes a difference in those cases whereINNER JOIN
will work as well?
Edit:
Here's a trivial example, where the execution plans are exactly the same. (Show me one where they differ and where
cross apply
is faster/more efficient)create table Company ( companyId int identity(1,1) , companyName varchar(100) , zipcode varchar(10) , constraint PK_Company primary key (companyId) ) GO create table Person ( personId int identity(1,1) , personName varchar(100) , companyId int , constraint FK_Person_CompanyId foreign key (companyId) references dbo.Company(companyId) , constraint PK_Person primary key (personId) ) GO insert Company select 'ABC Company', '19808' union select 'XYZ Company', '08534' union select '123 Company', '10016' insert Person select 'Alan', 1 union select 'Bobby', 1 union select 'Chris', 1 union select 'Xavier', 2 union select 'Yoshi', 2 union select 'Zambrano', 2 union select 'Player 1', 3 union select 'Player 2', 3 union select 'Player 3', 3 /* using CROSS APPLY */ select * from Person p cross apply ( select * from Company c where p.companyid = c.companyId ) Czip /* the equivalent query using INNER JOIN */ select * from Person p inner join Company c on p.companyid = c.companyId
-
Rire1979 over 13 yearsI know this is EVEN PICKIER of me but 'performant' is most definitely a word. It is just not related to efficiency.
-
ARZ about 12 yearsIt's very useful for sql xquery. check this.
-
crokusek about 12 yearsSeems like using "inner loop join" would be very close to cross apply. I wish your example detailed which join hint was equivalent. Just saying join might result in inner/loop/merge or even "other" because it may re-arrange with other joins.
-
paparazzo almost 12 yearsWhen the join will create a lot of rows but you only need to evaluate one row join at a time. I had a case were I needed a self join on a table with over 100 million rows and there was simple not enough memory. So I went cursor to bring memory footprint down. From cursor I went cross apply as still managed memory footprint and was 1/3 faster than cursor.
-
MikeTeeVee over 11 yearsIn my experience Cross-Apply is faster in most cases (like for reporting), but when exporting records (tens of thousands or more), it will loose out to a hash-join on a derived query. YMMV
-
Gerardo Lima over 10 years
CROSS APPLY
has its obvious usage in allowing a set to depend on another (unlike theJOIN
operator), but that doesn't comes without a cost: it behaves like a function that operates over each member of the left set, so, in SQL Server terms it always perform aLoop Join
, which almost never is the best way to join sets. So, useAPPLY
when you need to, but don't overuse it againstJOIN
. -
Pavan Josyula over 7 yearsWith a JOIN operator, both inputs represent static relations. With APPLY, the left side is a static relation, but the right side can be a table expression with correlations to elements from the left table- Referred from 70-461 Traning Kit
-
Matt H almost 7 yearsProvided link "doesn't require a UDF as the right-table." is no longer working.
-
jezza101 about 2 years@Rire1979 From the dictionary: Performant uk/pəˈfɔː.mənt/ us/pɚˈfɔːr.mənt/ ADJECTIVE (of technology, etc.) working in an effective way We found it the most flexible and performant monitoring software.
-
-
Jeff Meatball Yang almost 15 yearsSee the end of Ariel's link. A row_number() query is just as nice and doesn't even require a join. So I don't think I should use cross apply for this situation (select top 3, partition by t1.id).
-
Jeff Meatball Yang almost 15 yearsGood example! The raw performance increase is very apparent. How do the execution plans differ?
-
Quassnoi almost 15 years@Jeff: they differ much, as you can see :)
JOIN
usesNESTED LOOPS
withmaster
as a leading table,CROSS APPLY
usesNESTED LOOPS
too butt
is leading andTOP
is applied tomaster
on each loop. -
MikeKulls almost 13 yearsI think this is the reasoning behind why we have cross apply. If you check out the link below this is the first thing MS says about cross apply. It might have other uses but I think this is the reason it was introduced. Without it table functions would not be usable in a lot of situations. technet.microsoft.com/en-us/library/ms175156.aspx
-
MikeKulls almost 13 yearsAlthough this is the most popular answer I don't think it answers the actual question "What is the main purpose of using CROSS APPLY?". The main purpose is to enable table functions with parameters to be executed once per row and then joined to the results.
-
Quassnoi almost 13 years@Mike: how do you call a
TVF
withINNER JOIN
? -
nurettin over 12 yearscross apply also produces a nice execution plan when coupled with inline table functions while maintaining much needed modularity.
-
kmote about 12 years@Quassnoi - the link to your blog post appears down. The post is archived here: web.archive.org/web/20101225210315/http://explainextended.com/…
-
ErikE over 11 yearsNo
SELECT
needed inside theCROSS APPLY
. Please tryCROSS APPLY dbo.myTableFun(O.name) F
. -
nurettin about 11 years@ErikE sure, you can always use the less flexible syntax to cross apply. I was showing the more generalized version which you can can sometimes use in order to avoid bringing hard to compute columns into the query.
-
ErikE about 11 years@MikeKulls Yes, but the OP didn't ask for the main purpose of using
CROSS APPLY
, he asked for when to choose it overINNER JOIN
, when that would work as well. -
geoffrobinson almost 11 yearsSo it is essentially a CROSS JOIN which allows the left-handed table to define the right side as well. Would that be a somewhat accurate way to describe it?
-
a_horse_with_no_name over 9 yearsIt might be worth mentioning that this is called a
lateral join
in standard (ANSI) SQL -
Bolu about 9 yearsI think the point is:
inner join ( select * from dbo.myTableFun(O.name) ) F
is legal too. So if you want to emphasis the differences, you'd better use the shorter syntax. -
nurettin about 9 years@Bolu inner join won't work if table function parameter depends on another table's column (aka external reference) in the external select. It will work if table function parameter is a literal or a variable. Cross apply will work in both cases.
-
mrmillsy over 8 yearsThis is a big +1 from me, as I'm surprised it's not mentioned more often. Perhaps you could extend this example to show how you can perform "procedural" calculations on the chain of derived values? Eg: CROSS APPLY (select crossTbl.value * tbl.multiplier as Multiplied) multiTbl - CROSS APPLY (select multiTbl.Multiplied / tbl.DerivativeRatio as Derived) derivedTbl - etc ...
-
przemo_li almost 8 yearsAny more info/examples on how to use Cross Apply as replacement for CASE..ELSE..END ?
-
mtone almost 8 years@przemo_li APPLY can be used to store the result of a case statement (among other things) in order to refer to it. A structure could be something like: SELECT CASE when subquery.intermediateResult > 0 THEN "yes" ELSE "no" END FROM someTable OUTER APPLY (select CASE...END...ELSE as intermediateResult) as subquery.
-
trnelson over 7 yearsExcellent example with the 2 vs 4 records and helped me understand the context in which this would be needed.
-
Matt Arnold over 5 yearsDoesn't
TOP(t.id)
require all ids in t to be contiguous to perform efficiently? Also, even if this is the case; if you have 1,000,000 records and you need to get record #1,000,000, won't thisCROSS APPLY
have to first get 999,999 records before it finds it? -
Quassnoi over 5 years@MattArnold: I'm not really sure what "efficiently" means in this context. This is a made-up example to illustrate the principle, in real world you don't compare ids to row numbers. If you want to get the "record number 1000000", you first have to define what does "record number 1000000" mean. If it means "entry number 1 million in a rowset sorted by such and such fields", then yes, it would need to count them off, one by one, before it can point to the millionth one.
-
Matt Arnold over 5 yearsWhat I meant by efficiently is that if you need to get the 3rd record but the IDs are randomly generated across the entire INT range then it might be calling
TOP(2,147,483,647)
just to get record 3 just because it had an ID of 2,147,483,647 instead of 3. My confusion is whyTOP
andORDER BY
are even being used in this example - what would be wrong with usingWHERE
instead (e.g.WHERE id = t.id
)? -
Quassnoi over 5 years@MattArnold: this is not a query you would see in the real world. This is a query which illustrates the performance difference between
ROW_NUMBER
andCROSS APPLY / TOP
. UsingWHERE
instead ofTOP
would not let me illustrate this difference. -
Joseph Cho over 5 yearsFor point 1 where we have 2 rows for ID 1 instead of 4 rows for ID 1, 2. Wouldn't we just use a left join instead.
-
Vérace almost 5 yearsWhere did you get the
master
table that you describe using? I would like to run tests on !m -
Vérace almost 5 yearsWhere did you get the
master
table that you describe using? I would like to run tests on my own machine. -
Quassnoi almost 5 years@Vérace open the linked article of mine explainextended.com/2009/07/16/inner-join-vs-cross-apply , there are scripts to create the test tables and the queries against those tables
-
alpav almost 4 yearsExample of calculations with cross apply col.sql.drylib.com
-
Vlad about 3 years@Quassnoi thank you very much for your answer ! I learned a new thing today (cross apply) and saved me from having to do some very nasty inner joins with RANKS, precisely because I can have multiple matches in my inner join..... I can now simply CROSS APPLY (SELECT TOP 1...)
-
Greg Gum over 2 yearsOne year later, here I am back reading the same question, and finding my own answer which I frankly don't remember writing!