INNER JOIN ON vs WHERE clause

Solution 1

INNER JOIN is ANSI syntax that you should use.

It is generally considered more readable, especially when you join lots of tables.

It can also be easily replaced with an OUTER JOIN whenever a need arises.
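As a rough sketch of that point, using the table and column names from the question (everything else is an assumption), switching from an inner to an outer join only changes the join keyword:

```sql
-- Inner join: only rows of table1 that have a matching row in table2.
SELECT table1.this, table2.that
FROM table1
INNER JOIN table2
    ON table1.foreignkey = table2.primarykey;

-- Outer join: every row of table1 is kept; table2 columns are NULL
-- wherever no matching row exists.
SELECT table1.this, table2.that
FROM table1
LEFT OUTER JOIN table2
    ON table1.foreignkey = table2.primarykey;
```

With the comma syntax there is no equally small change, because the join condition sits in the WHERE clause with the other filters.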

The WHERE syntax is more relational model oriented.

The result of joining two tables is the Cartesian product of the tables, to which a filter is applied that keeps only the rows whose joining columns match.

It's easier to see this with the WHERE syntax.
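A minimal sketch of that view, again borrowing the question's table1 and table2 (the selected columns are only illustrative):

```sql
-- Step 1: the Cartesian product pairs every row of table1
-- with every row of table2.
SELECT table1.this, table2.that
FROM table1
CROSS JOIN table2;

-- Step 2: filtering that product on the joining columns
-- leaves exactly the rows of the inner join.
SELECT table1.this, table2.that
FROM table1
CROSS JOIN table2
WHERE table1.foreignkey = table2.primarykey;

-- The same result written with the JOIN syntax.
SELECT table1.this, table2.that
FROM table1
INNER JOIN table2
    ON table1.foreignkey = table2.primarykey;
```

This describes the logical result only; a database does not have to materialize the full product to compute it.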

As for your example, in MySQL (and in SQL generally) these two queries are synonyms.

Note that MySQL also has a STRAIGHT_JOIN clause.

Using this clause, you can control the JOIN order: which table is scanned in the outer loop and which one is in the inner loop.

You cannot control this in MySQL using WHERE syntax.
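For illustration, a MySQL-only sketch with the question's tables. STRAIGHT_JOIN is used here in place of INNER JOIN, and whether forcing the order actually helps depends on your data and indexes:

```sql
-- MySQL-specific: the left table (table1) is always read before the
-- right table (table2), regardless of what the optimizer would choose.
SELECT table1.this, table2.that
FROM table1
STRAIGHT_JOIN table2
    ON table1.foreignkey = table2.primarykey;
```

Forcing the order can just as well make a query slower, so it is worth comparing the plans with and without it.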

Solution 2

Others have pointed out that INNER JOIN helps human readability, and that's a top priority, I agree.
Let me try to explain why the join syntax is more readable.

A basic SELECT query is this:

SELECT stuff
FROM tables
WHERE conditions

The SELECT clause tells us what we're getting back; the FROM clause tells us where we're getting it from, and the WHERE clause tells us which ones we're getting.

JOIN is a statement about the tables, how they are bound together (conceptually, actually, into a single table).

Any query elements that control the tables (where we're getting stuff from) semantically belong in the FROM clause, and of course that's where JOIN elements go. Putting joining elements into the WHERE clause conflates the which and the where-from; that's why the JOIN syntax is preferred.

Solution 3

Applying conditional statements in ON / WHERE

Here I have explained the logical query processing steps.


Reference: Inside Microsoft® SQL Server™ 2005 T-SQL Querying
Publisher: Microsoft Press
Pub Date: March 07, 2006
Print ISBN-10: 0-7356-2313-9
Print ISBN-13: 978-0-7356-2313-2
Pages: 640

(8)  SELECT (9) DISTINCT (11) TOP <top_specification> <select_list>
(1)  FROM <left_table>
(3)       <join_type> JOIN <right_table>
(2)       ON <join_condition>
(4)  WHERE <where_condition>
(5)  GROUP BY <group_by_list>
(6)  WITH {CUBE | ROLLUP}
(7)  HAVING <having_condition>
(10) ORDER BY <order_by_list>

The first noticeable aspect of SQL that is different than other programming languages is the order in which the code is processed. In most programming languages, the code is processed in the order in which it is written. In SQL, the first clause that is processed is the FROM clause, while the SELECT clause, which appears first, is processed almost last.

Each step generates a virtual table that is used as the input to the following step. These virtual tables are not available to the caller (client application or outer query). Only the table generated by the final step is returned to the caller. If a certain clause is not specified in a query, the corresponding step is simply skipped.

Brief Description of Logical Query Processing Phases

Don't worry too much if the description of the steps doesn't seem to make much sense for now. These are provided as a reference. Sections that come after the scenario example will cover the steps in much more detail.

  1. FROM: A Cartesian product (cross join) is performed between the first two tables in the FROM clause, and as a result, virtual table VT1 is generated.

  2. ON: The ON filter is applied to VT1. Only rows for which the <join_condition> is TRUE are inserted to VT2.

  3. OUTER (join): If an OUTER JOIN is specified (as opposed to a CROSS JOIN or an INNER JOIN), rows from the preserved table or tables for which a match was not found are added to the rows from VT2 as outer rows, generating VT3. If more than two tables appear in the FROM clause, steps 1 through 3 are applied repeatedly between the result of the last join and the next table in the FROM clause until all tables are processed.

  4. WHERE: The WHERE filter is applied to VT3. Only rows for which the <where_condition> is TRUE are inserted to VT4.

  5. GROUP BY: The rows from VT4 are arranged in groups based on the column list specified in the GROUP BY clause. VT5 is generated.

  6. CUBE | ROLLUP: Supergroups (groups of groups) are added to the rows from VT5, generating VT6.

  7. HAVING: The HAVING filter is applied to VT6. Only groups for which the <having_condition> is TRUE are inserted to VT7.

  8. SELECT: The SELECT list is processed, generating VT8.

  9. DISTINCT: Duplicate rows are removed from VT8. VT9 is generated.

  10. ORDER BY: The rows from VT9 are sorted according to the column list specified in the ORDER BY clause. A cursor is generated (VC10).

  11. TOP: The specified number or percentage of rows is selected from the beginning of VC10. Table VT11 is generated and returned to the caller.
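The place where steps 2 to 4 visibly matter is an outer join. The following is a sketch with hypothetical tables Customers(CustomerID) and Orders(OrderID, CustomerID, Status); these names are assumptions for illustration and do not come from the book:

```sql
-- Condition in ON: applied in step 2, before the outer rows are added
-- in step 3. Customers without a shipped order are still returned,
-- with NULL in o.OrderID.
SELECT c.CustomerID, o.OrderID
FROM Customers c
LEFT OUTER JOIN Orders o
    ON o.CustomerID = c.CustomerID
    AND o.Status = 'shipped';

-- Condition in WHERE: applied in step 4, after the outer rows were added.
-- The outer rows have NULL in o.Status, so the filter discards them and
-- the query returns the same rows as an inner join.
SELECT c.CustomerID, o.OrderID
FROM Customers c
LEFT OUTER JOIN Orders o
    ON o.CustomerID = c.CustomerID
WHERE o.Status = 'shipped';
```

For a plain INNER JOIN, step 3 adds nothing, so both placements give the same result.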



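Therefore, (INNER JOIN) ON will filter the data (the data count of VT will be reduced here itself) before applying the WHERE clause. In this logical model, the subsequent join conditions operate on already-filtered data, and only after that does the WHERE condition apply its filters. Keep in mind that this is the logical order only: an optimizer may evaluate the predicates in any physical order that produces the same result, so for inner joins the placement does not by itself guarantee better performance.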

(In some cases, applying conditions in ON rather than WHERE will not make much difference. This depends on how many tables you have joined and on the number of rows in each joined table.)

Solution 4

The implicit join ANSI syntax is older, less obvious, and not recommended.

In addition, the relational algebra allows interchangeability of the predicates in the WHERE clause and the INNER JOIN, so even INNER JOIN queries with WHERE clauses can have the predicates rearranged by the optimizer.

I recommend you write the queries in the most readable way possible.

Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.

For example, instead of:

SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
    ON ca.CustomerID = c.CustomerID
    AND c.State = 'NY'
INNER JOIN Accounts a
    ON ca.AccountID = a.AccountID
    AND a.Status = 1

Write:

SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
    ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
    ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
    AND a.Status = 1

But it depends, of course.
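If you want to check that your database treats both forms alike, one option is to compare the plans. This is a sketch for MySQL (PostgreSQL also has EXPLAIN; other systems use different tools), and the output depends on your schema, indexes and server version:

```sql
-- Plan for the version with the filters inside the ON clauses.
EXPLAIN
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
    ON ca.CustomerID = c.CustomerID
    AND c.State = 'NY'
INNER JOIN Accounts a
    ON ca.AccountID = a.AccountID
    AND a.Status = 1;

-- Plan for the version with the filters in the WHERE clause.
EXPLAIN
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
    ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
    ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
    AND a.Status = 1;
```

If the two plans match, the choice between the forms is purely about readability.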

Solution 5

Implicit joins (which is what your first query uses) become much more confusing, hard to read, and hard to maintain once you need to start adding more tables to your query. Imagine doing that same query and type of join on four or five different tables ... it's a nightmare.

Using an explicit join (your second example) is much more readable and easy to maintain.
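To make the maintenance problem concrete, here is a sketch with three tables. The names Customers, CustomerAccounts and Accounts are borrowed from another answer on this page and are only assumptions here:

```sql
-- Implicit syntax: the condition linking Accounts was forgotten, so every
-- matching customer/link row is paired with every row of Accounts.
-- The query still runs without any error.
SELECT c.CustomerID, a.AccountID
FROM Customers c, CustomerAccounts ca, Accounts a
WHERE ca.CustomerID = c.CustomerID
    AND c.State = 'NY';

-- Explicit syntax: each join carries its own ON clause, so a missing
-- condition is easy to spot next to the table it belongs to.
SELECT c.CustomerID, a.AccountID
FROM Customers c
INNER JOIN CustomerAccounts ca
    ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
    ON ca.AccountID = a.AccountID
WHERE c.State = 'NY';
```

Standard SQL requires the ON clause after INNER JOIN; MySQL is more lenient and accepts INNER JOIN without one, so there the benefit is mostly visual.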

Author: JCCyC

Updated on February 25, 2021

Comments

  • JCCyC
    JCCyC about 3 years

    For simplicity, assume all relevant fields are NOT NULL.

    You can do:

    SELECT
        table1.this, table2.that, table2.somethingelse
    FROM
        table1, table2
    WHERE
        table1.foreignkey = table2.primarykey
        AND (some other conditions)
    

    Or else:

    SELECT
        table1.this, table2.that, table2.somethingelse
    FROM
        table1 INNER JOIN table2
        ON table1.foreignkey = table2.primarykey
    WHERE
        (some other conditions)
    

    Do these two work the same way in MySQL?

    • Alexander Malakhov
      Alexander Malakhov about 13 years
      @Marco: here it is
    • Mikko Rantalainen
      Mikko Rantalainen about 8 years
      If I have understood correctly, the first variant is ANSI SQL-89 implicit syntax and the second variant is ANSI SQL-92 explicit join syntax. Both will result in the same result in conforming SQL implementations and both will result in the same query plan in well done SQL implementations. I personally prefer SQL-89 syntax but many people prefer SQL-92 syntax.
    • Mikko Rantalainen
      Mikko Rantalainen about 8 years
      @Hogan I was pointing out the official names for the different syntaxes. None of the answers explicitly spelled out the full names, so I decided to add those as comments. However, my comment did not answer the actual question, so I added it as a comment, not as an answer. (Highly voted answers have claims such as "INNER JOIN is ANSI syntax" and "implicit join ANSI syntax is older", which say nothing at all because both syntaxes are different ANSI syntaxes.)
  • dburges
    dburges almost 15 years
    Even in SQL Server 2000, *= and =* could give wrong results and should never be used.
  • Erwin Raets
    Erwin Raets almost 15 years
    I couldn't disagree more. JOIN syntax is extremely wordy and difficult to organize. I have plenty of queries joining 5, 10, even 15 tables using WHERE clause joins and they are perfectly readable. Rewriting such a query using a JOIN syntax results in a garbled mess. Which just goes to show there is no right answer to this question and that it depends more on what you're comfortable with.
  • Quassnoi
    Quassnoi almost 15 years
    @HLGEM: While I agree completely that explicit JOINs are better, there are cases when you just need to use the old syntax. A real-world example: ANSI JOINs got into Oracle only in version 9i, which was released in 2001, and until only a year ago (16 years from the moment the standard was published) I had to support a bunch of 8i installations for which we had to release critical updates. I didn't want to maintain two sets of updates, so we developed and tested the updates against all databases including 8i, which meant we were unable to use ANSI JOINs.
  • matt b
    matt b almost 15 years
    Noah, I think you might be in the minority here.
  • allyourcode
    allyourcode almost 15 years
    Thanks, Quassnoi. You've got a lot of details in your answer; is it fair to say that "yes, those queries are equivalent, but you should use inner join because it's more readable, and easier to modify"?
  • allyourcode
    allyourcode almost 15 years
    I get +1 to matt and Noah. I like diversity :). I can see where Noah is coming from; inner join doesn't add anything new to the language, and is definitely more verbose. On the other hand, it can make your 'where' condition much shorter, which usually means it's easier to read.
  • allyourcode
    allyourcode almost 15 years
    Thanks for clarifying why inner join is preferred Carl. I think your ans was implicit in the others, but explicit is usually better (yes, I'm a Python fan).
  • allyourcode
    allyourcode almost 15 years
    Your first snippet definitely hurts my brain more. Does anyone actually do that? If I meet someone that does that, is it ok for me to beat him over the head?
  • Cade Roux
    Cade Roux almost 15 years
    I locate the criteria where it makes the most sense. If I'm joining to a temporally consistent snapshot lookup table (and I don't have a view or UDF which enforces the selection of a valid date), I will include the effective date in the join and not in the WHERE because it's less likely to accidentally get removed.
  • Quassnoi
    Quassnoi almost 15 years
    @allyourcode: for Oracle, SQL Server, MySQL and PostgreSQL — yes. For other systems, probably, too, but you better check.
  • matt b
    matt b over 14 years
    I would assume that any sane DBMS would translate the two queries into the same execution plan; however in reality each DBMS is different and the only way to know for sure is to actually examine the execution plan (i.e., you'll have to test it yourself).
  • Bill Karwin
    Bill Karwin over 14 years
    FWIW, using commas with join conditions in the WHERE clause is also in the ANSI standard.
  • Quassnoi
    Quassnoi over 14 years
    @Bill Karwin: The JOIN keyword became part of proprietary implementations more recently than it may seem. It made its way into Oracle only in version 9 and into PostgreSQL in version 7.2 (both released in 2001). The appearance of this keyword was part of ANSI standard adoption, and that's why it is usually associated with ANSI, despite the fact that the standard supports the comma as a synonym for CROSS JOIN as well.
  • Bill Karwin
    Bill Karwin over 14 years
    Nevertheless, ANSI SQL-89 specified joins to be done with commas and conditions in a WHERE clause (without conditions, a join is equivalent to a cross join, as you said). ANSI SQL-92 added the JOIN keyword and related syntax, but comma-style syntax is still supported for backward compatibility.
  • Bill Karwin
    Bill Karwin over 14 years
    InterBase 4.0 is an example of a commercial RDBMS implementation that supported JOIN syntax as early as 1994.
  • Marco Demaio
    Marco Demaio over 13 years
    +1, interesting point when you say that the syntax without INNER JOIN is more error-prone. I'm confused about your last sentence, where you say "...the standard using the explicit joins is 17 years old." So are you suggesting to use the INNER JOIN keyword or not?
  • dburges
    dburges over 13 years
    @Marco Demaio, yes, always use INNER JOIN or JOIN (these two are the same) or LEFT JOIN or RIGHT JOIN or CROSS JOIN, and never use the implicit comma joins.
  • Dave Markle
    Dave Markle over 12 years
    @allyourcode: though it's rare to see this type of join syntax in INNER JOINs, it's quite common for RIGHT JOINs and LEFT JOINS -- specifying more detail in the join predicate eliminates the need for a subquery and prevents your outer joins from inadvertently being turned into INNER JOINs. (Though I agree that for INNER JOINs I'd almost always put c.State = 'NY' in the WHERE clause)
  • Mike Sherrill 'Cat Recall'
    Mike Sherrill 'Cat Recall' almost 11 years
    "Therefore, (INNER JOIN) ON will filter the data (The data count of VT will be reduced here itself) before applying WHERE clause." Not necessarily. The article is about the logical order of processing. When you say a particular implementation will do one thing before another thing, you're talking about the implemented order of processing. Implementations are allowed to make any optimizations they like, as long as the result is the same as if the implementation followed the logical order. Joe Celko has written a lot about this on Usenet.
  • BlackTigerX
    BlackTigerX over 10 years
    Just a note to be clear: implicit vs explicit joins are NOT the same. Implicit joins will surprise you every once in a while when "nothing changed", specifically when dealing with null values (bugs in production). If you want to join tables, do so explicitly (join... on...) and save yourself the headache.
  • Quassnoi
    Quassnoi over 10 years
    @BlackTigerX: could you please be more specific about implicit and explicit joins "being not the same"?
  • BlackTigerX
    BlackTigerX over 10 years
    Again, it's when working with null values, usually with joins on multiple columns, and some of those values are null, you will get different results (e.g.: empty vs not empty) depending on which join you use
  • Arvind Sridharan
    Arvind Sridharan over 10 years
    is there a performance gain by using join on instead of using WHERE?
  • Arth
    Arth almost 10 years
    @allyourcode I definitely do that! And I agree with Cade.. I'm curious as to whether there is a decent reason not to
  • philipxy
    philipxy over 8 years
    Implicit join (,) is exactly the same as CROSS JOIN and (INNER) JOIN ON 1=1 except it has lower precedence. (OUTER) LEFT/RIGHT/FULL JOINs differ from (INNER) JOIN (they can add extra rows with NULLs). You do have to use ON in JOINs before OUTER JOINs if you don't use a subselect with WHERE instead. (@BlackTigerX is wrong.)
  • philipxy
    philipxy over 8 years
    The semantics of ON and WHERE mean that for JOINs after the last OUTER JOIN it doesn't matter which you use. Although you characterize ON as part of the JOIN, it is also a filtering after a Cartesian product. Both ON and WHERE filter a Cartesian product. But either ON or a subselect with WHERE must be used before the last OUTER JOIN. (JOINs aren't "on" column pairs. Any two tables can be JOINed ON any condition. That's just a way to interpret JOINs ON equality of columns specifically.)
  • philipxy
    philipxy over 8 years
    *= and =* were never ANSI and were never a good notation. That's why ON was needed--for OUTER JOINs in the absence of subselects (which got added at the same time, so they aren't actually needed in CROSS & INNER JOINs.)
  • onedaywhen
    onedaywhen over 7 years
    "Why do you want to write database code that is [20 years old]?" - I notice you write SQL using HAVING which has been 'outdated' since SQL started supporting derived tables. I also notice you don't use NATURAL JOIN even though I would argue it has made INNER JOIN 'outdated'. Yes, you have your reasons (no need to state them again here!): my point is, those who like using the older syntax have their reasons too and the relative age of the syntax is of little if any relevance.
  • dburges
    dburges over 7 years
    Natural joins are not supported in the database I use. I don't understand why HAVING is replaced by derived tables; please explain, I really would be interested. However, I have never seen anyone who uses implicit joins give a reason to use them that isn't "It is what I am used to." If you have one, I would like to hear it. There is no technical advantage that I know of for an implicit join over an inner join. There are technical disadvantages, because implicit joins can lead to bad result sets due to unidentified accidental cross joins, which are not possible using the inner join syntax.
  • dburges
    dburges over 7 years
    @onedaywhen, it is especially critical that people new to SQL avoid implicit joins; they are the very people who get wrong results because they don't understand join concepts. And you are still writing the join conditions in the WHERE clause, so you save nothing except the time it takes to write the word join. Given that you write the conditions later in the query, it probably actually takes less time than jumping up and down in the query to add the WHERE clause as you add the join. The "I'm too tired" argument is silly. You can write joins as easily as, or more easily than, an implicit join.
  • onedaywhen
    onedaywhen over 7 years
    Wasn't looking for an argument; I genuinely thought you wanted info from me. But I maintain that "outdated for 20 years" is a poor defense and stand by my original comments.
  • Jürgen A. Erhard
    Jürgen A. Erhard about 7 years
    WHERE is still in the standard (show me where it's not). So, nothing outdated, apparently. Also, "rather than fixing the join" shows me a developer who should be kept away from DBMSs in general, far away.
  • PlexQ
    PlexQ almost 7 years
    These are NOT synonyms in MySQL. MySQL's optimizer will pick your join order if you supply it explicitly, versus an optimized order, for earlier MySQL versions.
  • Quassnoi
    Quassnoi almost 7 years
    @plexq: are you saying the two queries from the op would yield different plans?
  • James
    James over 6 years
    @rafidheen "(INNER JOIN) ON will filter the data ... before applying WHERE clause ... which improves performance." Good point. "After that only the WHERE condition will apply filter conditions" What about the HAVING clause?
  • James
    James over 6 years
    Is it true, as @rafidheen suggested in another answer (the one with the detailed sequence of SQL execution), that JOINs are filtered one at a time, reducing the size of join operations when compared to a full Cartesian join of 3 or more tables, with the WHERE filter being applied retroactively? If so, it would suggest JOIN offers a performance improvement (as well as advantages in left/right joins, as also pointed out in another answer).
  • philipxy
    philipxy about 6 years
    Standard SQL didn't change. MySQL was just wrong & now is right. See the MySQL manual.
  • Tom
    Tom about 6 years
    onedaywhen: "relative age of the syntax" is of "relevance". Usually, a newer syntax / feature is meant to be more efficient and/or readable than older syntax that's functionally equivalent (in this case mainly the latter, but also, as @HLGEM mentioned, even the former as far as typing goes, due to less jumping around).
  • Tom
    Tom about 6 years
    @Jürgen A. Erhard: The fact that a syntax (/ feature) "is still in the standard" (/ supported) is a poor defense for using it. There are countless examples of where an older syntax (/ feature) "is still in the standard" (/ supported) mainly, if not only, for backwards compatibility. Plus, HLGEM never said that the WHERE form is no longer in the standard, merely that the JOIN form (most likely and by most accounts meant to be "'better'") has been around for plenty of time to significantly reduce, if not eliminate, reasons (i.e. to work with legacy code) to use the older form.
  • philipxy
    philipxy almost 6 years
    @James That claim by rafidheen is wrong. See 'join optimization' in the manual. Also my other comments on this page. (And MikeSherrill'CatRecall''s.) Such "logical" descriptions describe the result value, not how it is actually calculated. And such implementation behaviour is not guaranteed to not change.
  • philipxy
    philipxy almost 6 years
    @James No, it's not true. Read about optimization in the manual & read textbooks re logical & physical relational query optimization.
  • cybergeek654
    cybergeek654 almost 5 years
    Even when you are using WHERE to the same effect of INNER JOIN, you are going to mention your two tables in the FROM part of the query. So basically you are still implying where you are getting your data in the FROM clause, so I guess you cannot say it necessarily "conflates the which and the where-from"
  • philipxy
    philipxy about 4 years
    @ArsenKhachaturyan Just because a keyword or identifier is used in text doesn't mean it is code & needs code format. That is a formatting choice that could go any way, & if it is reasonable to edit here then it is justifiable for every post to be constantly edited to the other format, which is to say, it's not justifiable. (Plus inline per-word code format can be difficult to read.) Same for the paragraph breaks here: they don't particularly clarify. Same with 'which' vs 'that'. And names of programming languages should not be in code format. PS You added a line break in error.
  • Arsen Khachaturyan
    Arsen Khachaturyan about 4 years
    @philipxy as you mentioned, "it doesn't mean...", but obviously that doesn't mean it can't be marked with code format either. Yes, it's a choice to be made, but a lot of posts are written without knowing that. Hence my changes are not intended to break anything but to make the post more readable. If you noticed any break after the formatting changes, sorry for that; you can obviously revert such changes.
  • philipxy
    philipxy about 4 years
    @ArsenKhachaturyan It means it should not be changed, as I explained, because otherwise endless pointless edits would be justified, and you are making endless pointless edits, merely changing to one style over another, and not the author's chosen style at that.
  • Arsen Khachaturyan
    Arsen Khachaturyan about 4 years
    @philipxy I think we are not understanding each other. Whether the edit is meaningful is a matter of choice, and I don't think we should have an endless discussion about it. Like I said before, whatever the author initially chose isn't necessarily the best version, and if the author disagrees with the change, StackOverflow always allows reverting it. So please let's stop here and not go further with this discussion.
  • philipxy
    philipxy over 3 years
    This doesn't answer the question. Also implicit join is comma, as in the 1st code block, and what you are suggesting. And the code you suggest is already in the question. Also neither code block is any more or less declarative or procedural than the other.