How does Left Join / IS NULL eliminate records which are there in one table and not in the other?

12,508

Solution 1

This could be explained with the following

mysql> select * from table1 ;
+------+------+
| id   | val  |
+------+------+
|    1 |   10 |
|    2 |   30 |
|    3 |   40 |
+------+------+
3 rows in set (0.00 sec)

mysql> select * from table2 ;
+------+------+
| id   | t1id |
+------+------+
|    1 |    1 |
|    2 |    2 |
+------+------+
2 rows in set (0.00 sec)

Here table1.id <-> table2.t1id

Now when we do a left join with the joining key and if the left table is table1 then it will get all the data from table1 and in non-matching record on table2 will be set to null

mysql> select t1.* , t2.t1id from table1 t1 
left join table2 t2 on t2.t1id = t1.id ;
+------+------+------+
| id   | val  | t1id |
+------+------+------+
|    1 |   10 |    1 |
|    2 |   30 |    2 |
|    3 |   40 | NULL |
+------+------+------+

3 rows in set (0.00 sec)

See that table1.id = 3 does not have a value in table2 so its set as null When you apply the where condition it will do further filtering

mysql> select t1.* , t2.t1id from table1 t1 
left join table2 t2 on t2.t1id = t1.id where t2.t1id is null;
+------+------+------+
| id   | val  | t1id |
+------+------+------+
|    3 |   40 | NULL |
+------+------+------+
1 row in set (0.00 sec)

Solution 2

Let's assume r-table is employees, and r_table is computers. Some employees don't have computers. Some computers are not assigned to anyone yet.

  1. Inner join:

    SELECT  l.*, r.*
    FROM    employees l
    JOIN computers r
    ON      r.id = l.comp_id
    

    gives you the list of all employees who HAVE a computer, and the info about computer assigned for each of them. The employees without a computer will NOT appear on this list.

  2. Left join:

    SELECT  l.*, r.*
    FROM    employees l
    LEFT JOIN computers r
    ON      r.id = l.comp_id
    

    gives you the list of ALL employees. The employees with computer will show the computer info. The employees without computers will appear with NULLs instead of computer info.

  3. Finally

    SELECT  l.*, r.*
    FROM    employees l
    LEFT JOIN computers r
    ON      r.id = l.comp_id
    WHERE   r.id IS NULL
    

    Left join with the WHERE clause will start with the same list as the left join (2), but then it will keep only those employees that do not have corresponding information in the computer table, that is, employees without the computers.

    I this case, selecting anything from the r table will be just nulls, so you can leave those fields out and select only stuff from the l table:

    SELECT  l.*
    FROM ...
    

Try this sequence of selects and observe the output. Each next step builds on the previous one.

Please let me know if this explanation is understandable, or you'd like me to elaborate some more.

EDITED TO ADD: Here's sample code to create the two tables used above:

CREATE TABLE employees
(  id INT NOT NULL PRIMARY KEY,
   name VARCHAR(20),
   comp_id INT);

INSERT INTO employees (id, name, comp_id) VALUES (1, 'Becky', 1);
INSERT INTO employees (id, name, comp_id) VALUES (2, 'Anne', 7);
INSERT INTO employees (id, name, comp_id) VALUES (3, 'John', 3);
INSERT INTO employees (id, name) VALUES (4, 'Bob');

CREATE TABLE computers
(  id INT NOT NULL PRIMARY KEY,
   os VARCHAR(20) );

INSERT INTO computers (id, os) VALUES (1,'Windows 7');
INSERT INTO computers (id, os) VALUES (2,'Windows XP');
INSERT INTO computers (id, os) VALUES (3,'Unix');
INSERT INTO computers (id, os) VALUES (4,'Windows 7');

There are 4 employees. Becky and John have computers. Anne and Bob do not have a computer. (Anne, has a comp_id 7, which doesn't correspond to any row in computers table - so, she doesn't really have a computer.)

Solution 3

the left returns all the row from the left table

ON r.value = l.value
if there is no r.value for a l.value then the r is empty
r.value is null will be true

Solution 4

This is based on the difference between INNER JOIN and LEFT JOIN. When you use an INNER JOIN, the result only contains rows that match between the two tables (based on the ON condition).

But when you use a LEFT JOIN, you also get rows in the result for all rows in the first table that don't have a match in the second table. In these cases, all the result columns from the second table are filled with NULL as a placeholder.

The WHERE r.value IS NULL test then matches these rows. So the final result only contains the rows with no match.

Solution 5

 SELECT * FROM a LEFT JOIN b ON a.v = b.v ... WHERE b.id IS NULL

it's super simple logic:

  • select from table "a"
  • join table "b" (if matching rows exists - ON a.v=b.v otherwise use NULL's as values)
  • WHERE condition: if something is missing (b.id IS NULL) = OK.
Share:
12,508

Related videos on Youtube

StrugglingCoder
Author by

StrugglingCoder

Updated on July 13, 2022

Comments

  • StrugglingCoder
    StrugglingCoder almost 2 years

    I am having a tough time to understand why does LEFT JOIN / IS NULL eliminate records which are there in one table and not in the other. Here is an example

    SELECT  l.id, l.value
    FROM    t_left l
    LEFT JOIN t_right r
    ON      r.value = l.value
    WHERE   r.value IS NULL
    

    Why should r.value = NULL eliminate records ? I am not understanding . I know I am missing something very basic but at present I cant figure out even that basic one. I would appreciate if someone explains it to me in detail .

    I want a very basic explanation.

    • Strawberry
      Strawberry about 9 years
      Just run this: SELECT * FROM t_left l LEFT JOIN t_right r ON r.value = l.value. Now it should be obvious!
  • pala_
    pala_ about 9 years
    this doesn't explain why there are nulls in the first place
  • Peter
    Peter about 9 years
    @pala_ how so? you select from A
  • pala_
    pala_ about 9 years
    yes, but from the question it seems some of the problem is understanding that nulls come from rows that dont match, because its a left join - as opposed to inner joining
  • paparazzo
    paparazzo about 9 years
    This statement is flat wrong "join table "b" (if matching rows exists - ON a.v=b.v)". You are describing an inner join.