SQL using NOT EXISTS

13,976

Solution 1

You have to relate your not exists subquery to the outer query. For example:

select  clients.studentemail 
from    clients c
join    invoices i
on      c.clientid = i.clientid 
where   invoices.dateposted > "2013-04-01" 
        and not exists 
        (
        select  * 
        from    appointments a
        where   c.clientid = a.clientid -- Relates outer to inner query
                and a.servicedirection = "delivery" 
                and a.date > '2013-07-01')
        )

Solution 2

I'm not sure what resultset you are trying to return. But including the clients table in the subquery doesn't look right.

What we usually want is a correlated subquery. For example:

SELECT c.studentEmail 
  FROM `clients` c
  JOIN `invoices` i
    ON i.clientId = c.clientId
 WHERE i.datePosted > '2013-04-01' 
   AND NOT EXISTS 
       ( SELECT 1
           FROM appointments a
          WHERE a.clientId = c.clientId 
            AND a.serviceDirection = "Delivery" 
            AND a.date > '2013-07-01'
       )

Note that the NOT EXISTS subquery references c.clientId, which is the value from the clientId column of the clients table in the outer query.

We call this a "correlated subquery", because for each row returned by the outer query, we are (effectively) running the subquery, and using the clientId from that row in the predicate (WHERE clause) of the subquery.

The NOT EXISTS returns either a TRUE (if NO matching row is found) or FALSE (if at least one matching row IS found).

In terms of performance, this type of query can be expensive for large sets, because MySQL is effectively running a separate subquery for each row returned in the outer query. An anti-join pattern is usually (not always) more efficient (with suitable indexes available).

Another way to obtain an equivalent result, using the anti-join pattern:

SELECT c.studentEmail 
  FROM `clients` c
  JOIN `invoices` i
    ON i.clientId = c.clientId
  LEFT
  JOIN appointments a
    ON a.clientId = c.clientId 
   AND a.serviceDirection = "Delivery" 
   AND a.date > '2013-07-01'.
 WHERE i.datePosted > '2013-04-01' 
   AND a.clientId IS NULL

We use a LEFT JOIN to the appointments table, to find matching rows. Note that all of the predicates to find matching rows need to be in the ON clause (rather than the WHERE clause).

That returns matching rows, as well as rows that don't have a matching row in appointments.

The "trick" now is to include a predicate in the WHERE clause, that checks for a.clientID IS NULL. That will exclude all the rows that had at least one matching appointment, so we are left with rows that don't have a match. We can reference any column in appointments that is guranteed to be NOT NULL. (We usually have an id column that is PRIMARY KEY (and therefore NOT NULL.) But we can also use the clientID column, in this case, because every matching row is guaranteed to be not null, because it had to be equal to the clientId from the clients table, and a NULL value is never "equal" to any other value. (It's the equality condition in the JOIN predicate that guarantees us (in the query) that a.clientId is not null.

This pattern is called an "anti-join".

Share:
13,976
radleybobins
Author by

radleybobins

Updated on June 14, 2022

Comments

  • radleybobins
    radleybobins almost 2 years

    I am trying to write an SQL query that returns all student email addresses for clients who have had a new invoice since April 1 and have not yet scheduled a delivery for this fall. This is returning an empty set even though I know there are entries that meet these conditions. I've tried a few different things with no luck, is there a way to do this?

    SELECT clients.studentEmail 
    FROM `clients`, `invoices` 
    WHERE clients.clientId = invoices.clientId 
    AND invoices.datePosted > "2013-04-01" 
    AND NOT EXISTS 
        (SELECT * 
        FROM appointments, clients
        WHERE clients.clientId = appointments.clientId 
        AND appointments.serviceDirection = "Delivery" 
        AND appointments.date > '2013-07-01')