How do I get the oldest date without using a correlated subquery in SQL?

14,491

Solution 1

We'll use ROW_NUMBER() to "rank" the invoices by date per vendor, and then select only the oldest per vendor:

SELECT vendor_name, invoice_number AS oldest_invoice, invoice_date, invoice_total
  FROM vendors v
 INNER JOIN (SELECT invoices.*,
                    ROW_NUMBER() OVER (PARTITION BY vendor_id ORDER BY invoice_date ASC)
                      AS rn
               FROM invoices) i
       ON i.vendor_id = v.vendor_id
          AND
          i.rn = 1;

Solution 2

Time for TDQD — Test-Driven Query Design

The minimum date for an invoice for each vendor is given by:

SELECT vendor_id, MIN(invoice_date) AS invoice_date
  FROM invoices
 GROUP BY vendor_id

The corresponding minimum invoice number (given that there could have been several invoices sent on the first day that a vendor was invoiced, if invoice_date is a true DATE with no time component; if the DATE includes a time component, the second MIN() is probably unnecessary), is:

SELECT vendor_id, MIN(invoice_number) AS invoice_number
  FROM invoices AS i
  JOIN (SELECT vendor_id, MIN(invoice_date) AS invoice_date
          FROM invoices
         GROUP BY vendor_id
       ) AS j ON j.vendor_id = i.vendor_id AND j.invoice_date = i.invoice_date
 GROUP BY vendor_id

You can join this expression with other tables to suit your query requirements:

SELECT v.*, i.*
  FROM vendors AS v
  JOIN (SELECT vendor_id, MIN(invoice_number) AS invoice_number
          FROM invoices AS i
          JOIN (SELECT vendor_id, MIN(invoice_date) AS invoice_date
                  FROM invoices
                 GROUP BY vendor_id
               ) AS j ON j.vendor_id = i.vendor_id AND j.invoice_date = i.invoice_date
         GROUP BY vendor_id
       ) AS inv_info ON v.vendor_id = inv_info.vendor_id
  JOIN invoices AS i ON i.invoice_number = inv_info.invoice_number

There are undoubtedly other ways to design it. Note that none of these sub-queries are correlated sub-queries.

The TDQD has been purely nominal; no DBMS was troubled with checking whether these queries are syntactically valid, much less return the correct answer. OTOH, it is a standand technique.

If you like listing lots of columns in GROUP BY clauses, you could do without the final join to invoices by having the inv_info sub-query return the relevant invoice columns. I don't like having to write lots of column names out — but if I was worried about performance, I'd measure to see if it made a significant difference.

You might find that there's an OLAP function/query that will do the job notationally quicker.

Share:
14,491
Marc Gordon
Author by

Marc Gordon

Updated on June 16, 2022

Comments

  • Marc Gordon
    Marc Gordon almost 2 years

    The following code list all the invoices, and I just want the oldest invoice from a vendor:

    SELECT DISTINCT vendor_name, i.invoice_number AS OLDEST_INVOICE, 
        MIN(i.invoice_date), i.invoice_total
    FROM vendors v
    JOIN invoices i
    ON i.vendor_id = v.vendor_id
    GROUP BY vendor_name, invoice_number, invoice_total
    ORDER BY MIN(i.invoice_date);