How do I get the oldest date without using a correlated subquery in SQL?
Solution 1
We'll use ROW_NUMBER()
to "rank" the invoices by date per vendor, and then select only the oldest per vendor:
SELECT vendor_name, invoice_number AS oldest_invoice, invoice_date, invoice_total
FROM vendors v
INNER JOIN (SELECT invoices.*,
ROW_NUMBER() OVER (PARTITION BY vendor_id ORDER BY invoice_date ASC)
AS rn
FROM invoices) i
ON i.vendor_id = v.vendor_id
AND
i.rn = 1;
Solution 2
Time for TDQD — Test-Driven Query Design
The minimum date for an invoice for each vendor is given by:
SELECT vendor_id, MIN(invoice_date) AS invoice_date
FROM invoices
GROUP BY vendor_id
The corresponding minimum invoice number (given that there could have been several invoices sent on the first day that a vendor was invoiced, if invoice_date
is a true DATE with no time component; if the DATE includes a time component, the second MIN() is probably unnecessary), is:
SELECT vendor_id, MIN(invoice_number) AS invoice_number
FROM invoices AS i
JOIN (SELECT vendor_id, MIN(invoice_date) AS invoice_date
FROM invoices
GROUP BY vendor_id
) AS j ON j.vendor_id = i.vendor_id AND j.invoice_date = i.invoice_date
GROUP BY vendor_id
You can join this expression with other tables to suit your query requirements:
SELECT v.*, i.*
FROM vendors AS v
JOIN (SELECT vendor_id, MIN(invoice_number) AS invoice_number
FROM invoices AS i
JOIN (SELECT vendor_id, MIN(invoice_date) AS invoice_date
FROM invoices
GROUP BY vendor_id
) AS j ON j.vendor_id = i.vendor_id AND j.invoice_date = i.invoice_date
GROUP BY vendor_id
) AS inv_info ON v.vendor_id = inv_info.vendor_id
JOIN invoices AS i ON i.invoice_number = inv_info.invoice_number
There are undoubtedly other ways to design it. Note that none of these sub-queries are correlated sub-queries.
The TDQD has been purely nominal; no DBMS was troubled with checking whether these queries are syntactically valid, much less return the correct answer. OTOH, it is a standand technique.
If you like listing lots of columns in GROUP BY clauses, you could do without the final join to invoices
by having the inv_info
sub-query return the relevant invoice columns. I don't like having to write lots of column names out — but if I was worried about performance, I'd measure to see if it made a significant difference.
You might find that there's an OLAP function/query that will do the job notationally quicker.
Marc Gordon
Updated on June 16, 2022Comments
-
Marc Gordon almost 2 years
The following code list all the invoices, and I just want the oldest invoice from a vendor:
SELECT DISTINCT vendor_name, i.invoice_number AS OLDEST_INVOICE, MIN(i.invoice_date), i.invoice_total FROM vendors v JOIN invoices i ON i.vendor_id = v.vendor_id GROUP BY vendor_name, invoice_number, invoice_total ORDER BY MIN(i.invoice_date);