Join to only the "latest" record with t-sql

sql sql-server tsql join

42,459

Solution 1

select a.*, bm.MaxRowDate
from (
    select TableAID, max(RowDate) as MaxRowDate
    from TableB
    group by TableAID
) bm
inner join TableA a on bm.TableAID = a.ID

If you need more columns from TableB, do this:

select a.*, b.* --use explicit columns rather than * here
from (
    select TableAID, max(RowDate) as MaxRowDate
    from TableB
    group by TableAID
) bm
inner join TableB b on bm.TableAID = b.TableAID
    and bm.MaxRowDate = b.RowDate
inner join TableA a on bm.TableAID = a.ID

Solution 2

SELECT *
FROM tableA A
OUTER APPLY (SELECT TOP 1 * 
             FROM tableB B
             WHERE A.ID = B.TableAID
             ORDER BY B.RowDate DESC) as B

Solution 3

table B join is optional: it depends if there are other columns you want

SELECT
    *
FROM
    tableA A
    JOIN
    tableB B ON A.ID = B.TableAID
    JOIN
    (
    SELECT Max(RowDate) AS MaxRowDate, TableAID
    FROM tableB
    GROUP BY TableAID
    ) foo ON B.TableAID = foo.TableAID AND B.RowDate= foo.MaxRowDate

Solution 4

With ABDateMap AS (
    SELECT Max(RowDate) AS LastDate, TableAID FROM TableB GROUP BY TableAID
),
LatestBRow As (
    SELECT MAX(ID) AS ID, TableAID FROM ABDateMap INNER JOIN TableB ON b.TableAID=a.ID AND b.RowDate = LastDate GROUP BY TableAID
)
SELECT columns
FROM TableA a
INNER JOIN LatestBRow m ON m.TableAID=a.ID
INNER JOIN TableB b on b.ID = m.ID

Solution 5

Just for the clarity's sake and to benefit those who will stumble upon this ancient question. The accepted answer would return duplicate rows if there are duplicate RowDate in Table B. A safer and more efficient way would be to utilize ROW_NUMBER():

Select a.*, b.* -- Use explicit column list rather than * here
From [Table A] a
Inner Join ( -- Use Left Join if the records missing from Table B are still required
    Select *,
        ROW_NUMBER() OVER (PARTITION BY TableAID ORDER BY RowDate DESC) As _RowNum
    From [Table B]
) b
On b.TableAID = a.ID
Where b._RowNum = 1

View more solutions

42,459

Joel Martinez

Software Engineer @ Xamarin (now a part of Microsoft). In the past, I founded the Orlando .NET User Group (ONETUG), wrote a few books, worked at a few startups, made a few games, and was a Microsoft XNA MVP for a few years. You can find me on twitter (@joelmartinez), or on the web at http://codecube.net

Updated on July 09, 2022

Comments

Joel Martinez almost 2 years
I've got two tables. Table "B" has a one to many relationship with Table "A", which means that there will be many records in table "B" for one record in table "A".

The records in table "B" are mainly differentiated by a date, I need to produce a resultset that includes the record in table "A" joined with only the latest record in table "B". For illustration purpose, here's a sample schema:
```
Table A
-------
ID

Table B
-------
ID
TableAID
RowDate
```
I'm having trouble formulating the query to give me the resultset I'm looking for any help would be greatly appreciated.
- Paul Hadfield over 13 years
  
  In the case of two rows in table B having the same datetime, how will you define the single latest record? It is worth noting that DateTime is only good to 3ms (or something like that) whilst DateTime2 can measure down to nanoseconds (so is more accurate).
- Lamak over 13 years
  
  The column to join [Table A] with [Table B] is TableAID?
- Joel Martinez over 13 years
  
  Should have included this ... you can assume that there won't be duplicate datetimes, so there will always be a "latest" :-)
Paul Hadfield over 13 years

This does not quite answer the question fully as it only gets max row date from table b. The question has asked for the most recent row - so rest of table b needs to be returned for that row and also needs to take into account Table B having two entries of the same date/time for a TableA reference
D'Arcy Rittich over 13 years

@Paul: I assumed TableB only has specified fields by OP. Have modified query to handle the case you mention. Duplicates may or may not be an issue for the user's data.
Joel Martinez over 13 years

Just curious ... why the avoidance of "*"? what are the draw backs to using it if I really want all of the columns in the resultset anyways?
D'Arcy Rittich over 13 years

@Joel: avoiding * solves problems of: duplicate columns, assumptions about returned column order, performance penalty of selecting more data than you need, the list goes on.
Michael B. about 9 years

outer apply is much quicker !
Sonny Childs about 8 years

That outer apply is blazing fast compared to other approaches I tried on my system.
Maderas about 7 years

I just tested the accepted answer vs this answer and I found that the outer apply was ~2 times slower than the accepted answer!
Daniel Lorenz almost 7 years

@Maderas You ran them both more than once during the testing, right? Otherwise, the data could have been cached.
Maderas almost 7 years

@DanielLorenz I don't remember. I usually run them a minimum of 5 times each. I also tend to restart my dev servers before I test something new to find out how well it performs, but... i don't remember.