MySQL subquery - Find only first record in a LEFT JOIN

10,329

This is the greatest-n-per-group problem, which is asked frequently on Stack Overflow.

Here's how I would solve it in your scenario:

SELECT m.memberid, m.membername, m.gender, mp.phone, mh.loggedtime, mh.ipaddy
FROM tbl_members m 
INNER JOIN tbl_members_phones mp ON m.defaultphoneid = mp.phoneid
INNER JOIN tbl_members_addresses ma ON m.defaultaddressid = ma.addressid
LEFT OUTER JOIN tbl_members_login_history mh ON m.memberid = mh.memberid
LEFT OUTER JOIN tbl_members_login_history mh2 ON m.memberid = mh2.memberid
    AND mh.pk < mh2.pk
WHERE mh2.pk IS NULL;

That is, we want mh to be the most recent row in tbl_member_login_history for the given memberid. So we search for another row mh2 that is even more recent. If none more recent than the mh row is found, then mh2.* will be NULL, so mh must be the most recent.

I'm assuming this table has a primary key column that contains increasing values. For this example, I assume the column name is pk.

Using LEFT OUTER JOIN for both references to the login history table means that the m row will be reported even if there is no matching row.

Share:
10,329
coffeemonitor
Author by

coffeemonitor

Updated on July 29, 2022

Comments

  • coffeemonitor
    coffeemonitor almost 2 years

    I'm trying to display a list of member records, and I have a few tables I'm using to display what I need.

    That's the easy part. The part I need help with is with a table that has many records to each member record: Login history

    I want to display only the first row for each member record, that exists in the Login History table. Alternatively, I may want to flip flop and display the last record in the Login History table, as well.

    here's what I've got so far:

    SELECT m.memberid, m.membername, m.gender, mp.phone
    FROM tbl_members m, 
         tbl_members_phones mp, 
         tbl_members_addresses ma
    WHERE m.defaultphoneid = mp.phoneid
    AND m.defaultaddressid = ma.addressid
    

    So that returns what's expected.

    The 2 columns from tbl_members_login_history I'd like to add to the returned result are: mh.loggedtime, mh.ipaddy

    I know adding the tbl_members_login_history as a LEFT JOIN would return duplicates, so I'm thinking there must be a Subquery necessity here, in order to return just the 1st record for that memberid that exists in tbl_members_login_history.

    What I'm worried about is if no record in the history table exists, I still want to display that member info, but leave the history columns as NULL.

    Would this be a subquery incident? and if so, how does one add that type of LIMIT?

  • steffen
    steffen almost 10 years
    I just tried your solution and compared it to a query that contains a subquery using GROUP BY with MAX(...), joining ON date = maxdate. Your solution with the < operator took 2s (!), while the subquery solution took 0.01s (same result table of course). The EXPLAIN of your query looks much better than the one with the subquery. I don't know why I get this huge performance difference, but I guess it's because the < produces a huge intermediate result.
  • Bill Karwin
    Bill Karwin almost 10 years
    @steffen: Yes, since I answered this question, I've seen a lot of other cases, and concluded that either the solution above or a solution like you describe can be faster, depending on how many distinct groups and how many rows per group the data has. So it's best to test both methods with your data and then decide.
  • steffen
    steffen almost 10 years
    True. What confused me was the huge performance difference. I found your solution a couple of times here on SO. And the EXPLAIN output look great so I expected that query to be very fast. But instead it was so much slower. Weird.
  • Bill Karwin
    Bill Karwin almost 10 years
    @steffen, keep in mind MySQL does nested-loop joins. So you can get a rough estimate of the work it has to do to perform joins by multiplying the rows information from each table joined together. So even if it looks like it's using indexes and everything, if you see a few thousand rows in one table joined to a few thousand rows in another table, it's doing millions of row comparisons.