MySQL "Group By" and "Order By"

243,394

Solution 1

A simple solution is to wrap the query in a subselect that applies the ORDER BY first, and then apply the GROUP BY in the outer query:

SELECT * FROM ( 
    SELECT `timestamp`, `fromEmail`, `subject`
    FROM `incomingEmails` 
    ORDER BY `timestamp` DESC
) AS tmp_table GROUP BY LOWER(`fromEmail`)

This is similar to using the join but looks much nicer.

Using non-aggregate columns in a SELECT with a GROUP BY clause is non-standard. MySQL will generally return the values of the first row it finds and discard the rest. Any ORDER BY clauses will only apply to the returned column value, not to the discarded ones.

IMPORTANT UPDATE: Selecting non-aggregate columns used to work in practice but should not be relied upon. Per the MySQL documentation, "this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate."

As of 5.7.5, ONLY_FULL_GROUP_BY is enabled by default, so non-aggregate columns cause query errors (ER_WRONG_FIELD_WITH_GROUP).

As @mikep points out below, one way around the error is to use ANY_VALUE() from 5.7 and above. Note that ANY_VALUE() only suppresses the error; the value it returns is still arbitrary.

See:
http://www.cafewebmaster.com/mysql-order-sort-group
https://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html
https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
https://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_any-value
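
To make the ANY_VALUE() remark concrete, here is a minimal sketch, assuming MySQL 5.7.5 or later and the `incomingEmails` table from the question. The aliases `some_subject` and `latest_timestamp` are made up for illustration. The query runs under ONLY_FULL_GROUP_BY, but the subject it returns is not guaranteed to belong to the latest email:

```sql
-- Runs with ONLY_FULL_GROUP_BY enabled, but `some_subject` is an arbitrary
-- subject from each group, not necessarily the one with the latest timestamp.
SELECT `fromEmail`,
       ANY_VALUE(`subject`) AS some_subject,      -- alias is illustrative
       MAX(`timestamp`)     AS latest_timestamp   -- alias is illustrative
FROM `incomingEmails`
GROUP BY `fromEmail`;
```

So ANY_VALUE() is a way to silence the error when any row will do; it does not by itself answer the "latest row per sender" question.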

Solution 2

As already pointed out in a reply, the accepted answer is wrong, because GROUP BY picks an arbitrary record from each group.

If one is using MySQL 5.6, or MySQL 5.7 with ONLY_FULL_GROUP_BY, the correct (deterministic) query is:

SELECT incomingEmails.*
  FROM (
    SELECT fromEmail, MAX(timestamp) `timestamp`
    FROM incomingEmails
    GROUP BY fromEmail
  ) filtered_incomingEmails
  JOIN incomingEmails USING (fromEmail, timestamp)
GROUP BY fromEmail, timestamp

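For the query to run efficiently, proper indexing is required, for example a composite index on (fromEmail, timestamp).

Note that, for simplicity, I've removed the LOWER() call, which in most cases isn't needed (with a case-insensitive collation, GROUP BY already ignores case).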
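
As a rough sketch of the indexing mentioned above: the index name `idx_fromEmail_timestamp` is hypothetical, and whether this index helps depends on the real column types and data, so treat it as a starting point rather than a recommendation.

```sql
-- Hypothetical index name; the column pair matches both the inner
-- GROUP BY / MAX() and the USING (fromEmail, timestamp) join.
CREATE INDEX idx_fromEmail_timestamp
    ON incomingEmails (fromEmail, `timestamp`);

-- Inspect the plan to see whether the optimizer actually uses the index.
EXPLAIN
SELECT fromEmail, MAX(`timestamp`) AS `timestamp`
FROM incomingEmails
GROUP BY fromEmail;
```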

Solution 3

Here's one approach:

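SELECT cur.textID, cur.fromEmail, cur.subject,
       cur.timestamp, cur.read
FROM incomingEmails cur
-- Look for a later email ("next") from the same sender.
LEFT JOIN incomingEmails next
    ON cur.fromEmail = next.fromEmail
    AND cur.timestamp < next.timestamp
-- Keep only the rows for which no later email exists.
WHERE next.timestamp IS NULL
  -- The user ID is interpolated by the calling code; prefer a bound parameter.
  AND cur.toUserID = '$userID'
ORDER BY LOWER(cur.fromEmail)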

Basically, you join the table to itself, searching for later rows from the same sender. In the WHERE clause you require that no later row exists. This leaves only the latest row for each sender.

If several emails from the same sender can share the same timestamp, this query returns all of them and needs refining. If there's an incremental ID column in the email table, change the JOIN like this:

LEFT JOIN incomingEmails next
    on cur.fromEmail = next.fromEmail
    and cur.id < next.id
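
If the ID order cannot be assumed to follow the timestamp order, the two conditions can be combined so that the ID only breaks ties. This is a sketch under the assumption that the table has a non-null, unique `id` column (the column name is taken from the snippet above and may differ in the real schema):

```sql
SELECT cur.id, cur.fromEmail, cur.subject, cur.timestamp
FROM incomingEmails cur
LEFT JOIN incomingEmails next
    ON cur.fromEmail = next.fromEmail
    AND (
        cur.timestamp < next.timestamp
        -- Same timestamp: the row with the higher id counts as "later".
        OR (cur.timestamp = next.timestamp AND cur.id < next.id)
    )
-- No "later" row was found, so cur is the latest email for this sender.
WHERE next.id IS NULL
ORDER BY cur.timestamp DESC;
```

With this join condition, exactly one row per sender survives even when timestamps collide.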

Solution 4

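Do the GROUP BY after the ORDER BY by wrapping your ordered query in an outer query that groups, like this (this relies on the same non-standard GROUP BY behaviour discussed under Solution 1):

SELECT t.* FROM (SELECT * FROM `table` ORDER BY `time` DESC) t GROUP BY t.`from`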

Solution 5

According to the SQL standard, you cannot use non-aggregate columns in the select list unless they appear in the GROUP BY clause. MySQL allows such usage (unless ONLY_FULL_GROUP_BY mode is enabled), but the result is not predictable.

ONLY_FULL_GROUP_BY

You should first select fromEmail and MIN(read), and then fetch the subject with a second query (or a subquery).
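
One way to read the two-step idea is sketched below. It is an interpretation, not the answer author's exact query: it aggregates on MAX(`timestamp`) instead of MIN(read), on the assumption that the goal is the latest email per sender, and the aliases `g`, `e`, and `latest` are made up for illustration.

```sql
-- Step 1 (derived table g): one row per sender with its latest timestamp.
-- Step 2 (correlated subquery): look up the subject of that latest email.
SELECT g.fromEmail,
       g.latest AS `timestamp`,
       (SELECT e.subject
        FROM incomingEmails e
        WHERE e.fromEmail = g.fromEmail
          AND e.`timestamp` = g.latest
        LIMIT 1) AS subject          -- LIMIT 1 guards against timestamp ties
FROM (
    SELECT fromEmail, MAX(`timestamp`) AS latest
    FROM incomingEmails
    GROUP BY fromEmail
) AS g
ORDER BY g.latest DESC;
```

Every selected column is either grouped or aggregated inside `g`, so the query does not depend on ONLY_FULL_GROUP_BY being disabled.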

Author by

SubParProgrammer

Updated on July 08, 2022

Comments

  • SubParProgrammer
    SubParProgrammer almost 2 years

    I want to be able to select a bunch of rows from a table of e-mails and group them by the from sender. My query looks like this:

    SELECT 
        `timestamp`, `fromEmail`, `subject`
    FROM `incomingEmails` 
    GROUP BY LOWER(`fromEmail`) 
    ORDER BY `timestamp` DESC
    

    The query almost works as I want it — it selects records grouped by e-mail. The problem is that the subject and timestamp don't correspond to the most recent record for a particular e-mail address.

    For example, it might return:

    fromEmail: [email protected], subject: hello
    fromEmail: [email protected], subject: welcome
    

    When the records in the database are:

    fromEmail: [email protected], subject: hello
    fromEmail: [email protected], subject: programming question
    fromEmail: [email protected], subject: welcome
    

    If the "programming question" subject is the most recent, how can I get MySQL to select that record when grouping the e-mails?

  • SubParProgrammer
    SubParProgrammer almost 15 years
    Said that textID was ambiguous =/
  • Andomar
    Andomar almost 15 years
    Then remove the ambiguity and prefix it with the table name, like cur.textID. Changed in the answer as well.
  • Andomar
    Andomar almost 15 years
    MIN(read) would return the minimal value of "read". He's probably looking for the "read" flag of the latest email instead.
  • velcrow
    velcrow about 11 years
    I came up with the same solution a few years ago, and it's a great solution. Kudos to b7kich. Two issues here, though: first, GROUP BY is case insensitive, so LOWER() is unnecessary; second, $userID appears to be a variable coming directly from PHP, so your code may be vulnerable to SQL injection if $userID is user-supplied and not forced to be an integer.
  • xrDDDD
    xrDDDD over 10 years
    So the GROUP BY automatically selects the latest time, or the newest time, or random?
  • 11101101b
    11101101b over 10 years
    It selects the newest time because we are ordering by time DESC and then the group by takes the first one (latest).
  • IcarusNM
    IcarusNM almost 9 years
    Now if only I could do JOINS on sub-selects in VIEWS, in mysql 5.1. Maybe that feature comes in a newer release.
  • VisioN
    VisioN over 8 years
    This is the only solution that is possible to do with Doctrine DQL.
  • Loveen Dyall
    Loveen Dyall almost 7 years
    This doesn't work so well when you're trying to self-join for multiple columns, i.e. when you're trying to find the latest email and the latest username and you need multiple self left joins to do this in a single query.
  • Arthur Shipkowski
    Arthur Shipkowski almost 7 years
    The IMPORTANT UPDATE also applies to MariaDB: mariadb.com/kb/en/mariadb/…
  • Will B.
    Will B. over 6 years
    When working with past and future timestamps/dates, to limit the resultset to non-future dates, you need to add another condition to the LEFT JOIN criteria AND next.timestamp <= UNIX_TIMESTAMP()
  • Jette
    Jette almost 6 years
    This should be the correct answer. I just discovered a bug on my website related to this. The ORDER BY in the subselect in the other answers has no effect at all.
  • Richard
    Richard almost 6 years
    OMG, please make this the accepted answer. The accepted one wasted 5 hours of my time :(
  • mikep
    mikep about 5 years
    As of 5.7.5 ONLY_FULL_GROUP_BY is enabled by default, i.e. it's impossible to use non-aggregate columns. SQL mode can be changed during runtime without admin privileges, so it is very easy to disable ONLY_FULL_GROUP_BY. For example: SET SESSION sql_mode = '';. Demo: db-fiddle.com/f/esww483qFQXbXzJmkHZ8VT/3
  • mikep
    mikep about 5 years
    Or another alternative to bypass enabled ONLY_FULL_GROUP_BY is to use ANY_VALUE(). See more dev.mysql.com/doc/refman/8.0/en/…
  • Cârnăciov
    Cârnăciov about 3 years
    This is WRONG: ORDER BY is discarded from subqueries, so the row selected from the nested query is random. It might work sometimes, adding to the confusion, but this will result in a nightmare bug. The correct answer is here: stackoverflow.com/questions/1066453/mysql-group-by-and-order-by/…
  • b7kich
    b7kich about 3 years
    ORDER BY is definitely not getting discarded from subqueries. But I like Marcus' answer too.
  • b7kich
    b7kich about 3 years
    I like this answer but it still needs ordering in the end