Alternative to using GROUP BY without aggregates to retrieve distinct "best" result


Solution 1

This is basically a form of the groupwise-maximum-with-ties problem. I don't think there is an SQL-standard-compliant solution. A query like this should perform well:

SELECT  s2.id
,       s2.title
,       s2.episode
,       s2.is_hidef
,       s2.is_verified
FROM    (
        -- one row per (title, episode) pair
        SELECT  DISTINCT title
        ,       episode
        FROM    shows
        WHERE   title = 'The Simpsons'
        ) s1
JOIN    shows s2
ON      s2.id =
        (
        -- id of the best row for this episode
        SELECT  s3.id
        FROM    shows s3
        WHERE   s3.title = s1.title
                AND s3.episode = s1.episode
        ORDER BY
                s3.is_hidef DESC
        ,       s3.is_verified DESC
        ,       s3.id DESC  -- tie-break: most recent row, assuming higher ids are newer
        LIMIT   1
        )

But given the cost in readability, I would stick with your original query.
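To check the query above against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database, the `best_rows` helper, and the extra `s3.id DESC` tie-break (which assumes a higher id means a more recent row) are assumptions for illustration, not part of the original answer.

```python
import sqlite3

# Sample rows from the question: (id, title, episode, is_hidef, is_verified).
ROWS = [
    (1, "The Simpsons", 1, 1, 0),
    (2, "The Simpsons", 1, 1, 1),
    (3, "The Simpsons", 1, 1, 1),
    (4, "The Simpsons", 2, 0, 0),
    (5, "The Simpsons", 2, 1, 0),
]

QUERY = """
SELECT s2.id, s2.title, s2.episode, s2.is_hidef, s2.is_verified
FROM (SELECT DISTINCT title, episode FROM shows WHERE title = ?) s1
JOIN shows s2
  ON s2.id = (
       SELECT s3.id
       FROM shows s3
       WHERE s3.title = s1.title AND s3.episode = s1.episode
       ORDER BY s3.is_hidef DESC, s3.is_verified DESC,
                s3.id DESC  -- assumed tie-break: highest id = most recent
       LIMIT 1
     )
ORDER BY s2.episode
"""


def best_rows(title):
    """Return the best row per episode for the given title (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, "
        "episode INTEGER, is_hidef INTEGER, is_verified INTEGER)"
    )
    conn.executemany("INSERT INTO shows VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, (title,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in best_rows("The Simpsons"):
        print(row)
```

With the sample data this picks ids 3 and 5, the two rows the question lists as the desired result. Note that LIMIT is not available on every engine, so the subquery may need adapting elsewhere.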

Solution 2

Similar in some ways to Andomar's, but this one really works.

select C.*
from
(
    -- 2nd level: among the best-scoring rows per episode, keep the lowest id
    select min(b.id) minid
    from (
        -- 1st level: best score per (title, ep)
        select title, ep, max(hidef*1 + verified*1) ord
        from tbl
        group by title, ep) a
    inner join tbl b on b.title=a.title and b.ep=a.ep and b.hidef*1 + b.verified*1 = a.ord
    group by a.title, a.ep, a.ord
) D inner join tbl C on D.minid = C.id

The first level converts bits (SQL Server) or MySQL booleans to integer values using *1, and the columns are added to produce the "best" value. You can give them weights, e.g. if hidef matters more than verified, use hidef*2 + verified*1, which can produce 3, 2, 1 or 0.
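As a quick illustration of that arithmetic, the sketch below computes the score for all four flag combinations, once with equal weights and once with hidef weighted 2. The `score` function and its parameter names are hypothetical, introduced only for this example.

```python
from itertools import product


def score(hidef, verified, w_hidef=1, w_verified=1):
    """Weighted sum of the two boolean flags (True counts as 1, False as 0)."""
    return int(hidef) * w_hidef + int(verified) * w_verified


# All (hidef, verified) combinations.
combos = list(product([True, False], repeat=2))

# Equal weights: (True, False) and (False, True) get the same score.
equal = {c: score(*c) for c in combos}

# hidef weighted 2, verified weighted 1: every combination gets a distinct score.
weighted = {c: score(*c, w_hidef=2, w_verified=1) for c in combos}

if __name__ == "__main__":
    for c in combos:
        print(c, equal[c], weighted[c])
```

With equal weights a hidef-only row and a verified-only row tie, so the weights only matter when one flag should outrank the other.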

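The 2nd level looks among the rows with the "best" score and extracts the minimum ID (or some other tie-break column). If, as in the question, the most recent row should win and higher IDs are newer, use max(ID) instead. This step is essential to reduce a multi-match result set to just one record per episode.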

In this particular case (table schema), the outer select uses the key (id) directly to retrieve the matched records.
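The two levels described above can be tried end to end with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the `best_rows_grouped` helper and the in-memory table are hypothetical, the weights are hidef*2 + verified*1, and MAX(id) replaces MIN(id) so that ties go to the most recent row as the question asks (assuming higher ids are newer).

```python
import sqlite3

# Sample rows from the question: (id, title, ep, hidef, verified).
ROWS = [
    (1, "The Simpsons", 1, 1, 0),
    (2, "The Simpsons", 1, 1, 1),
    (3, "The Simpsons", 1, 1, 1),
    (4, "The Simpsons", 2, 0, 0),
    (5, "The Simpsons", 2, 1, 0),
]

QUERY = """
SELECT c.id, c.title, c.ep, c.hidef, c.verified
FROM (
    SELECT MAX(b.id) AS pick               -- 2nd level: tie-break on id
    FROM (
        SELECT title, ep,
               MAX(hidef*2 + verified*1) AS ord  -- 1st level: best score
        FROM tbl
        GROUP BY title, ep
    ) a
    JOIN tbl b
      ON b.title = a.title AND b.ep = a.ep
     AND b.hidef*2 + b.verified*1 = a.ord
    GROUP BY a.title, a.ep, a.ord
) d
JOIN tbl c ON c.id = d.pick                -- outer select: fetch rows by key
ORDER BY c.title, c.ep
"""


def best_rows_grouped():
    """Run the two-level GROUP BY query on the sample data (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (id INTEGER PRIMARY KEY, title TEXT, "
        "ep INTEGER, hidef INTEGER, verified INTEGER)"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in best_rows_grouped():
        print(row)
```

On the sample data this selects ids 3 and 5. The query relies only on GROUP BY, MAX and joins, with every selected column either grouped or aggregated, so it avoids both LIMIT and the non-aggregated GROUP BY from the question.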

Author by

Tyris

Updated on July 09, 2022

Comments

  • Tyris, almost 2 years

    I'm trying to retrieve the "Best" possible entry from an SQL table.

    Consider a table containing TV shows with the columns id, title, episode, is_hidef, is_verified, e.g.:

    id title         ep hidef verified
    1  The Simpsons  1  True  False
    2  The Simpsons  1  True  True
    3  The Simpsons  1  True  True
    4  The Simpsons  2  False False
    5  The Simpsons  2  True  False
    

    There may be duplicate rows for a single title and episode, which may or may not have different values for the boolean fields. There may be more columns containing additional info, but that's unimportant.

    I want a result set that gives me the best row (so is_hidef and is_verified are both "true" where possible) for each episode. For rows considered "equal" I want the most recent row (natural ordering, or order by an arbitrary datetime column).

    3  The Simpsons  1  True  True
    5  The Simpsons  2  True  False
    

    In the past I would have used the following query:

    SELECT * FROM shows WHERE title='The Simpsons' GROUP BY episode ORDER BY is_hidef, is_verified
    

    This works under MySQL and SQLite, but goes against the SQL spec (GROUP BY requiring aggregates, etc.). I'm not really interested in hearing again why MySQL is so bad for allowing this, but I'm very interested in finding an alternative solution that will work on other engines too (bonus points if you can give me the Django ORM code for it).

    Thanks =)