Alternative to using GROUP BY without aggregates to retrieve distinct "best" result
Solution 1
This is basically a form of the groupwise-maximum-with-ties problem. I don't think there is a standard-compliant SQL solution, but a query like this should perform well:
SELECT s2.id
     , s2.title
     , s2.episode
     , s2.is_hidef
     , s2.is_verified
FROM (
    SELECT DISTINCT title
         , episode
    FROM shows
    WHERE title = 'The Simpsons'
) s1
JOIN shows s2
  ON s2.id =
(
    SELECT s3.id
    FROM shows s3
    WHERE s3.title = s1.title
      AND s3.episode = s1.episode
    ORDER BY
          s3.is_hidef DESC
        , s3.is_verified DESC
        , s3.id DESC  -- tie-break: among equal rows, prefer the most recent (highest id)
    LIMIT 1
)
Given the cost in readability, though, I would stick with your original query.
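To see how the correlated subquery behaves, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows mirror the table in the question; storing the booleans as 0/1 integers, the helper name best_rows, and the trailing s3.id DESC tie-break (treating a higher id as "more recent") are assumptions made for the illustration.

```python
import sqlite3

# Sample rows taken from the question: (id, title, episode, is_hidef, is_verified).
# Booleans are stored as 0/1 integers, which is an assumption of this sketch.
ROWS = [
    (1, "The Simpsons", 1, 1, 0),
    (2, "The Simpsons", 1, 1, 1),
    (3, "The Simpsons", 1, 1, 1),
    (4, "The Simpsons", 2, 0, 0),
    (5, "The Simpsons", 2, 1, 0),
]

QUERY = """
SELECT s2.id, s2.title, s2.episode, s2.is_hidef, s2.is_verified
FROM (
    SELECT DISTINCT title, episode
    FROM shows
    WHERE title = ?
) s1
JOIN shows s2
  ON s2.id = (
    SELECT s3.id
    FROM shows s3
    WHERE s3.title = s1.title
      AND s3.episode = s1.episode
    ORDER BY s3.is_hidef DESC, s3.is_verified DESC, s3.id DESC
    LIMIT 1
)
ORDER BY s2.episode
"""


def best_rows(title):
    """Return the best row per episode for one title (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, "
            "episode INTEGER, is_hidef INTEGER, is_verified INTEGER)"
        )
        conn.executemany("INSERT INTO shows VALUES (?, ?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY, (title,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in best_rows("The Simpsons"):
        print(row)
```

On the sample data this returns the rows with ids 3 and 5, which matches the result the question asks for.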
Solution 2
Similar in some ways to Andomar's answer, but this one actually works.
SELECT C.*
FROM
(
    SELECT MIN(b.id) minid
    FROM (
        SELECT title, ep, MAX(hidef*1 + verified*1) ord
        FROM tbl
        GROUP BY title, ep
    ) a
    INNER JOIN tbl b
        ON b.title = a.title
       AND b.ep = a.ep
       AND b.hidef*1 + b.verified*1 = a.ord
    GROUP BY a.title, a.ep, a.ord
) D
INNER JOIN tbl C ON D.minid = C.id
The first-level tie-break converts bits (SQL Server) or MySQL booleans to integer values using *1, and the columns are added to produce the "best" value. You can give them weights: for example, if hidef matters more than verified, use hidef*2 + verified*1, which can produce 3, 2, 1, or 0.
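The weighting arithmetic can be checked with a few lines of Python. This is only an illustration of the scoring expression; the function name score and its default weights are hypothetical and simply restate hidef*2 + verified*1.

```python
from itertools import product


def score(hidef, verified, hidef_weight=2, verified_weight=1):
    """Combine two boolean flags into one sortable integer (higher is better)."""
    return int(hidef) * hidef_weight + int(verified) * verified_weight


if __name__ == "__main__":
    # Enumerate every flag combination with the weighted formula.
    for hidef, verified in product([True, False], repeat=2):
        print(hidef, verified, score(hidef, verified))
```

With equal weights (hidef*1 + verified*1), a row that is only hidef and a row that is only verified both score 1, so they tie; the 2/1 weighting gives each combination a distinct score.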
The 2nd level looks among those of the "best" scenario and extracts the minimum ID (or some other tie-break column). This is essential to reduce a multi-match result set to just one record.
In this particular case (table schema), the outer select uses the direct key to retrieve the matched records.
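The two levels described above can be tried end to end with a small sketch that runs the query in an in-memory SQLite database. The sample rows come from the question; the 0/1 integer storage, the helper name pick_best, and the option to swap MIN for MAX in the tie-break are assumptions added for illustration.

```python
import sqlite3

# Sample rows from the question: (id, title, ep, hidef, verified), booleans as 0/1.
ROWS = [
    (1, "The Simpsons", 1, 1, 0),
    (2, "The Simpsons", 1, 1, 1),
    (3, "The Simpsons", 1, 1, 1),
    (4, "The Simpsons", 2, 0, 0),
    (5, "The Simpsons", 2, 1, 0),
]

# {agg} is filled with MIN or MAX: the second-level tie-break on id.
QUERY_TEMPLATE = """
SELECT c.*
FROM (
    SELECT {agg}(b.id) AS pick_id
    FROM (
        SELECT title, ep, MAX(hidef * 1 + verified * 1) AS ord
        FROM tbl
        GROUP BY title, ep
    ) a
    INNER JOIN tbl b
        ON b.title = a.title
       AND b.ep = a.ep
       AND b.hidef * 1 + b.verified * 1 = a.ord
    GROUP BY a.title, a.ep, a.ord
) d
INNER JOIN tbl c ON d.pick_id = c.id
ORDER BY c.ep
"""


def pick_best(tie_break="MIN"):
    """Return one best row per (title, ep); tie_break picks lowest or highest id."""
    if tie_break not in ("MIN", "MAX"):
        raise ValueError("tie_break must be 'MIN' or 'MAX'")
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tbl (id INTEGER PRIMARY KEY, title TEXT, "
            "ep INTEGER, hidef INTEGER, verified INTEGER)"
        )
        conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY_TEMPLATE.format(agg=tie_break)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(pick_best("MIN"))
    print(pick_best("MAX"))
```

With MIN, as in the query above, the sample data yields ids 2 and 5. If the most recent row should win a tie, as the question requests, MAX yields ids 3 and 5 instead.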
Tyris
Updated on July 09, 2022
I'm trying to retrieve the "Best" possible entry from an SQL table.
Consider a table containing TV shows with the columns id, title, episode, is_hidef, is_verified, e.g.:
id  title         ep  hidef  verified
1   The Simpsons  1   True   False
2   The Simpsons  1   True   True
3   The Simpsons  1   True   True
4   The Simpsons  2   False  False
5   The Simpsons  2   True   False
There may be duplicate rows for a single title and episode, which may or may not have different values for the boolean fields. There may be more columns containing additional info, but that's unimportant.
I want a result set that gives me the best row (so is_hidef and is_verified are both "true" where possible) for each episode. For rows considered "equal" I want the most recent row (natural ordering, or order by an arbitrary datetime column).
3   The Simpsons  1   True   True
5   The Simpsons  2   True   False
In the past I would have used the following query:
SELECT * FROM shows WHERE title='The Simpsons' GROUP BY episode ORDER BY is_hidef, is_verified
This works under MySQL and SQLite, but goes against the SQL spec (GROUP BY requiring aggregates, etc.). I'm not really interested in hearing again why MySQL is so bad for allowing this, but I'm very interested in finding an alternative solution that will work on other engines too (bonus points if you can give me the Django ORM code for it).
Thanks =)