SQL Query to get column values that correspond with MAX value of another column?

112,799

Solution 1

I would try something like this:

SELECT
   s.video_id
   ,s.video_category
   ,s.video_url
   ,s.video_date
   ,s.video_title
   ,short_description
FROM videos s
   JOIN (SELECT MAX(video_id) AS id FROM videos GROUP BY video_category) max
      ON s.video_id = max.id

which is quite faster that your own solution

Solution 2

I recently released a new technique to handle this type of problem in MySQL.

SCALAR-AGGREGATE REDUCTION

Scalar-Aggregate Reduction is by far the highest-performance approach and simplest method (in DB engine terms) for accomplishing this, because it requires no joins, no subqueries, and no CTE.

For your query, it would look something like this:

SELECT
  video_category,
  MAX(video_id) AS video_id,
  SUBSTRING(MAX(CONCAT(LPAD(video_id, 11, '0'), video_url)), 12) AS video_url,
  SUBSTRING(MAX(CONCAT(LPAD(video_id, 11, '0'), video_date)), 12) AS video_date,
  SUBSTRING(MAX(CONCAT(LPAD(video_id, 11, '0'), video_title)), 12) AS video_title,
  SUBSTRING(MAX(CONCAT(LPAD(video_id, 11, '0'), short_description)), 12) AS short_description
FROM
  videos
GROUP BY
  video_category

The combination of scalar and aggregate functions does the following:

  1. LPADs the intra-aggregate correlated identifier to allow proper string comparison (e.g. "0009" and "0025" will be properly ranked). I'm LPADDING to 11 characters here assuming an INT primary key. If you use a BIGINT, you will want to increase this to support your table's ordinality. If you're comparing on a DATETIME field (fixed length), no padding is necessary.
  2. CONCATs the padded identifier with the output column (so you get "00000000009myvalue" vs "0000000025othervalue")
  3. MAX the aggregate set, which will yield "00000000025othervalue" as the winner.
  4. SUBSTRING the result, which will truncate the compared identifier portion, leaving only the value.

If you want to retrieve values in types other than CHAR, you may need to performa an additional CAST on the output, e.g. if you want video_date to be a DATETIME:

CAST(SUBSTRING(MAX(CONCAT(LPAD(video_id, 11, '0'), video_date)), 12) AS DATETIME)

Another benefit of this method over the self-joining method is that you can combine other aggregate data (not just latest values), or even combine first AND last item in the same query, e.g.

SELECT
    -- Overall totals
    video_category,
    COUNT(1) AS videos_in_category,
    DATEDIFF(MAX(video_date), MIN(video_date)) AS timespan,
    
    -- Last video details
    MAX(video_id) AS last_video_id,
    SUBSTRING(MAX(CONCAT(LPAD(video_id, 11, '0'), video_url)), 12) AS last_video_url,
    ...
    
    -- First video details
    MIN(video_id) AS first_video_id,
    SUBSTRING(MIN(CONCAT(LPAD(video_id, 11, '0'), video_url)), 12) AS first_video_url,
    ...
    
    -- And so on

For further details explaining the benefits of this method vs other older methods, my full blog post is here: https://www.stevenmoseley.com/blog/tech/high-performance-sql-correlated-scalar-aggregate-reduction-queries

Solution 3

Here is a more general solution (handles duplicates)

CREATE TABLE test(
  i INTEGER,
  c INTEGER,
  v INTEGER
);


insert into test(i, c, v)
values
(3, 1, 1),
(3, 2, 2),
(3, 3, 3),
(4, 2, 4),
(4, 3, 5),
(4, 4, 6),
(5, 3, 7),
(5, 4, 8),
(5, 5, 9),
(6, 4, 10),
(6, 5, 11),
(6, 6, 12);



SELECT t.c, t.v
FROM test t
JOIN (SELECT test.c, max(i) as mi FROM test GROUP BY c) j ON
  t.i = j.mi AND
  t.c  = j.c
ORDER BY c;

Solution 4

A slightly more "rustic" solution, but should do the job just the same:

SELECT
  video_category,
  video_url,
  video_date,
  video_title,
  short_description,
  video_id
FROM
  videos
ORDER BY video_id DESC
LIMIT 1;

In other words, just produce a table with all of the columns that you want, sort it so that your maximum value is at the top, and chop it off so you only return one row.

Share:
112,799
Devin
Author by

Devin

Updated on July 08, 2022

Comments

  • Devin
    Devin almost 2 years

    Ok, this is my query:

    SELECT
      video_category,
      video_url,
      video_date,
      video_title,
      short_description,
      MAX(video_id) 
    FROM
      videos
    GROUP BY
      video_category
    

    When it pulls the data, I get the correct row for the video_id, but it pulls the first row for each category for the others. So when I get the max result for the video_id of category 1, I get the max ID, but the first row in the table for the url, date, title, and description.

    How can I have it pull the other columns that correspond with the max ID result?

    Edit: Fixed.

    SELECT
        *
    FROM
        videos
    WHERE
        video_id IN
        (
            SELECT
                DISTINCT
                MAX(video_id)
            FROM
                videos
            GROUP BY
                video_category
        ) 
    ORDER BY
        video_category ASC