ROW_NUMBER() in MySQL

663,992

Solution 1

I want the row with the single highest col3 for each (col1, col2) pair.

That's a groupwise maximum, one of the most commonly asked SQL questions (since it seems like it should be easy, but actually it kind of isn't).

I often plump for a null-self-join:

SELECT t0.*
FROM Table1 AS t0
LEFT JOIN Table1 AS t1 ON t0.col1=t1.col1 AND t0.col2=t1.col2 AND t1.col3>t0.col3
WHERE t1.col1 IS NULL;

“Get the rows in the table for which no other row with matching col1, col2 has a higher col3.” (You will notice that this and most other groupwise-maximum solutions return multiple rows if more than one row has the same col1, col2, col3. If that's a problem, you may need some post-processing.)
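To handle ties without post-processing, the ON clause can fall back to a unique column, as Jon Armstrong suggests in the comments. A sketch assuming the Table1 schema from the Solution 5 demo, whose `id` primary key serves as the tie-breaker (the highest id wins a tie):

```sql
-- Null self-join: keep t0 only if no row t1 in the same (col1, col2) group beats it.
-- Assumes the Table1 schema from the Solution 5 demo; `id` is unique and breaks ties.
SELECT t0.*
FROM Table1 AS t0
LEFT JOIN Table1 AS t1
       ON  t1.col1 = t0.col1
       AND t1.col2 = t0.col2
       AND (t1.col3 > t0.col3
            OR (t1.col3 = t0.col3 AND t1.id > t0.id))
WHERE t1.id IS NULL;   -- no better row exists, so t0 is the group's maximum
```

With the demo data this yields exactly one row per group: (1, 1, 'c'), (2, 1, 'y') and (2, 2, 'z').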

Solution 2

Before MySQL 8.0 there is no built-in ranking functionality. The closest you can get is to use a variable:

SELECT t.*, 
       @rownum := @rownum + 1 AS row_num
  FROM YOUR_TABLE t, 
       (SELECT @rownum := 0) r;

So how would that work in my case? I'd need two variables, one for each of col1 and col2? Col2 would need resetting somehow when col1 changed...?

Yes. If it were Oracle, you could use the LEAD function to peek at the next value. Thankfully, Quassnoi covers the logic for what you need to implement in MySQL.
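A minimal sketch of that two-variable reset, assuming MySQL 5.x and the Table1/col1/col2/col3 names from the question (the variables @rn, @prev1 and @prev2 are hypothetical names). It depends on the order in which user-variable assignments are evaluated, which the MySQL manual calls undefined (see Roland Bouman's comment), so treat it as a fallback for versions without ROW_NUMBER(), not as guaranteed behaviour:

```sql
-- Emulates ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC) on MySQL < 8.0.
-- WARNING: relies on undefined evaluation order of user-variable assignments.
SELECT x.col1, x.col2, x.col3, x.intRow
FROM (
    SELECT t.col1, t.col2, t.col3,
           -- restart at 1 whenever the (col1, col2) group changes
           @rn := IF(@prev1 = t.col1 AND @prev2 = t.col2, @rn + 1, 1) AS intRow,
           @prev1 := t.col1 AS p1,   -- remember the current group for the next row
           @prev2 := t.col2 AS p2
    FROM (SELECT col1, col2, col3
          FROM Table1
          ORDER BY col1, col2, col3 DESC
          LIMIT 18446744073709551615) AS t,   -- LIMIT keeps the ORDER BY from being dropped
         (SELECT @rn := 0, @prev1 := NULL, @prev2 := NULL) AS vars
) AS x
WHERE x.intRow = 1;   -- or x.intRow <= n for the top n rows per group
```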

Solution 3

I always end up following this pattern. Given this table:

+------+------+
|    i |    j |
+------+------+
|    1 |   11 |
|    1 |   12 |
|    1 |   13 |
|    2 |   21 |
|    2 |   22 |
|    2 |   23 |
|    3 |   31 |
|    3 |   32 |
|    3 |   33 |
|    4 |   14 |
+------+------+

You can get this result:

+------+------+------------+
|    i |    j | row_number |
+------+------+------------+
|    1 |   11 |          1 |
|    1 |   12 |          2 |
|    1 |   13 |          3 |
|    2 |   21 |          1 |
|    2 |   22 |          2 |
|    2 |   23 |          3 |
|    3 |   31 |          1 |
|    3 |   32 |          2 |
|    3 |   33 |          3 |
|    4 |   14 |          1 |
+------+------+------------+

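By running this query, which doesn't need any variables defined (it assumes each (i, j) pair is unique; duplicate pairs would inflate the counts):

SELECT a.i, a.j, COUNT(*) AS `row_number`
FROM test a
JOIN test b ON a.i = b.i AND a.j >= b.j
GROUP BY a.i, a.j
ORDER BY a.i, a.j;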

Hope that helps!
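The same counting self-join also covers the top-N question raised in the comments: filter on the count in HAVING. A sketch against the `test` table above, keeping the two smallest j values per i (2 is just an example N):

```sql
-- Top-N per group with the counting self-join; no variables needed.
-- Same caveat as above: (i, j) pairs must be unique.
SELECT a.i, a.j, COUNT(*) AS rn
FROM test AS a
JOIN test AS b
  ON b.i = a.i
 AND b.j <= a.j        -- count rows in the group at or below a.j
GROUP BY a.i, a.j
HAVING COUNT(*) <= 2   -- keep only the first 2 rows of each group
ORDER BY a.i, a.j;
```

With the sample data this returns (1, 11), (1, 12), (2, 21), (2, 22), (3, 31), (3, 32) and (4, 14).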

Solution 4

SELECT 
    @i:=@i+1 AS iterator, 
    t.*
FROM 
    tablename AS t,
    (SELECT @i:=0) AS foo
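On MySQL 8.0 and later, the same running number can be produced without user variables. A sketch, where `id` is a hypothetical column standing in for whatever defines the row order you want:

```sql
-- MySQL 8.0+ only. Without an ORDER BY inside OVER (), the numbering is indeterminate.
SELECT ROW_NUMBER() OVER (ORDER BY t.id) AS iterator,
       t.*
FROM tablename AS t
ORDER BY iterator;
```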

Solution 5

From MySQL 8.0 onwards (window functions were introduced in 8.0.2), you can use window functions natively.

1.4 What Is New in MySQL 8.0:

Window functions.

MySQL now supports window functions that, for each row from a query, perform a calculation using rows related to that row. These include functions such as RANK(), LAG(), and NTILE(). In addition, several existing aggregate functions now can be used as window functions; for example, SUM() and AVG().

ROW_NUMBER() over_clause:

Returns the number of the current row within its partition. Row numbers range from 1 to the number of partition rows.

ORDER BY affects the order in which rows are numbered. Without ORDER BY, row numbering is indeterminate.
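To make the tie behaviour discussed in the comments concrete, here is a self-contained sketch (MySQL 8.0+, inline literal data rather than a real table) comparing ROW_NUMBER() with RANK() and DENSE_RANK() on tied values:

```sql
-- Inline data with a tie on 20; no table needed.
SELECT v,
       ROW_NUMBER() OVER w AS rn,    -- always unique: 1, 2, 3, 4
       RANK()       OVER w AS rnk,   -- ties share a rank, then a gap: 1, 2, 2, 4
       DENSE_RANK() OVER w AS drnk   -- ties share a rank, no gap: 1, 2, 2, 3
FROM (SELECT 30 AS v UNION ALL SELECT 20
      UNION ALL SELECT 20 UNION ALL SELECT 10) AS t
WINDOW w AS (ORDER BY v DESC)
ORDER BY v DESC;
```

Which of the two 20s gets row number 2 and which gets 3 is indeterminate, exactly as the manual text above warns; add a unique column to the window's ORDER BY if that matters.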

Demo:

CREATE TABLE Table1(
  id INT AUTO_INCREMENT PRIMARY KEY, col1 INT,col2 INT, col3 TEXT);

INSERT INTO Table1(col1, col2, col3)
VALUES (1,1,'a'),(1,1,'b'),(1,1,'c'),
       (2,1,'x'),(2,1,'y'),(2,2,'z');

SELECT 
    col1, col2,col3,
    ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC) AS intRow
FROM Table1;
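The question's actual goal, keeping only the top row per (col1, col2), needs one more step, because window functions are evaluated after WHERE and cannot be filtered on directly. A sketch against the demo Table1 above, adding `id` as a tie-breaker so the result is deterministic:

```sql
SELECT col1, col2, col3
FROM (
    SELECT col1, col2, col3,
           ROW_NUMBER() OVER (PARTITION BY col1, col2
                              ORDER BY col3 DESC, id DESC) AS intRow
    FROM Table1
) AS ranked
WHERE intRow = 1          -- use intRow <= n for the top n rows per group
ORDER BY col1, col2;
```

With the demo data this returns (1, 1, 'c'), (2, 1, 'y') and (2, 2, 'z').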


Author: Momo

I work as a software development manager, working mostly with C#, Azure and SQL Server.

Updated on July 08, 2022

Comments

  • Momo
    Momo almost 2 years

    Is there a nice way in MySQL to replicate the SQL Server function ROW_NUMBER()?

    For example:

    SELECT 
        col1, col2, 
        ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC) AS intRow
    FROM Table1
    

    Then I could, for example, add a condition to limit intRow to 1 to get a single row with the highest col3 for each (col1, col2) pair.

  • Momo
    Momo over 14 years
    Hmm....so how would that work in my case? I'd need two variables, one for each of col1 and col2? Col2 would need resetting somehow when col1 changed..?
  • Momo
    Momo over 14 years
    But what if there are two maximal values of col3 for a (col1, col2) pair? You'd end up with two rows.
  • Amit Patil
    Amit Patil over 14 years
    @Paul: yes! Just added a note about that in the answer a tick ago. You can usually drop unwanted extra rows in the application layer afterwards on some arbitrary basis, but if you have a lot of rows all with the same col3 it can be problematic.
  • Momo
    Momo over 14 years
    In T-SQL I tend to need this as a subquery within a much larger query, so post-processing isn't really an option. Also... what if you wanted the rows with the top n highest values of col3? With my T-SQL example, you can add the constraint intRow <= n, but this would be very hard with a self-join.
  • Amit Patil
    Amit Patil over 14 years
    If you took “with the single highest col3” literally you could make it return no rows instead of 2 in this case by using >= instead of >. But that's unlikely to be what you want! Another option in MySQL is to finish with GROUP BY col1, col2 without using an aggregate expression for col3; MySQL will then return an indeterminate col3 value from each group. However, this is invalid in ANSI SQL, generally considered really bad practice, and rejected by default since MySQL 5.7.5, which enables ONLY_FULL_GROUP_BY.
  • Amit Patil
    Amit Patil over 14 years
    For top N rows you have to add more joins or subqueries for each N, which soon gets unwieldy. Unfortunately MySQL does not support LIMIT in IN/ALL/ANY/SOME subqueries, and there's no other arbitrary-selection-order or general windowing function.
  • Momo
    Momo over 14 years
    Thanks, yes that makes sense. In the case of multiple maxima it certainly will have to be an arbitrary row, so the GROUP BY seems logical. The extra joins or subqueries sound a bit dubious though, especially if n is variable. The choice of preferred answer is a toss-up between this and OMG Ponies', as they both will replicate the functionality I need, but in a somewhat hard-to-read, slightly hacky way.
  • Momo
    Momo over 14 years
    Thanks... as I said above, this answer is just as acceptable as bobince's, but I can only tick one :-)
  • Bill Karwin
    Bill Karwin over 14 years
    @bobince: There's an easy solution to get the top N rows. See stackoverflow.com/questions/1442527/…
  • Momo
    Momo over 14 years
    @Bill Karwin: That's a nice solution. Although in this case, the column we're sorting upon isn't necessarily unique so we may get more than n values.
  • Amit Patil
    Amit Patil over 14 years
    @Bill: nifty! What's the performance like on this sort of query, generally? Seeing heavy lifting in HAVING always makes me nervous. :-)
  • Roland Bouman
    Roland Bouman over 14 years
    Assigning to and reading from user-defined variables in the same statement is not reliable. this is documented here: dev.mysql.com/doc/refman/5.0/en/user-variables.html: "As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement."
  • OMG Ponies
    OMG Ponies over 14 years
    @Roland: I've only tested on small datasets, haven't had any issue. Too bad MySQL has yet to address the functionality - the request has been in since 2008
  • sholsinger
    sholsinger over 12 years
    The first := seems to be missing from @OMG Ponies answer. Thanks for posting this Peter Johnson.
  • newtover
    newtover over 12 years
    bobince, the solution has become rather popular here on SO, but I have a question. The solution is basically the same as if someone tried to find the largest id with the following query: SELECT t1.id FROM test t1 LEFT JOIN test t2 ON t1.id<t2.id WHERE t2.id IS NULL; Doesn't it require n*n/2 + n/2 IS NULL comparisons to find the single row? Are there any optimizations I'm not seeing? I tried to ask Bill a similar question in another thread, but he seems to have ignored it.
  • Jon Armstrong - Xgc
    Jon Armstrong - Xgc over 11 years
    @Paul - To address the case where multiple rows exist that match the max per group and you wish to grab just one, you can always add the primary key in the ON clause logic to break the tie... SELECT t0.col3 FROM table AS t0 LEFT JOIN table AS t1 ON t0.col1 = t1.col1 AND t0.col2 = t1.col2 AND (t1.col3, t1.pk) > (t0.col3, t0.pk) WHERE t1.col1 IS NULL ;
  • csonuryilmaz
    csonuryilmaz over 10 years
    In my experience, if you use INNER JOINs in your query, put the ",(SELECT @rownum := 0) r" part after the INNER JOINs.
  • andig
    andig over 10 years
    I guess (SELECT @i:=0) AS foo should be the first table in the FROM statement, especially if other tables use sub-selects
  • Tushar
    Tushar almost 10 years
    if columns are VARCHAR or CHAR, how can you handle that with this structure?
  • luckykrrish
    luckykrrish over 9 years
    You are awesome Mosty, I'm looking exactly for this
  • Tom Chiverton
    Tom Chiverton about 9 years
    I don't follow. How is "@i := @i + 1 as position" not a direct replacement for "ROW_NUMBER() over (order by sum(score) desc) as position" ?
  • Tom Chiverton
    Tom Chiverton about 9 years
    Why do you even need the '.. as foo' ?
  • ExStackChanger
    ExStackChanger over 8 years
    @TomChiverton If it's missing, you get: "Error Code: 1248. Every derived table must have its own alias"
  • Stuart Watt
    Stuart Watt over 8 years
    Awesome. This actually does the partitioning. Very handy
  • Álvaro González
    Álvaro González over 8 years
    Sorry but as far as I know MySQL does not support common table expressions.
  • Utsav
    Utsav over 8 years
    Just gave this answer using your logic for row_number. Thanks.
  • Kenneth Xu
    Kenneth Xu about 8 years
    Compared to a self-join, this is much more efficient, but there is an issue with the logic: the ordering must happen before computing row_num, and concat is also not necessary. ``` SELECT @row_num := IF(@prev_col1 = t.col1 AND @prev_col2 = t.col2, @row_num + 1, 1) AS RowNumber ,t.col1 ,t.col2 ,t.col3 ,t.col4 ,@prev_col1 := t.col1 ,@prev_col2 := t.col2 FROM (SELECT * FROM table1 ORDER BY col1, col2, col3) t, (SELECT @row_num := 1, @prev_col1 := '', @prev_col2 := '') var ```
  • Diego
    Diego over 7 years
    Can this be used in UPDATE queries? I am trying, but I get a "data truncated for column..." error.
  • Diego
    Diego over 7 years
    If anyone is interested in using it in an UPDATE, it must be used as a subquery in order to work: UPDATE <table> SET <field> = (SELECT @row_number := @row_number + 1) ORDER BY <your order column>; The order column determines the order in which the values are assigned to the rows.
  • pnomolos
    pnomolos over 7 years
    Note that ORDER BY in a subquery could be ignored (see mariadb.com/kb/en/mariadb/…). The suggested solution to that is to add LIMIT 18446744073709551615 to the subquery, which forces a sort. However this could cause performance issues and isn't valid for really freaking huge tables :)
  • Stephan Richter
    Stephan Richter over 7 years
    Works with one limitation: if you execute the query several times, you will get ever-increasing fakeIds for the same result set
  • xmedeko
    xmedeko about 7 years
    If you need to put this into a subquery, add LIMIT 18446744073709551615 to force the ORDER BY clause to be honored.
  • xmedeko
    xmedeko about 7 years
    concat_ws with empty string '' is dangerous: concat_ws('',12,3) = concat_ws('',1,23). Better to use some separator '_' or use @Kenneth Xu solution.
  • jberryman
    jberryman about 7 years
    This seems to be undefined behavior as Roland notes. e.g. this gives totally incorrect results for a table I tried: SELECT @row_num:=@row_num+1 AS row_number, t.id FROM (SELECT * FROM table1 WHERE col = 264 ORDER BY id) t, (SELECT @row_num:=0) var;
  • jberryman
    jberryman about 7 years
    The rank assignment here is completely undefined and this doesn't even answer the question
  • jberryman
    jberryman about 7 years
    Is this supposed to be better? They both seem likely to be quadratic, but I'm not sure how to interpret the EXPLAIN output.
  • abcdn
    abcdn about 7 years
    In fact, nested selects are known not to be very well optimized in MySQL, so this answer is just a demonstration of a querying technique. The variable-based examples above work better for most practical cases, I suppose.
  • jberryman
    jberryman about 7 years
    I'm not convinced any of the variable based answers are actually using defined behavior...
  • abcdn
    abcdn about 7 years
    I am sorry, I am not sure I got what you meant by "defined behavior". Do you mean it doesn't work for you, or you are just concerned that it is not documented?
  • wrschneider
    wrschneider almost 7 years
    This would be more readable as SELECT t0.col3 FROM table AS t0 WHERE NOT EXISTS (SELECT 1 FROM table AS t1 WHERE t0.col1=t1.col1 AND t0.col2=t1.col2 AND t1.col3>t0.col3)
  • Amit Patil
    Amit Patil almost 7 years
    @wrschneider: It would be more readable, but at the time this answer was written, likely much slower. Subquery support was a relative latecomer to MySQL and initially performed poorly. I would hope today both queries would be pretty optimal, but I can't say I've been keeping track of developments...
  • jmpeace
    jmpeace over 6 years
    You could run SET @fakeId = 0; each time before you run the query. Not optimal, but it works.
  • alex
    alex over 6 years
    @Tushar the operators <, >, <=, >= compare CHAR and VARCHAR values according to the column's collation (essentially alphabetical order), which I expect is exactly what you are looking for.
  • Paul Maxwell
    Paul Maxwell over 6 years
    sigh... at last !
  • philipxy
    philipxy over 6 years
    "Undefined behaviour" means that it is not documented to work and/or documented to not be guaranteed to work. See documentation quotes & links in comments on this page. It might return what one (unsoundly) wants/guesses/hypothesizes/fantasizes. For certain versions of the implementation certain query expressions using CASE incrementing & using variables has been shown to work by programmers at Percona by looking at the code. That could change with any release.
  • Almaz Vildanov
    Almaz Vildanov almost 6 years
    Can I add a condition such as row_number <= 2? If so, how?
  • Zax
    Zax over 5 years
    @AlmazVildanov you should be able to use this query as a subquery and filter on row_number <= 2. And huge thanks for this answer, Mosty, it's perfect!
  • sam-6174
    sam-6174 over 5 years
    The OP's link is dead; archive of the link here
  • Caius Jard
    Caius Jard about 5 years
    It doesn't do any partitioning though, and it isn't significantly different to a higher cited answer
  • WestCoastProjects
    WestCoastProjects about 5 years
    I am linking/using this answer at stackoverflow.com/questions/55778739/…
  • whyer
    whyer almost 5 years
    @JonArmstrong-Xgc, by the way: if you need a multi-column sort with mixed directions, such as ORDER BY col1 ASC, col2 ASC, pk DESC, and the criteria sorted in the opposite direction are numeric (e.g. INT or FLOAT), you can simply negate them, e.g. (t1.col3, -t1.pk) > (t0.col3, -t0.pk). Otherwise you have to spell the comparison out manually: t1.col3 > t0.col3 OR (t1.col3 = t0.col3 AND STRCMP(t1.surname, t0.surname) < 0)
  • philipxy
    philipxy almost 5 years
    @TomChiverton Because its behaviour is not defined, as the manual says right there.
  • philipxy
    philipxy almost 5 years
    There is no justification for this. Just like the other answers that assign to & read from the same variable.
  • Caius Jard
    Caius Jard almost 5 years
    Can you supply more detail, phil?
  • philipxy
    philipxy almost 5 years
    See my other comments on this page. Google 'site:stackoverflow.com "philipxy" mysql variable (set OR assign OR assignment OR write) read' for an answer by me, and for a bug report linked in a comment by me at a question where the accepted answer quotes the manual yet immediately claims it's OK to do something that contradicts it. Read the manual regarding variables and assignment.
  • Caius Jard
    Caius Jard almost 5 years
    I understand your concern
  • Raymond Nijland
    Raymond Nijland over 4 years
    @ÁlvaroGonzález It does now: MySQL 8.0 supports CTEs and window functions, but only MySQL 8.0 does, so this answer does not really make sense for older MySQL versions.
  • Chris Muench
    Chris Muench over 4 years
    A really odd issue happens if you remove DETERMINISTIC. Then the fakeId is incorrect when using order by. Why is this?
  • Xin Niu
    Xin Niu over 3 years
    Does this work on MySQL? I got a syntax error when I ran it...
  • Johan
    Johan over 3 years
    this needs to be upvoted, I wasted many hours due to missing this one
  • zhongxiao37
    zhongxiao37 over 3 years
    I wonder when WHEN (@prevcol := col) = null THEN null would ever be executed. Did you mean IS NULL? Comparing with = null always yields UNKNOWN, which is treated as false.
  • Caius Jard
    Caius Jard over 3 years
    @zhongxiao37 You need to read the whole answer. I explain in detail why this second when clause is structured so that it is guaranteed to always be false. If you don't want to read the whole answer, Ctrl-F for The second WHEN predicate is always false and read the bullet point that starts with this sentence
  • m1ld
    m1ld about 3 years
    For me this stopped working in MySQL 8.0.22.
  • M-O-H-S-E-N
    M-O-H-S-E-N about 3 years
    You saved me, bro!!
  • Martin T.
    Martin T. over 2 years
    Very cool! But I just realized it is the same as the top answer.