Delete all Duplicate Rows except for One in MySQL?

505,018

Solution 1

Editor warning: This solution is computationally inefficient and may bring down your connection for a large table.

NB - You need to do this first on a test copy of your table!

When I did it, I found that unless I also included AND n1.id <> n2.id, it deleted every row in the table.

  1. If you want to keep the row with the lowest id value:

    DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
    
  2. If you want to keep the row with the highest id value:

    DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name
    

I used this method in MySQL 5.1

Not sure about other versions.


Update: Since people Googling for removing duplicates end up here
Although the OP's question is about DELETE, please be advised that using INSERT and DISTINCT is much faster. For a database with 8 million rows, the below query took 13 minutes, while using DELETE, it took more than 2 hours and yet didn't complete.

INSERT INTO tempTableName(cellId,attributeId,entityRowId,value)
    SELECT DISTINCT cellId,attributeId,entityRowId,value
    FROM tableName;

Solution 2

If you want to keep the row with the lowest id value:

DELETE FROM NAMES
 WHERE id NOT IN (SELECT * 
                    FROM (SELECT MIN(n.id)
                            FROM NAMES n
                        GROUP BY n.name) x)

If you want the id value that is the highest:

DELETE FROM NAMES
 WHERE id NOT IN (SELECT * 
                    FROM (SELECT MAX(n.id)
                            FROM NAMES n
                        GROUP BY n.name) x)

The subquery in a subquery is necessary for MySQL, or you'll get a 1093 error.

Share:
505,018
Highway of Life
Author by

Highway of Life

David has worked in Software Development for over 20 years, from every phase of development, build, deployment, systems engineering, operations, and Site Reliability Engineering (DevOps). David is currently the Lead Site Reliability Engineer for the SRE Product team at Starbucks Technology where the focus is on developing applications and shared platforms to enable rapid development by engineering teams across Starbucks.

Updated on July 31, 2022

Comments

  • Highway of Life
    Highway of Life almost 2 years

    How would I delete all duplicate data from a MySQL Table?

    For example, with the following data:

    SELECT * FROM names;
    
    +----+--------+
    | id | name   |
    +----+--------+
    | 1  | google |
    | 2  | yahoo  |
    | 3  | msn    |
    | 4  | google |
    | 5  | google |
    | 6  | yahoo  |
    +----+--------+
    

    I would use SELECT DISTINCT name FROM names; if it were a SELECT query.

    How would I do this with DELETE to only remove duplicates and keep just one record of each?