Delete duplicate records in SQL Server?

sql tsql duplicates delete-row

112,022

Solution 1

You can do this with window functions. The query numbers the rows within each EmployeeName group in empId order, then deletes every row except the first one in each group.

delete x from (
  select *, rn=row_number() over (partition by EmployeeName order by empId)
  from Employee 
) x
where rn > 1;

Run it as a select to see what would be deleted:

select *
from (
  select *, rn=row_number() over (partition by EmployeeName order by empId)
  from Employee 
) x
where rn > 1;
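To try this out safely, here is a minimal sketch that runs the same delete against a scratch copy of the question's data inside a transaction. The temp table #Employee and its identity column empId are assumptions made for illustration; the original table may look different.

```sql
-- Hypothetical scratch table mirroring the sample data from the question.
CREATE TABLE #Employee (
    empId        int IDENTITY(1,1),
    EmployeeName varchar(50)
);

INSERT INTO #Employee (EmployeeName)
VALUES ('Anand'), ('Anand'), ('Anil'), ('Dipak'),
       ('Anil'), ('Dipak'), ('Dipak'), ('Anil');

BEGIN TRANSACTION;

-- Same statement as above, pointed at the scratch table.
DELETE x FROM (
    SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY EmployeeName ORDER BY empId)
    FROM #Employee
) x
WHERE rn > 1;

-- Inspect the result: one row per name should remain.
SELECT EmployeeName, COUNT(*) AS cnt
FROM #Employee
GROUP BY EmployeeName;

-- Undo the delete; use COMMIT TRANSACTION instead once the result looks right.
ROLLBACK TRANSACTION;

DROP TABLE #Employee;
```

Wrapping the delete in a transaction lets you check what remains before deciding whether to keep the change.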

Solution 2

Assuming that your Employee table also has a unique column (ID in the example below), the following will work:

delete from Employee 
where ID not in
(
    select min(ID)
    from Employee 
    group by EmployeeName 
);

This keeps the row with the lowest ID for each EmployeeName and deletes the rest.
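If the table has no unique column, as the asker notes in the comments, one option suggested there is to add an identity column first. A minimal sketch, assuming the name ID is not already taken and that altering the table is acceptable:

```sql
-- Assumption: Employee has no column named ID yet.
ALTER TABLE Employee ADD ID int IDENTITY(1,1) NOT NULL;
GO  -- batch separator (SSMS/sqlcmd): the new column must exist before the next batch is compiled

-- Keep the lowest ID per name, delete the others.
DELETE FROM Employee
WHERE ID NOT IN
(
    SELECT MIN(ID)
    FROM Employee
    GROUP BY EmployeeName
);
```

If the column was only needed for the cleanup, it can be dropped afterwards with ALTER TABLE Employee DROP COLUMN ID.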

Edit
Re MacGyver's comment - as of SQL Server 2012

MIN can be used with numeric, char, varchar, uniqueidentifier, or datetime columns, but not with bit columns

For 2008 R2 and earlier,

MIN can be used with numeric, char, varchar, or datetime columns, but not with bit columns (and it also doesn't work with GUIDs)

For 2008 R2 you'll need to cast the GUID to a type supported by MIN, e.g.

delete from GuidEmployees
where CAST(ID AS binary(16)) not in
(
    select min(CAST(ID AS binary(16)))
    from GuidEmployees
    group by EmployeeName 
);

SqlFiddle for various types in Sql 2008

SqlFiddle for various types in Sql 2012

Solution 3

You could try something like the following:

delete T1
from MyTable T1
join MyTable T2
  on T1.dupField = T2.dupField
 and T1.uniqueField > T2.uniqueField;

(this assumes that you have an integer-based unique field)

Personally, though, I'd say you are better off stopping duplicate entries from being added to the database in the first place, rather than cleaning them up afterwards.
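One way to stop duplicates at the source is a unique constraint on the column that must not repeat. A minimal sketch, assuming dupField alone defines a duplicate; the constraint name UQ_MyTable_dupField is made up for illustration:

```sql
-- Run this only after the existing duplicates have been removed;
-- the statement fails if duplicate dupField values are still present.
ALTER TABLE MyTable
ADD CONSTRAINT UQ_MyTable_dupField UNIQUE (dupField);
```

Once the constraint is in place, an insert that would create a duplicate dupField value is rejected with an error instead of being stored.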

Solution 4

WITH CTE AS
(
   SELECT EmployeeName, 
          ROW_NUMBER() OVER(PARTITION BY EmployeeName ORDER BY EmployeeName) AS R
   FROM employee_table
)
DELETE CTE WHERE R > 1;

The magic of common table expressions.
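When the table has no key to order by, a comment on this thread suggests ORDER BY (SELECT NULL) to make it explicit that the row kept in each group is arbitrary. A sketch of that variant, using the same employee_table name as above:

```sql
WITH CTE AS
(
   SELECT EmployeeName,
          -- No meaningful ordering exists, so any one row per name may be kept.
          ROW_NUMBER() OVER(PARTITION BY EmployeeName ORDER BY (SELECT NULL)) AS R
   FROM employee_table
)
DELETE FROM CTE WHERE R > 1;
```

This only makes sense when the duplicate rows are identical in every column you care about, since you cannot control which copy survives.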

Solution 5

DELETE
FROM MyTable
WHERE ID NOT IN (
     SELECT MAX(ID)
     FROM MyTable
     GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3);

WITH TempUsers (FirstName, LastName, duplicateRecordCount)
AS
(
    SELECT FirstName, LastName,
    ROW_NUMBER() OVER (PARTITION BY FirstName, LastName ORDER BY FirstName) AS duplicateRecordCount
    FROM dbo.Users
)
DELETE
FROM TempUsers
WHERE duplicateRecordCount > 1
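Before running either delete, it can help to list which values are actually duplicated. A minimal sketch against the same dbo.Users table; the alias copies is just an illustrative name:

```sql
-- Lists each FirstName/LastName pair that appears more than once,
-- together with the number of rows sharing it.
SELECT FirstName, LastName, COUNT(*) AS copies
FROM dbo.Users
GROUP BY FirstName, LastName
HAVING COUNT(*) > 1;
```

If this returns no rows, there is nothing for the delete to remove.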


usr021986

Updated on October 22, 2020

Comments

usr021986 over 3 years
Consider a column named EmployeeName in a table named Employee. The goal is to delete repeated records, based on the EmployeeName field.
```
EmployeeName
------------
Anand
Anand
Anil
Dipak
Anil
Dipak
Dipak
Anil
```
Using one query, I want to delete the records which are repeated.

How can this be done with TSQL in SQL Server?
- Sarfraz almost 14 years
  
  You mean delete duplicate records, right?
- DaeMoohn almost 14 years
  
  you could select the distinct values and their related IDs and delete those records whose IDs aren't in the already selected list?
- Andrew Bullock almost 14 years
  
  Do you have a unique ID column?
- usr021986 almost 14 years
  
  No, I don't have a unique ID column.
- armen over 10 years
  
  How did you accept the answer given by John Gibb if the table lacks a unique ID? Where is the empId column, used by John, in your example?
- John Gibb over 10 years
  
  If you don't have a unique ID column, or anything else meaningful to do an order by, you COULD also order by the employeename column... so your rn would be row_number() over (partition by EmployeeName order by EmployeeName)... this would pick an arbitrary single record for each name.
- Erik over 6 years
  
  Possible duplicate of How can I remove duplicate rows?
Brandon Horsley almost 14 years

Also, in Oracle, you could use "rowid" if there is no other unique id column.
Kyle B. almost 14 years

+1 Even if there were not an ID column, one could be added as an identity field.
usr021986 almost 14 years

I do not have a unique field (ID) in my table. How can I perform the operation then?
MacGyver over 10 years

SubPortal / a_horse_with_no_name - shouldn't this be selecting from an actual table? Also, ROW_NUMBER should be ROW_NUMBER() because it's a function, correct?
Arithmomaniac almost 8 years

If you don't have a primary key, you can use ORDER BY (SELECT NULL) stackoverflow.com/a/4812038
MiBol over 5 years

Excellent answer. Sharp and effective. Even if the table doesn't have an ID, it's better to add one in order to use this method.