Delete duplicate records in SQL Server?
Solution 1
You can do this with a window function. The query below numbers the rows within each group of duplicate EmployeeName values, ordered by empId, and deletes all but the first row in each group.
delete x from (
select *, rn=row_number() over (partition by EmployeeName order by empId)
from Employee
) x
where rn > 1;
Run it as a select to see what would be deleted:
select *
from (
select *, rn=row_number() over (partition by EmployeeName order by empId)
from Employee
) x
where rn > 1;
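To try the statement safely, a minimal sketch along these lines may help. It builds a scratch temp table from the sample names in the question and runs the delete inside a transaction that is rolled back. The table #Employee and its identity column empId are assumptions made for illustration; the question's real table has no such column.

```sql
-- Hypothetical scratch table mirroring the question's data.
-- empId is an assumed identity column, not part of the original table.
create table #Employee (
    empId int identity(1,1) primary key,
    EmployeeName varchar(50) not null
);

insert into #Employee (EmployeeName)
values ('Anand'), ('Anand'), ('Anil'), ('Dipak'),
       ('Anil'), ('Dipak'), ('Dipak'), ('Anil');

begin transaction;

-- Same delete as above, pointed at the scratch table.
delete x from (
    select *, rn = row_number() over (partition by EmployeeName order by empId)
    from #Employee
) x
where rn > 1;

-- Inspect the survivors: one row per EmployeeName should remain.
select empId, EmployeeName
from #Employee
order by empId;

-- Roll back while experimenting; use commit once the result looks right.
rollback transaction;

drop table #Employee;
```

Running the delete inside a transaction is only one way to preview it; the select-only form shown above does the same job without touching any rows.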
Solution 2
Assuming that your Employee table also has a unique column (ID in the example below), the following will work:
delete from Employee
where ID not in
(
select min(ID)
from Employee
group by EmployeeName
);
This keeps the row with the lowest ID for each EmployeeName and deletes the rest.
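If the table has no unique column, as the asker notes in the comments, one option is to add an identity column first and then run the same delete. The following is a sketch under that assumption; the column name ID is hypothetical, and GO is the batch separator used by client tools such as SSMS and sqlcmd rather than a T-SQL statement.

```sql
-- Assumption: Employee has no unique column yet. ID is a hypothetical name.
-- Adding an identity column assigns a distinct value to every existing row.
alter table Employee add ID int identity(1,1) not null;
go

-- A new batch is needed so the delete is compiled after the column exists.
delete from Employee
where ID not in
(
    select min(ID)
    from Employee
    group by EmployeeName
);
go

-- Optional: drop the helper column if it was only needed for the cleanup.
alter table Employee drop column ID;
```

Keeping the identity column afterwards is often the simpler choice, since it gives later maintenance queries something to order and join on.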
Edit
Re MacGyver's comment: as of SQL 2012, MIN can be used with numeric, char, varchar, uniqueidentifier, or datetime columns, but not with bit columns.
For 2008 R2 and earlier, MIN can be used with numeric, char, varchar, or datetime columns, but not with bit columns (and it also doesn't work with GUIDs).
For 2008 R2 you'll need to cast the GUID to a type supported by MIN, e.g.
delete from GuidEmployees
where CAST(ID AS binary(16)) not in
(
select min(CAST(ID AS binary(16)))
from GuidEmployees
group by EmployeeName
);
Solution 3
You could try something like the following:
delete T1
from MyTable T1, MyTable T2
where T1.dupField = T2.dupField
and T1.uniqueField > T2.uniqueField
(this assumes that you have an integer based unique field)
Personally, though, I'd say you are better off preventing duplicate entries from being added to the database in the first place, rather than cleaning them up afterwards.
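One common way to stop duplicates at the source is a unique index on the column that must not repeat. The sketch below assumes the Employee table from the question, that existing duplicates have already been removed, and that EmployeeName really should be unique in your data model; the index name is hypothetical.

```sql
-- Assumption: duplicates are already cleaned up. While duplicate values
-- still exist, this statement fails instead of creating the index.
-- UX_Employee_EmployeeName is a hypothetical name.
create unique index UX_Employee_EmployeeName
    on Employee (EmployeeName);
```

Once the index exists, an insert or update that would create a second row with the same EmployeeName is rejected with an error, so the application has to handle that failure.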
Solution 4
WITH CTE AS
(
SELECT EmployeeName,
ROW_NUMBER() OVER(PARTITION BY EmployeeName ORDER BY EmployeeName) AS R
FROM employee_table
)
DELETE CTE WHERE R > 1;
The magic of common table expressions.
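Ordering by the partition column, as above, leaves the choice of surviving row arbitrary. A variant mentioned in the comments makes that explicit with ORDER BY (SELECT NULL), which fits a table that has no unique or otherwise meaningful ordering column. This is a sketch that reuses the employee_table name from the query above.

```sql
-- Assumption: employee_table has no column that gives a meaningful order,
-- so any one row per EmployeeName may be kept.
-- The statement preceding WITH must be terminated with a semicolon.
WITH CTE AS
(
    SELECT EmployeeName,
           ROW_NUMBER() OVER (PARTITION BY EmployeeName ORDER BY (SELECT NULL)) AS R
    FROM employee_table
)
DELETE FROM CTE
WHERE R > 1;
```

Which duplicate survives is not defined here, so this suits only rows that are identical in every column you care about.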
Solution 5
DELETE
FROM MyTable
WHERE ID NOT IN (
SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3);
Alternatively, using a common table expression:
WITH TempUsers (FirstName, LastName, duplicateRecordCount)
AS
(
SELECT FirstName, LastName,
ROW_NUMBER() OVER (PARTITION BY FirstName, LastName ORDER BY FirstName) AS duplicateRecordCount
FROM dbo.Users
)
DELETE
FROM TempUsers
WHERE duplicateRecordCount > 1;
usr021986
Updated on October 22, 2020
Comments
-
usr021986 over 3 years
Consider a column named EmployeeName in table Employee. The goal is to delete repeated records, based on the EmployeeName field.
EmployeeName
------------
Anand
Anand
Anil
Dipak
Anil
Dipak
Dipak
Anil
Using one query, I want to delete the records which are repeated.
How can this be done with TSQL in SQL Server?
-
Sarfraz almost 14 years
You mean delete duplicate records, right?
-
DaeMoohn almost 14 years
You could select the distinct values and their related IDs and delete those records whose IDs aren't in the already selected list?
-
Andrew Bullock almost 14 years
Do you have a unique ID column?
-
usr021986 almost 14 years
No, I don't have a unique ID column.
-
armen over 10 years
How did you accept the answer given by John Gibb if the table lacks a unique id? Where is the empId column in your example used by John?
-
John Gibb over 10 years
If you don't have a unique ID column, or anything else meaningful to do an order by, you COULD also order by the employeename column... so your rn would be row_number() over (partition by EmployeeName order by EmployeeName)... this would pick an arbitrary single record for each name.
-
Erik over 6 years
Possible duplicate of How can I remove duplicate rows?
-
-
Brandon Horsley almost 14 years
Also, in Oracle, you could use "rowid" if there is no other unique id column.
-
Kyle B. almost 14 years
+1 Even if there were not an ID column, one could be added as an identity field.
-
usr021986 almost 14 years
I do not have a unique field (ID) in my table. How can I perform the operation then?
-
MacGyver over 10 years
SubPortal / a_horse_with_no_name - shouldn't this be selecting from an actual table? Also, ROW_NUMBER should be ROW_NUMBER() because it's a function, correct?
-
Arithmomaniac almost 8 years
If you don't have a primary key, you can use ORDER BY (SELECT NULL): stackoverflow.com/a/4812038
-
MiBol over 5 years
Excellent answer. Sharp and effective. Even if the table doesn't have an ID, it's better to include one to execute this method.