Delete duplicate records in SQL Server?

Solution 1

You can do this with window functions. The query below numbers the rows within each EmployeeName group in empId order, then deletes every row except the first one in each group.

delete x from (
  select *, rn=row_number() over (partition by EmployeeName order by empId)
  from Employee 
) x
where rn > 1;

Run it as a select to see what would be deleted:

select *
from (
  select *, rn=row_number() over (partition by EmployeeName order by empId)
  from Employee 
) x
where rn > 1;
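
To try this end to end, here is a minimal sketch using the names from the question. The temp table #Employee and the explicit empId values are assumptions made up for the demo; the original table apparently has no such column.

```sql
-- Hypothetical scratch table holding the sample names from the question.
CREATE TABLE #Employee (
    empId        int         NOT NULL PRIMARY KEY,
    EmployeeName varchar(50) NOT NULL
);

INSERT INTO #Employee (empId, EmployeeName)
VALUES (1, 'Anand'), (2, 'Anand'), (3, 'Anil'), (4, 'Dipak'),
       (5, 'Anil'),  (6, 'Dipak'), (7, 'Dipak'), (8, 'Anil');

-- Keep the lowest empId per name and delete the rest.
DELETE x
FROM (
    SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY EmployeeName ORDER BY empId)
    FROM #Employee
) x
WHERE rn > 1;

-- Remaining rows: (1, Anand), (3, Anil), (4, Dipak)
SELECT empId, EmployeeName
FROM #Employee
ORDER BY empId;

DROP TABLE #Employee;
```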

Solution 2

Assuming that your Employee table also has a unique column (ID in the example below), the following will work:

delete from Employee 
where ID not in
(
    select min(ID)
    from Employee 
    group by EmployeeName 
);

This keeps the row with the lowest ID for each EmployeeName and deletes the others.
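
If you'd like to check the result before making it permanent, one option is to run the delete inside a transaction and roll it back. This is only a sketch and assumes the same Employee table and ID column as above:

```sql
BEGIN TRANSACTION;

DELETE FROM Employee
WHERE ID NOT IN
(
    SELECT MIN(ID)
    FROM Employee
    GROUP BY EmployeeName
);

-- @@ROWCOUNT reports the rows affected by the immediately preceding statement.
SELECT @@ROWCOUNT AS RowsDeleted;

-- Inspect what is left while the transaction is still open.
SELECT *
FROM Employee
ORDER BY EmployeeName;

-- Undo the delete; swap for COMMIT TRANSACTION once the result looks right.
ROLLBACK TRANSACTION;
```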

Edit
Re MacGyver's comment: as of SQL Server 2012,

MIN can be used with numeric, char, varchar, uniqueidentifier, or datetime columns, but not with bit columns

For 2008 R2 and earlier,

MIN can be used with numeric, char, varchar, or datetime columns, but not with bit columns (and it also doesn't work with GUIDs)

On 2008 R2 and earlier, you'll need to cast the GUID to a type supported by MIN, e.g.

delete from GuidEmployees
where CAST(ID AS binary(16)) not in
(
    select min(CAST(ID AS binary(16)))
    from GuidEmployees
    group by EmployeeName 
);

SqlFiddle for various types in Sql 2008

SqlFiddle for various types in Sql 2012

Solution 3

You could try something like the following:

delete T1
from MyTable T1, MyTable T2
where T1.dupField = T2.dupField
and T1.uniqueField > T2.uniqueField  

(This assumes that you have an integer-based unique field.)
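
For what it's worth, the same delete can also be written with an explicit JOIN instead of the comma-separated table list. This is just a restyling of the query above, using the same placeholder table and column names:

```sql
-- Delete every row that has a matching row with a smaller uniqueField,
-- which leaves only the lowest uniqueField per dupField value.
DELETE T1
FROM MyTable AS T1
INNER JOIN MyTable AS T2
    ON  T1.dupField = T2.dupField
    AND T1.uniqueField > T2.uniqueField;
```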

Personally, though, I'd say you are better off stopping duplicate entries from being added to the database in the first place, rather than cleaning them up after the fact.
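
As a rough sketch of that kind of prevention: once the existing duplicates are gone, a unique constraint will reject new ones. The constraint name UQ_Employee_EmployeeName is made up for illustration, and this assumes EmployeeName really should be unique in your data.

```sql
-- Fails if duplicate EmployeeName values are still present,
-- so run it only after the cleanup.
ALTER TABLE Employee
ADD CONSTRAINT UQ_Employee_EmployeeName UNIQUE (EmployeeName);
```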

Solution 4

WITH CTE AS
(
   SELECT EmployeeName, 
          ROW_NUMBER() OVER(PARTITION BY EmployeeName ORDER BY EmployeeName) AS R
   FROM employee_table
)
DELETE CTE WHERE R > 1;

The magic of common table expressions.
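
Since the table in the question has no unique column at all, here is a small sketch of the same CTE approach on such a table. The temp table #EmployeeNoId is a made-up name for the demo, and ORDER BY (SELECT NULL) is the trick mentioned in the comments for when there is nothing meaningful to order by; which copy of each name survives is arbitrary.

```sql
-- Hypothetical scratch table with no ID column, as described in the question.
CREATE TABLE #EmployeeNoId (EmployeeName varchar(50) NOT NULL);

INSERT INTO #EmployeeNoId (EmployeeName)
VALUES ('Anand'), ('Anand'), ('Anil'), ('Dipak'),
       ('Anil'),  ('Dipak'), ('Dipak'), ('Anil');

-- The statement before WITH must end with a semicolon.
WITH CTE AS
(
    SELECT EmployeeName,
           ROW_NUMBER() OVER (PARTITION BY EmployeeName ORDER BY (SELECT NULL)) AS R
    FROM #EmployeeNoId
)
DELETE FROM CTE
WHERE R > 1;

-- One row per name remains: Anand, Anil, Dipak.
SELECT EmployeeName
FROM #EmployeeNoId
ORDER BY EmployeeName;

DROP TABLE #EmployeeNoId;
```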

Solution 5

DELETE
FROM MyTable
WHERE ID NOT IN (
     SELECT MAX(ID)
     FROM MyTable
     GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3);

-- Or, using a CTE (the statement before WITH must end with a semicolon):
WITH TempUsers (FirstName, LastName, duplicateRecordCount)
AS
(
    SELECT FirstName, LastName,
    ROW_NUMBER() OVER (PARTITION BY FirstName, LastName ORDER BY FirstName) AS duplicateRecordCount
    FROM dbo.Users
)
DELETE
FROM TempUsers
WHERE duplicateRecordCount > 1;

Author by

usr021986

Updated on October 22, 2020

Comments

  • usr021986 over 3 years

    Consider a column named EmployeeName in a table named Employee. The goal is to delete repeated records, based on the EmployeeName field.

    EmployeeName
    ------------
    Anand
    Anand
    Anil
    Dipak
    Anil
    Dipak
    Dipak
    Anil
    

    Using one query, I want to delete the records which are repeated.

    How can this be done with TSQL in SQL Server?

    • Sarfraz almost 14 years
      You mean delete duplicate records, right?
    • DaeMoohn almost 14 years
      You could select the distinct values and their related IDs, then delete the records whose IDs aren't in that list.
    • Andrew Bullock almost 14 years
      Do you have a unique ID column?
    • usr021986 almost 14 years
      No, I don't have a unique ID column.
    • armen over 10 years
      How did you accept the answer given by John Gibb if the table lacks a unique ID? Where is the empId column used in John's example?
    • John Gibb over 10 years
      If you don't have a unique ID column, or anything else meaningful to order by, you COULD also order by the EmployeeName column, so your rn would be row_number() over (partition by EmployeeName order by EmployeeName). This would pick an arbitrary single record for each name.
    • Erik over 6 years
      Possible duplicate of How can I remove duplicate rows?
  • Brandon Horsley almost 14 years
    Also, in Oracle, you could use "rowid" if there is no other unique id column.
  • Kyle B. almost 14 years
    +1 Even if there were not an ID column, one could be added as an identity field.
  • usr021986 almost 14 years
    I do not have a unique field (ID) in my table. How can I perform the operation then?
  • MacGyver over 10 years
    SubPortal / a_horse_with_no_name - shouldn't this be selecting from an actual table? Also, ROW_NUMBER should be ROW_NUMBER() because it's a function, correct?
  • Arithmomaniac almost 8 years
    If you don't have a primary key, you can use ORDER BY (SELECT NULL) stackoverflow.com/a/4812038
  • MiBol over 5 years
    Excellent answer. Sharp and effective. Even if the table doesn't have an ID, it's better to add one in order to use this method.