SQL: How to find duplicates based on two fields?

Solution 1

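-- Number the rows within each (station_id, obs_year) group;
-- every row after the first in its group (rn > 1) is a duplicate.
SELECT  *
FROM    (
        SELECT  t.*, ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
        FROM    mytable t
        ) q  -- the alias is optional in Oracle but required by SQL Server
WHERE   rn > 1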

Solution 2

-- ROWID is Oracle's row address pseudocolumn; it tells a row apart from its twins.
SELECT entity_id, station_id, obs_year
FROM   mytable t1
WHERE  EXISTS (SELECT 1
               FROM   mytable t2
               WHERE  t1.station_id = t2.station_id
               AND    t1.obs_year = t2.obs_year
               AND    t1.ROWID <> t2.ROWID)

Solution 3

Qualify the three columns in the initial SELECT with the table alias:

SELECT
t1.entity_id, t1.station_id, t1.obs_year

Solution 4

A rewrite of your query:

SELECT
t1.entity_id, t1.station_id, t1.obs_year
FROM
mytable t1
INNER JOIN (
-- Group only by the two columns that should be unique together;
-- grouping by entity_id as well would hide duplicates whose entity_id values differ.
SELECT station_id, obs_year FROM mytable
GROUP BY station_id, obs_year HAVING COUNT(*) > 1) dupes
ON
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year

I think the ambiguous column error (ORA-00918) occurred because you were selecting columns whose names appear in both the table and the subquery, but you did not specify whether you wanted them from dupes or from mytable (aliased as t1).
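
To see the join-against-grouped-subquery idea in action without an Oracle instance, here is a minimal sketch in Python using the standard sqlite3 module. SQLite only stands in for Oracle here, and the helper name find_duplicate_rows and the sample rows are made up for illustration. The subquery groups by station_id and obs_year alone, and every selected column is qualified with t1.

```python
import sqlite3


def find_duplicate_rows(rows):
    """Return every row whose (station_id, obs_year) pair occurs more than once."""
    # Assumption: an in-memory SQLite table stands in for the Oracle table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (entity_id INTEGER, station_id INTEGER, obs_year INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    query = """
        SELECT t1.entity_id, t1.station_id, t1.obs_year
        FROM mytable t1
        INNER JOIN (
            -- pairs that appear on more than one row
            SELECT station_id, obs_year
            FROM mytable
            GROUP BY station_id, obs_year
            HAVING COUNT(*) > 1
        ) dupes
        ON t1.station_id = dupes.station_id
        AND t1.obs_year = dupes.obs_year
        ORDER BY t1.station_id, t1.obs_year, t1.entity_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical sample data: entities 1 and 2 share station 10 / year 2001.
sample_rows = [(1, 10, 2001), (2, 10, 2001), (3, 10, 2002), (4, 20, 2001)]
print(find_duplicate_rows(sample_rows))  # [(1, 10, 2001), (2, 10, 2001)]
```

Because the outer query joins back to mytable, it returns the actual duplicated rows (including entity_id), not just the offending pairs.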

Solution 5

Could you not create a new table that includes the unique constraint, and then copy across the data row by row, ignoring failures?
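
A rough sketch of that idea, again in Python with the standard sqlite3 module standing in for Oracle. The table name mytable_clean, the helper name copy_ignoring_duplicates, and the sample rows are assumptions made for illustration. Each row is inserted on its own, and rows rejected by the unique constraint are collected rather than silently dropped, so the duplicates can be inspected afterwards.

```python
import sqlite3


def copy_ignoring_duplicates(rows):
    """Copy rows into a table with a unique constraint, one row at a time.

    Returns (kept, rejected): the rows that were inserted and the rows
    that violated UNIQUE (station_id, obs_year).
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical target table carrying the constraint the original lacks.
    conn.execute(
        "CREATE TABLE mytable_clean ("
        "entity_id INTEGER, station_id INTEGER, obs_year INTEGER, "
        "UNIQUE (station_id, obs_year))"
    )
    rejected = []
    for row in rows:
        try:
            conn.execute("INSERT INTO mytable_clean VALUES (?, ?, ?)", row)
        except sqlite3.IntegrityError:
            # The constraint fired: this row duplicates one already copied.
            rejected.append(row)
    kept = conn.execute(
        "SELECT entity_id, station_id, obs_year FROM mytable_clean ORDER BY entity_id"
    ).fetchall()
    conn.close()
    return kept, rejected


# Hypothetical sample data: entity 2 repeats station 10 / year 2001.
sample_rows = [(1, 10, 2001), (2, 10, 2001), (3, 10, 2002), (4, 20, 2001)]
kept, rejected = copy_ignoring_duplicates(sample_rows)
print(kept)      # [(1, 10, 2001), (3, 10, 2002), (4, 20, 2001)]
print(rejected)  # [(2, 10, 2001)]
```

Note that which row of a duplicate pair survives depends purely on insertion order, so this approach suits cleanup more than diagnosis.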


Author: James Adams

Updated on July 09, 2022

Comments

  • James Adams almost 2 years

    I have rows in an Oracle database table which should be unique for a combination of two fields, but the unique constraint is not set up on the table, so I need to find all rows which violate the constraint myself using SQL. Unfortunately my meager SQL skills aren't up to the task.

    My table has three columns which are relevant: entity_id, station_id, and obs_year. For each row the combination of station_id and obs_year should be unique, and I want to find out if there are rows which violate this by flushing them out with an SQL query.

    I have tried the following SQL (suggested by this previous question) but it doesn't work for me (I get ORA-00918 column ambiguously defined):

    SELECT
    entity_id, station_id, obs_year
    FROM
    mytable t1
    INNER JOIN (
    SELECT entity_id, station_id, obs_year FROM mytable 
    GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes 
    ON 
    t1.station_id = dupes.station_id AND
    t1.obs_year = dupes.obs_year
    

    Can someone suggest what I'm doing wrong, and/or how to solve this?

  • James Adams over 13 years
    Yes, this is a good idea, thanks! BTW I'm trying to figure out how to create the constraint on my table using annotations in my entity class (I'm a Java developer using JPA/Hibernate), see stackoverflow.com/questions/3504477/…
  • James Adams over 13 years
    Thanks a lot for this response. Unfortunately when I run this I get an "ORA-00923: FROM keyword not found where expected" message.
  • James Adams over 13 years
    Thanks, Mark, for the tip about not using entity_id in the grouping subquery, and for the illustrative example.
  • Taryn East over 9 years
    Hiya, this may well solve the problem... but it'd be good if you could provide a little explanation about how and why it works :) Don't forget - there are heaps of newbies on Stack Overflow, and they could learn a thing or two from your expertise - what's obvious to you might not be so to them.
  • grokster over 9 years
    Thanks Taryn. It works by using GROUP BY to find any rows that match other rows on the specified columns. The HAVING COUNT(*) > 1 says that we are only interested in rows that occur more than once (and are therefore duplicates).
  • Taryn East over 9 years
    Hi, don't tell me (in the comments). I know SQL, I'm not asking for me... This sort of explanation is "part of your complete answer"... so please edit your answer and add it there. :)
  • Thyag over 7 years
    Looks like we cannot do this on a view: ORA-01445: cannot select ROWID from, or sample, a join view without a key-preserved table
  • Mafii about 7 years
    In MSSQL I had to put an alias such as AS x (the name doesn't really matter) after the FROM ( ) parentheses to make it work. Great answer!
  • Daniel F about 5 years
    Isn't this only part of the answer? I mean, you get to know which entity_id, station_id, obs_year tuples have duplicates, but you don't get the actual rows which are duplicated.