SQL: How to find duplicates based on two fields?
Solution 1
SELECT *
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY station_id, obs_year ORDER BY entity_id) AS rn
    FROM mytable t
)
WHERE rn > 1
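Note that rn > 1 returns only the second and later rows of each (station_id, obs_year) group, so the first row of every group is left out. If every row of a duplicated group is wanted, one possible variation (a sketch, not part of the original answer) swaps ROW_NUMBER() for COUNT(*) OVER. The column alias cnt and the inline-view alias d are names chosen here for illustration; mytable and its columns come from the question.

```sql
-- Sketch: list every row whose (station_id, obs_year) pair occurs more than once.
-- cnt and d are illustrative names, not taken from the original answer.
SELECT entity_id, station_id, obs_year
FROM (
    SELECT t.*,
           COUNT(*) OVER (PARTITION BY station_id, obs_year) AS cnt
    FROM mytable t
) d
WHERE cnt > 1
ORDER BY station_id, obs_year, entity_id;
```

The inline view is given an alias because some databases, SQL Server among them, require one; Oracle accepts the query with or without it.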
Solution 2
SELECT entity_id, station_id, obs_year
FROM mytable t1
WHERE EXISTS (
    SELECT 1
    FROM mytable t2
    WHERE t1.station_id = t2.station_id
      AND t1.obs_year = t2.obs_year
      AND t1.ROWID <> t2.ROWID
)
Solution 3
Qualify the three columns in the initial SELECT with the table alias:
SELECT
t1.entity_id, t1.station_id, t1.obs_year
Solution 4
A rewrite of your query:
SELECT t1.entity_id, t1.station_id, t1.obs_year
FROM mytable t1
INNER JOIN (
    SELECT entity_id, station_id, obs_year
    FROM mytable
    GROUP BY entity_id, station_id, obs_year
    HAVING COUNT(*) > 1
) dupes
    ON  t1.station_id = dupes.station_id
    AND t1.obs_year = dupes.obs_year
I think the ambiguous column error (ORA-00918) occurred because you were selecting columns whose names appear in both the table and the subquery, but you did not specify whether you wanted them from dupes or from mytable (aliased as t1).
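The rewrite above removes the ORA-00918 error, but its subquery still groups by entity_id as well, so a (station_id, obs_year) pair is reported only when an entity_id value repeats within it. Since the question asks for rows that share station_id and obs_year, a possible adjustment is to group on just those two columns and join back to get the full rows. This is a sketch under the assumption that entity_id may differ between the duplicate rows:

```sql
-- Sketch: group only on the two columns that should be unique,
-- then join back to mytable to list the offending rows in full.
SELECT t1.entity_id, t1.station_id, t1.obs_year
FROM mytable t1
INNER JOIN (
    SELECT station_id, obs_year
    FROM mytable
    GROUP BY station_id, obs_year
    HAVING COUNT(*) > 1
) dupes
    ON  t1.station_id = dupes.station_id
    AND t1.obs_year   = dupes.obs_year
ORDER BY t1.station_id, t1.obs_year, t1.entity_id;
```

Every select-list column is qualified with t1, so the query does not reintroduce the ambiguity described above.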
Solution 5
Could you not create a new table that includes the unique constraint, and then copy across the data row by row, ignoring failures?
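One way this idea might look in Oracle is sketched below. The names mytable_dedup and mytable_dedup_uk are hypothetical, and the sketch assumes that only the three columns named in the question need to be copied; a real table would likely have more.

```sql
-- Sketch only: mytable_dedup and mytable_dedup_uk are hypothetical names,
-- and only the three columns named in the question are copied.

-- 1. Empty copy of the relevant columns (WHERE 1 = 0 copies structure, no rows).
CREATE TABLE mytable_dedup AS
SELECT entity_id, station_id, obs_year
FROM mytable
WHERE 1 = 0;

-- 2. The constraint the original table is missing.
ALTER TABLE mytable_dedup
    ADD CONSTRAINT mytable_dedup_uk UNIQUE (station_id, obs_year);

-- 3. Copy row by row; a row that violates the constraint raises
--    DUP_VAL_ON_INDEX and is skipped.
BEGIN
    FOR r IN (SELECT entity_id, station_id, obs_year
              FROM mytable
              ORDER BY entity_id) LOOP
        BEGIN
            INSERT INTO mytable_dedup (entity_id, station_id, obs_year)
            VALUES (r.entity_id, r.station_id, r.obs_year);
        EXCEPTION
            WHEN DUP_VAL_ON_INDEX THEN
                NULL;  -- duplicate (station_id, obs_year): ignore this row
        END;
    END LOOP;
    COMMIT;
END;
/
```

Which row of each duplicate group survives depends on the insertion order, here entity_id. The skipped rows are not recorded anywhere, so this approach removes duplicates rather than listing them.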
James Adams
Updated on July 09, 2022
Comments
-
James Adams almost 2 years
I have rows in an Oracle database table which should be unique for a combination of two fields, but the unique constraint is not set up on the table, so I need to find all rows which violate the constraint myself using SQL. Unfortunately my meager SQL skills aren't up to the task.
My table has three columns which are relevant: entity_id, station_id, and obs_year. For each row the combination of station_id and obs_year should be unique, and I want to find out if there are rows which violate this by flushing them out with an SQL query.
I have tried the following SQL (suggested by this previous question) but it doesn't work for me (I get ORA-00918 column ambiguously defined):
SELECT entity_id, station_id, obs_year
FROM mytable t1
INNER JOIN (
    SELECT entity_id, station_id, obs_year
    FROM mytable
    GROUP BY entity_id, station_id, obs_year
    HAVING COUNT(*) > 1
) dupes
    ON  t1.station_id = dupes.station_id
    AND t1.obs_year = dupes.obs_year
Can someone suggest what I'm doing wrong, and/or how to solve this?
-
James Adams over 13 years
Yes, this is a good idea, thanks! BTW I'm trying to figure out how to create the constraint on my table using annotations in my entity class (I'm a Java developer using JPA/Hibernate), see stackoverflow.com/questions/3504477/…
-
James Adams over 13 years
Thanks a lot for this response. Unfortunately when I run this I get an "ORA-00923: FROM keyword not found where expected" message.
-
James Adams over 13 years
Thanks, Mark, for the tip about not using entity_id in the grouping subquery, and for the illustrative example.
-
Taryn East over 9 years
Hiya, this may well solve the problem... but it'd be good if you could provide a little explanation about how and why it works :) Don't forget - there are heaps of newbies on Stack Overflow, and they could learn a thing or two from your expertise - what's obvious to you might not be so to them.
-
grokster over 9 years
Thanks Taryn. It works by using GROUP BY to find any rows that match any other rows based on the specified columns. The HAVING COUNT(*) > 1 says that we are only interested in rows that occur more than once (and are therefore duplicates).
-
Taryn East over 9 years
Hi, don't tell me (in the comments). I know SQL, I'm not asking for me... This sort of explanation is "part of your complete answer"... so please edit your answer and add it there. :)
-
Thyag over 7 years
Looks like we cannot do this on a view: ORA-01445: cannot select ROWID from, or sample, a join view without a key-preserved table
-
Mafii about 7 years
In mssql I had to put "as x" (the name doesn't really matter) after the FROM ( ) parentheses to make it work. Great answer!
-
Daniel F about 5 years
Isn't this only part of the answer? I mean, you get to know which entity_id, station_id, obs_year tuples have duplicates, but you don't get the actual rows which are duplicated.