Merge duplicate records into 1 records with the same table and table fields
Solution 1
You can do something like this in Postgres 9.1+:
WITH duplicates AS (
SELECT desired_unique_key, count(*) AS count_of_same_key, min(st.id) AS keep_id, max(st.id) as delete_id
FROM source_table st
GROUP BY desired_unique_key
HAVING count(*) > 1
),
deleted_dupes AS (
DELETE FROM source_table st
WHERE st.id IN (SELECT(delete_id) FROM duplicates)
)
UPDATE source_table st
SET field = WHATEVER
FROM duplicates d
WHERE st.id = d.keep_id
Solution 2
I'm doing something like this (postgres), I haven't tested it yet though.
SELECT dup.id AS dup_id, orig.id AS orig_id
INTO TEMP specialty_duplicates
FROM medical_specialty dup,
(SELECT DISTINCT ON (name) * FROM medical_specialty ORDER BY name, id) orig
WHERE orig.name = dup.name AND dup.id <> orig.id;
UPDATE doctor_medical_specialty
SET medical_specialty=orig_id
FROM specialty_duplicates
WHERE medical_specialty = dup_id;
DELETE
FROM medical_specialty
WHERE id IN (SELECT dup_id FROM specialty_duplicates);
ALTER TABLE medical_specialty
ADD UNIQUE (name);
The schema is that medical_specialty has id and name, and doctor_medical_specialty references it by id.
The benefit over a CTE (IIUC) is you can merge references in multiple referring tables.
I'm using a temporary table rather than a view so that both deleting and updating are consistent with the same snapshot in time
norlan V
Updated on June 07, 2022Comments
-
norlan V almost 2 years
I have a database table that contains a list of demographic records, some of those participant might have multiple/duplicate records, e.g.
NOTE:
Gender:
119 = Male
118 = FemaleRace:
255 = white
253 = AsianUrbanRural:
331 = Urban
332 = Ruralparticipantid, gender, race, urbanrural, moduletypeid, hibernateid, and more fields 1, 119, 0, 331, 1, 1, ..... 1, 119, 255, 0, 2, 2, ..... 1, 0, 255, 331, 3, 3, ..... 1, 119, 253, 331, 0, 4, .....
The output should be keep the first hibernateid and the duplicate records will be merge to the first hibernatedid record. If you can do this using function that will check the records if duplicate that would be great, after merged the records it delete the unused duplicate records. Your answer gives me a great idea to resolved this problem. Thanks
Output should be:
participantid, gender, race, urbanrural, moduletypeid, hibernateid, and more fields 1, 119, 255, 331, 1, 1, .....
Help me guys, Thanks