A query to delete duplicates with GROUP BY

16,443

Solution 1

If you're looking for a solution for MySQL then you can use a proper multi table DELETE syntax along with a JOIN like this

DELETE p
  FROM ps_specific_price p JOIN
(
  SELECT id_product, MAX(id_specific_price) id_specific_price
    FROM ps_specific_price
   GROUP BY id_product
) d 
   ON p.id_product = d.id_product
  AND p.id_specific_price <> d.id_specific_price;

Outcome:

| ID_SPECIFIC_PRICE | ID_PRODUCT |
|-------------------|------------|
|                 3 |          2 |
|                 7 |          3 |

Here is SQLFiddle demo

Solution 2

Try this:

CREATE TABLE ps_specific_price (
  id_specific_price NUMBER,
  id_product NUMBER
);

INSERT INTO ps_specific_price (id_specific_price, id_product) VALUES (1, 2);
INSERT INTO ps_specific_price (id_specific_price, id_product) VALUES (2, 2);
INSERT INTO ps_specific_price (id_specific_price, id_product) VALUES (3, 2);
INSERT INTO ps_specific_price (id_specific_price, id_product) VALUES (4, 3);
INSERT INTO ps_specific_price (id_specific_price, id_product) VALUES (5, 3);
INSERT INTO ps_specific_price (id_specific_price, id_product) VALUES (6, 3);
INSERT INTO ps_specific_price (id_specific_price, id_product) VALUES (7, 3);

COMMIT;

DELETE FROM ps_specific_price ps
  WHERE ps.id_specific_price NOT IN (
    SELECT MAX(id_specific_price)
      FROM ps_specific_price ps_in
    WHERE ps_in.id_product = ps.id_product
    );

SELECT * FROM ps_specific_price;

ID_SPECIFIC_PRICE      ID_PRODUCT             
---------------------- ---------------------- 
3                      2                      
7                      3                      

You must connect the table from the inner query with the table from the outer one.

I'm using Oracle 11g R2. I checked this on SQLFiddle and my DELETE statement is invalid for MySQL - don't have that one installed and not much experience there, but you didn't say what database you are using.

Share:
16,443
popkutt
Author by

popkutt

Updated on June 15, 2022

Comments

  • popkutt
    popkutt almost 2 years
    id_specific_price    id_product  
    -------------------------------
                1                2  
                2                2  
                3                2  
                4                3  
                5                3  
                6                3  
                7                3
    

    Need to delete the duplicates, expected outcome:

    id_specific_price    id_product  
    -------------------------------
                3                2  
                7                3
    

    SELECT * 
      FROM ps_specific_price 
     WHERE id_specific_price NOT IN 
     (SELECT MAX(id_specific_price) 
        FROM ps_specific_price 
       GROUP BY id_product) 
    

    works but

    DELETE FROM ps_specific_price 
     WHERE id_specific_price NOT IN 
    (SELECT MAX(id_specific_price) 
       FROM ps_specific_price 
      GROUP BY id_product)
    

    does not. There are plenty of examples to get around this but for some reason I am not able to adapt it. I believe it is GROUP BY. For example:

    DELETE FROM ps_specific_price 
     WHERE id_specific_price NOT IN
     (SELECT MAX(p.id_specific_price) 
        FROM (SELECT * FROM ps_specific_price ) as p)
       GROUP BY id_product
    

    Where did I go wrong here?

  • popkutt
    popkutt over 10 years
    Thank you. I forgot to mention that there are thousands of rows, these are just first 7. So I need to find the duplicates first and then delete them
  • popkutt
    popkutt over 10 years
    Thank you. It doesn't work on my MySQL database. What is Teradata database?
  • Przemyslaw Kruglej
    Przemyslaw Kruglej over 10 years
    Yes, but you are "looking" for them using the subquery, no matter how many there are. Please tell us what database are you using. Also, you should check this link: stackoverflow.com/questions/18932/…
  • kishore krv
    kishore krv over 10 years
    its a databases. If its on My Sql you could join id_product in the sub query with id_product from outer delete table.
  • wildplasser
    wildplasser over 10 years
    @popkutt: you did not mention mysql (which is very restricted in references to the table being updated or deleted)
  • kishore krv
    kishore krv over 10 years
    DELETE FROM ps_specific_price outT WHERE id_specific_price NOT IN (SELECT MAX(Subq.id_specific_price) FROM ps_specific_price subq WHERE outT.id_product=subq.id_product);Hope this clarifies.