Find all duplicate rows in a pandas dataframe

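The snippets below assume a small frame like the one in the question (a single column col with values 1, 2, 1, 1, 2), which for reference can be rebuilt like this:

import pandas as pd

df = pd.DataFrame({'col': [1, 2, 1, 1, 2]}, index=[1, 2, 3, 4, 5])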

First filter all duplicated rows, then either group by the column and collect the index with apply, or convert the index to_series and group that:

df = df[df.col.duplicated(keep=False)]

a = df.groupby('col').apply(lambda x: list(x.index))
print (a)
col
1    [1, 3, 4]
2       [2, 5]
dtype: object

a = df.index.to_series().groupby(df.col).apply(list)
print (a)
col
1    [1, 3, 4]
2       [2, 5]
dtype: object
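
If a plain dict mapping each value to the indices of its duplicates is more convenient, either Series converts directly with to_dict():

d = a.to_dict()
print (d)
{1: [1, 3, 4], 2: [2, 5]}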

And if you need nested lists:

L = df.groupby('col').apply(lambda x: list(x.index)).tolist()
print (L)
[[1, 3, 4], [2, 5]]
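
An alternative sketch for the same nested lists goes through the groups attribute of the groupby object, which maps each group key to an Index of row labels:

L = [list(v) for v in df.groupby('col').groups.values()]
print (L)
[[1, 3, 4], [2, 5]]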

If you need to use only the first column, it can be selected by position with iloc:

a = (df[df.iloc[:, 0].duplicated(keep=False)]
       .groupby(df.iloc[:, 0])
       .apply(lambda x: list(x.index)))
print (a)
col
1    [1, 3, 4]
2       [2, 5]
dtype: object
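
If the column names are not known at all (as in the question), one possible sketch is to group by every column at once, so that whole rows are compared; with several columns the group key becomes a tuple of row values. Output shown for the single-column example:

dupes = df[df.duplicated(keep=False)]
a = dupes.groupby(list(dupes.columns)).apply(lambda x: list(x.index))
print (a)
col
1    [1, 3, 4]
2       [2, 5]
dtype: object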

Comments

  • Nico almost 2 years

    I would like to be able to get the indices of all the instances of a duplicated row in a dataset without knowing the name and number of columns beforehand. So assume I have this:

         col
    1  |  1
    2  |  2
    3  |  1
    4  |  1
    5  |  2
    

    I'd like to be able to get [1, 3, 4] and [2, 5]. Is there any way to achieve this? It sounds really simple, but since I don't know the columns beforehand I can't do something like df[col == x...].

  • Nico about 7 years
    Okay that's good, except that since I don't know the columns I need to groupby df.columns, but that's fine. I don't know how I didn't think of groupby by myself.
  • jezrael about 7 years
    I added a solution for selecting by position.
  • Nabin about 6 years
    Can this find duplicate rows with multiple columns too? I mean I see only col in the example not col1, col2, col3, and so on.
  • jezrael about 6 years
    @Nabin To check dupes in a subset of columns use df = df[df.duplicated(subset=['col','col1','col2'], keep=False)]; to check dupes across all columns use df = df[df.duplicated(keep=False)] (see the sketch after these comments).
  • Sarah Lissachell over 3 years
    This is just what I needed
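
A small sketch of the subset check from the comment above; the frame and the column names col, col1, col2 are made up for illustration, and duplicates are judged on col and col1 only:

import pandas as pd

df = pd.DataFrame({'col':  [1, 2, 1, 1, 2],
                   'col1': ['a', 'b', 'a', 'a', 'b'],
                   'col2': [10, 20, 30, 40, 50]},
                  index=[1, 2, 3, 4, 5])

# keep rows duplicated on the chosen subset of columns
dupes = df[df.duplicated(subset=['col', 'col1'], keep=False)]
a = dupes.groupby(['col', 'col1']).apply(lambda x: list(x.index))
print (a)
col  col1
1    a       [1, 3, 4]
2    b          [2, 5]
dtype: object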