Replicating rows in a pandas data frame by a column value

Solution 1

You can use Index.repeat to repeat each index label according to the n column, then select those rows from the DataFrame:

df2 = df.loc[df.index.repeat(df.n)]

  id  n   v
0  A  1  10
1  B  2  13
1  B  2  13
2  C  3   8
2  C  3   8
2  C  3   8

Or you could use np.repeat to get the repeated indices and then use that to index into the frame:

df2 = df.loc[np.repeat(df.index.values, df.n)]

  id  n   v
0  A  1  10
1  B  2  13
1  B  2  13
2  C  3   8
2  C  3   8
2  C  3   8

After that, only a little cleanup remains. Drop the n column and reset the index:

df2 = df2.drop("n", axis=1).reset_index(drop=True)

  id   v
0  A  10
1  B  13
2  B  13
3  C   8
4  C   8
5  C   8
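
Putting the steps above together, here is a minimal end-to-end sketch. It assumes the input frame from the question and uses only the calls already shown (Index.repeat, .loc, drop, reset_index); the variable names are just for illustration.

```python
import pandas as pd

# Input frame as given in the question.
df = pd.DataFrame({"id": ["A", "B", "C"], "n": [1, 2, 3], "v": [10, 13, 8]})

# Repeat each index label n times and select those rows by label.
df2 = df.loc[df.index.repeat(df["n"])]

# Clean up: drop the helper column and renumber the rows.
df2 = df2.drop(columns="n").reset_index(drop=True)

print(df2)
```

Here drop(columns="n") is equivalent to drop("n", axis=1) used above.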

Note that if the index might contain duplicate labels, .loc would match every row sharing a repeated label, so you could use .iloc instead:

df.iloc[np.repeat(np.arange(len(df)), df["n"])].drop("n", axis=1).reset_index(drop=True)

  id   v
0  A  10
1  B  13
2  B  13
3  C   8
4  C   8
5  C   8

which uses the positions, and not the index labels.
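
To see why this matters, here is a small sketch with a hypothetical frame (dup, not part of the question) whose index label 0 appears twice. Label-based selection returns more rows than intended, while position-based selection repeats each row exactly n times.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a duplicated index label (0 appears twice).
dup = pd.DataFrame({"id": ["A", "B"], "n": [1, 2], "v": [10, 13]}, index=[0, 0])

# Label-based: every repeated label 0 matches both rows,
# so more rows come back than the n column asks for.
by_label = dup.loc[dup.index.repeat(dup["n"])]

# Position-based: row i is repeated exactly n[i] times.
by_pos = dup.iloc[np.repeat(np.arange(len(dup)), dup["n"])]

print(len(by_label), len(by_pos))
print(by_pos.drop(columns="n").reset_index(drop=True))
```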

Solution 2

You could use set_index and Series.repeat:

In [1057]: df.set_index(['id'])['v'].repeat(df['n']).reset_index()
Out[1057]:
  id   v
0  A  10
1  B  13
2  B  13
3  C   8
4  C   8
5  C   8

Details

In [1058]: df
Out[1058]:
  id  n   v
0  A  1  10
1  B  2  13
2  C  3   8
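
For completeness, a self-contained sketch of this approach, assuming the same frame as in the question. Note that it carries only the v column along with id, which is all the question asks for; a frame with more value columns would need one of the approaches from Solution 1.

```python
import pandas as pd

# Input frame as given in the question.
df = pd.DataFrame({"id": ["A", "B", "C"], "n": [1, 2, 3], "v": [10, 13, 8]})

# Move id into the index, take v as a Series, repeat each entry n times,
# then turn the id index back into a regular column.
out = df.set_index("id")["v"].repeat(df["n"]).reset_index()

print(out)
```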
Author by

Admin

Updated on June 15, 2022

Comments

  • Admin
    Admin almost 2 years

    I want to replicate rows in a Pandas Dataframe. Each row should be repeated n times, where n is a field of each row.

    import pandas as pd
    
    what_i_have = pd.DataFrame(data={
      'id': ['A', 'B', 'C'],
      'n' : [  1,   2,   3],
      'v' : [ 10,  13,   8]
    })
    
    what_i_want = pd.DataFrame(data={
      'id': ['A', 'B', 'B', 'C', 'C', 'C'],
      'v' : [ 10,  13,  13,   8,   8,   8]
    })
    

    Is this possible?

  • Zero
    Zero over 6 years
    With newer pandas versions, this can be df.loc[df.index.repeat(df.n)]
  • 576i
    576i over 2 years
    @zero: this should be the new accepted answer