Append new row when using pandas iterrows()?
It is generally inefficient to append rows to a dataframe in a loop, because each append returns a new copy of the whole dataframe. You are better off storing the intermediate results in a list and then concatenating everything together once at the end.
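As a minimal sketch of that pattern (the two-row frame and its values are made up for illustration and are not from the question), the loop below only collects plain lists, and the dataframe is extended with a single pd.concat call afterwards:

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the pattern.
df = pd.DataFrame({'var1': [50.0, 10.0], 'var2': [1.0, 2.0]})

# Accumulate new rows as plain lists; the dataframe is not touched in the loop.
new_rows = []
for var1, var2 in zip(df['var1'], df['var2']):
    if var1 > 30:
        new_rows.append([var1 - 30, 30.0])

# Build the extra rows once and concatenate once at the end.
extra = pd.DataFrame(new_rows, columns=df.columns)
df_new = pd.concat([df, extra], ignore_index=True)
print(df_new)
```

Only one new dataframe is built, no matter how many rows the loop collects.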
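Using row.loc['var1'] = row['var1'] - 30
can change the original dataframe in place in this example, because both columns have the same dtype and iterrows() may then yield rows that share memory with the dataframe. pandas does not guarantee this: with mixed dtypes, or with copy-on-write enabled in newer versions, the assignment only changes a copy.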
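To see why writing to a row from iterrows() should not be relied on, here is a small sketch with a hypothetical mixed-dtype frame (the 'label' column is an assumption added for illustration). With mixed dtypes each row is a newly built Series, so the assignment does not reach the dataframe:

```python
import pandas as pd

# Hypothetical mixed-dtype frame: one numeric column, one string column.
df_mixed = pd.DataFrame({'var1': [100.0, 20.0], 'label': ['a', 'b']})

for i, row in df_mixed.iterrows():
    # row is a freshly built Series here, so this edits a copy.
    row['var1'] = row['var1'] - 30

# The original frame is unchanged.
print(df_mixed['var1'].tolist())  # [100.0, 20.0]
```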
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 2) * 100, columns=['var1', 'var2'])
>>> df
var1 var2
0 176.405235 40.015721
1 97.873798 224.089320
2 186.755799 -97.727788
3 95.008842 -15.135721
4 -10.321885 41.059850
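new_rows = []
for i, row in df.iterrows():
    while row['var1'] > 30:
        newrow = row  # another name for row, not a copy
        newrow['var2'] = 30
        row.loc['var1'] = row['var1'] - 30
        new_rows.append(newrow.values)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
df_new = pd.concat([df, pd.DataFrame(new_rows, columns=df.columns)]).reset_index(drop=True)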
>>> df
var1 var2
0 26.405235 30.00000
1 7.873798 30.00000
2 6.755799 30.00000
3 5.008842 30.00000
4 -10.321885 41.05985
>>> df_new
var1 var2
0 26.405235 30.00000
1 7.873798 30.00000
2 6.755799 30.00000
3 5.008842 30.00000
4 -10.321885 41.05985
5 26.405235 30.00000
6 26.405235 30.00000
7 26.405235 30.00000
8 26.405235 30.00000
9 26.405235 30.00000
10 7.873798 30.00000
11 7.873798 30.00000
12 7.873798 30.00000
13 6.755799 30.00000
14 6.755799 30.00000
15 6.755799 30.00000
16 6.755799 30.00000
17 6.755799 30.00000
18 6.755799 30.00000
19 5.008842 30.00000
20 5.008842 30.00000
21 5.008842 30.00000
EDIT (per request below):
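new_rows = []
for i, row in df.iterrows():
    while row['var1'] > 30:
        row.loc['var1'] = var1 = row['var1'] - 30
        # Append a new list each time so every intermediate var1 is kept.
        new_rows.append([var1, 30])
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
df_new = pd.concat([df, pd.DataFrame(new_rows, columns=df.columns)]).reset_index()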
>>> df_new
index var1 var2
0 0 26.405235 40.015721
1 1 7.873798 224.089320
2 2 6.755799 -97.727788
3 3 5.008842 -15.135721
4 4 -10.321885 41.059850
5 0 146.405235 30.000000
6 1 116.405235 30.000000
7 2 86.405235 30.000000
8 3 56.405235 30.000000
9 4 26.405235 30.000000
10 5 67.873798 30.000000
11 6 37.873798 30.000000
12 7 7.873798 30.000000
13 8 156.755799 30.000000
14 9 126.755799 30.000000
15 10 96.755799 30.000000
16 11 66.755799 30.000000
17 12 36.755799 30.000000
18 13 6.755799 30.000000
19 14 65.008842 30.000000
20 15 35.008842 30.000000
21 16 5.008842 30.000000
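If you would rather not depend on whether iterrows() hands back a view, one possible variant (a sketch under the same setup, not the only way to do it) keeps the running value in a local variable and writes it back explicitly with df.at:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 2) * 100, columns=['var1', 'var2'])

new_rows = []
for i in df.index:
    var1 = df.at[i, 'var1']
    while var1 > 30:
        var1 -= 30
        # Keep every intermediate value as its own new row.
        new_rows.append([var1, 30.0])
    # Write the reduced value back explicitly instead of mutating a row object.
    df.at[i, 'var1'] = var1

df_new = pd.concat([df, pd.DataFrame(new_rows, columns=df.columns)]).reset_index()
print(df_new)
```

The original rows keep their var2 values, and each appended row carries one intermediate var1 with var2 set to 30.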
jam
Updated on June 22, 2022
Comments
-
jam almost 2 years
I have the following code where I create df['var2'] and alter df['var1']. After performing these changes, I would like to append the newrow (with df['var2']) to the dataframe while keeping the original (though now altered) row (which has df['var1']).
for i, row in df.iterrows():
    while row['var1'] > 30:
        newrow = row
        newrow['var2'] = 30
        row['var1'] = row['var1']-30
        df.append(newrow)
I understand that when using iterrows(), row variables are copies instead of views, which is why the changes are not being updated in the original dataframe. So, how would I alter this code to actually append newrow to the dataframe? Thank you!
-
jam about 8 years
Hi, @Alexander. Is there a way for me to keep the intermediate values of var1? So, in your example, var1=176, and the df_new has var1=26 six times. How do I get row0: var1=146 var2=30, row2: var1=116 var2=30, etc. Do you know?
-
jam about 8 years
Moving new_rows.append to the line before row.loc['var1']... seemed to have no effect on the output.