Python DataFrames For Loop with If Statement not working
I think it is better to use numpy.where:
mask = ES_15M_Summary['Rolling_OLS_Coefficient'] > .08
ES_15M_Summary['Long'] = np.where(mask, 'Y', 'N')
Sample:
ES_15M_Summary = pd.DataFrame({'Rolling_OLS_Coefficient':[0.07,0.01,0.09]})
print (ES_15M_Summary)
   Rolling_OLS_Coefficient
0                     0.07
1                     0.01
2                     0.09
mask = ES_15M_Summary['Rolling_OLS_Coefficient'] > .08
ES_15M_Summary['Long'] = np.where(mask, 'Y', 'N')
print (ES_15M_Summary)
   Rolling_OLS_Coefficient Long
0                     0.07    N
1                     0.01    N
2                     0.09    Y
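The question says a missing value would also be acceptable in place of 'N'. A minimal sketch of that variant follows, assuming the same sample data; the short name `df` is only a stand-in for ES_15M_Summary. It uses `Series.where` rather than `np.where`, because mixing the string 'Y' with `np.nan` in `np.where` should produce a string array holding the text 'nan' rather than a real missing value.

```python
import pandas as pd

# Same sample data as above, under a hypothetical shorter name
df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})
mask = df['Rolling_OLS_Coefficient'] > .08

# Start from a column of 'Y' and keep it only where the mask is True;
# Series.where fills the remaining rows with a missing value
df['Long'] = pd.Series('Y', index=df.index).where(mask)
print(df)
```

Rows that fail the condition can then be found with `df['Long'].isna()`.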
Looping, very slow solution:
for index, row in ES_15M_Summary.iterrows():
    if ES_15M_Summary.loc[index, 'Rolling_OLS_Coefficient'] > .08:
        ES_15M_Summary.loc[index,'Long'] = 'Y'
    else:
        ES_15M_Summary.loc[index,'Long'] = 'N'
print (ES_15M_Summary)
   Rolling_OLS_Coefficient Long
0                     0.07    N
1                     0.01    N
2                     0.09    Y
Timings:
#3000 rows
ES_15M_Summary = pd.DataFrame({'Rolling_OLS_Coefficient':[0.07,0.01,0.09] * 1000})
#print (ES_15M_Summary)
def loop(df):
    # work on the df argument instead of the global ES_15M_Summary
    for index, row in df.iterrows():
        if df.loc[index, 'Rolling_OLS_Coefficient'] > .08:
            df.loc[index,'Long'] = 'Y'
        else:
            df.loc[index,'Long'] = 'N'
    return df
print (loop(ES_15M_Summary))
In [51]: %timeit (loop(ES_15M_Summary))
1 loop, best of 3: 2.38 s per loop
In [52]: %timeit ES_15M_Summary['Long'] = np.where(ES_15M_Summary['Rolling_OLS_Coefficient'] > .08, 'Y', 'N')
1000 loops, best of 3: 555 µs per loop
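As for the ValueError in the question: inside the loop, ES_15M_Summary['Rolling_OLS_Coefficient'] > .08 compares the whole column on every iteration, so the `if` receives a boolean Series instead of a single True or False. The sketch below reproduces this on the sample data (the names `df`, `result`, `raised` and `flags` are illustrative only) and shows the per-row scalar comparison that avoids it.

```python
import pandas as pd

df = pd.DataFrame({'Rolling_OLS_Coefficient': [0.07, 0.01, 0.09]})

# Comparing the whole column returns a boolean Series, not a single bool
result = df['Rolling_OLS_Coefficient'] > .08
print(result.tolist())  # [False, False, True]

# `if` needs exactly one True/False, so pandas raises instead of guessing
try:
    if result:
        pass
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Inside iterrows(), compare the scalar from the current row instead
flags = ['Y' if row['Rolling_OLS_Coefficient'] > .08 else 'N'
         for index, row in df.iterrows()]
print(flags)  # ['N', 'N', 'Y']
```

This is why the looping version above reads one cell at a time with .loc[index, ...] rather than the whole column.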
Cole Starbuck
Updated on December 02, 2022
Comments
-
Cole Starbuck over 1 year
I have a DataFrame called ES_15M_Summary, with coefficients/betas in one column titled ES_15M_Summary['Rolling_OLS_Coefficient'] as follows:
If the value in the above pictured column ('Rolling_OLS_Coefficient') is greater than .08, I want a new column titled 'Long' to be a binary 'Y'. If the value is less than .08, I want 'Long' to be 'NaN' or just 'N' (either works).
So I'm writing a for loop to run down the columns. First, I created a new column titled 'Long' and set it to NaN:
ES_15M_Summary['Long'] = np.nan
Then I made the following For Loop:
for index, row in ES_15M_Summary.iterrows():
    if ES_15M_Summary['Rolling_OLS_Coefficient'] > .08:
        ES_15M_Summary['Long'] = 'Y'
    else:
        ES_15M_Summary['Long'] = 'NaN'
I get the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
...referring to the if statement line shown above (if...>.08:). I'm not sure why I'm getting this error or what's wrong with the for loop. Any help is appreciated.
-
Cole Starbuck about 7 years
Thank you, I'm using the for loop you provided. Much appreciated.