How do I perform a math operation on a Python Pandas dataframe column, but only if a certain condition is met?
Solution 1
You can use mask
:
df.credit_score = df.credit_score.mask( df.credit_score > 800, df.credit_score/ 100)
Or numpy.where
:
df.credit_score = np.where( df.credit_score > 800, df.credit_score/ 100, df.credit_score)
print (df)
id credit_score col col1
0 0 750 750.0 750.0
1 1 653 653.0 653.0
2 2 741 741.0 741.0
3 3 65100 651.0 651.0
4 4 73500 735.0 735.0
5 5 565 565.0 565.0
6 6 480 480.0 480.0
7 7 78900 789.0 789.0
8 8 699 699.0 699.0
9 9 71500 715.0 715.0
Solution 2
You can use Series.apply
. It accepts a function and applies it on every element in the series. Note that it is not inplace and you will in need to reassign the series that it returns, either to a new column or to the same column.
def fix_scores(score):
return score / 100 if score > 800 else score
# same as
# if score > 800:
# return score / 100
# return score
df['credit_score_fixed'] = df['credit_score'].apply(fix_scores)
Solution 3
I'd use Pandas boolean indexing:
In [193]: df.loc[df.credit_score > 800, 'credit_score'] /= 100
In [194]: df
Out[194]:
credit_score
id
0 750.0
1 653.0
2 741.0
3 651.0
4 735.0
5 565.0
6 480.0
7 789.0
8 699.0
9 715.0
Related videos on Youtube
ScottP
Updated on January 09, 2020Comments
-
ScottP over 4 years
I have a Pandas dataframe that I'm working with and I simply need to divide all values in a certain column that are greater than 800 by 100. In other words, if the value in the 'credit_score' column is greater than 800, it can be assumed that the data were entered with two extra places to the left of the decimal place. For example...
id credit_score column_b column_c 0 750 ... ... 1 653 ... ... 2 741 ... ... 3 65100 ... ... 4 73500 ... ... 5 565 ... ... 6 480 ... ... 7 78900 ... ... 8 699 ... ... 9 71500 ... ...
So I basically want to divide the credit scores for row indexes 3, 4, 7, and 9 by 100, but not the others. I want the new, valid values to replace the old, invalid ones. Alternatively, a new column such as 'credit_score_fixed' would work too. I'm fairly new to Python and Pandas, so any help is much appreciated.
-
ScottP over 7 yearsThis worked perfectly, thank you! Could you clarify for me what the "/=" does though (as opposed to using just "/")?
-
ScottP over 7 yearsFor me, "mask" is the most intuitive solution and worked perfectly. Thank you!
-
MaxU - stop genocide of UA over 7 years@ScottP,
a /= 2
is equal toa = a / 2
-
ScottP over 7 yearsWorked like a charm. Thank you!