How do I perform a math operation on a Python Pandas dataframe column, but only if a certain condition is met?

28,051

Solution 1

You can use mask:

df.credit_score = df.credit_score.mask( df.credit_score > 800, df.credit_score/ 100)

Or numpy.where:

df.credit_score = np.where( df.credit_score > 800, df.credit_score/ 100, df.credit_score)

print (df)
   id  credit_score    col   col1
0   0           750  750.0  750.0
1   1           653  653.0  653.0
2   2           741  741.0  741.0
3   3         65100  651.0  651.0
4   4         73500  735.0  735.0
5   5           565  565.0  565.0
6   6           480  480.0  480.0
7   7         78900  789.0  789.0
8   8           699  699.0  699.0
9   9         71500  715.0  715.0

Solution 2

You can use Series.apply. It accepts a function and applies it on every element in the series. Note that it is not inplace and you will in need to reassign the series that it returns, either to a new column or to the same column.

def fix_scores(score):
    return score / 100 if score > 800 else score
    # same as
    # if score > 800:
    #      return score / 100
    # return score

df['credit_score_fixed'] = df['credit_score'].apply(fix_scores)

Solution 3

I'd use Pandas boolean indexing:

In [193]: df.loc[df.credit_score > 800, 'credit_score'] /= 100

In [194]: df
Out[194]:
    credit_score
id
0          750.0
1          653.0
2          741.0
3          651.0
4          735.0
5          565.0
6          480.0
7          789.0
8          699.0
9          715.0
Share:
28,051

Related videos on Youtube

ScottP
Author by

ScottP

Updated on January 09, 2020

Comments

  • ScottP
    ScottP over 4 years

    I have a Pandas dataframe that I'm working with and I simply need to divide all values in a certain column that are greater than 800 by 100. In other words, if the value in the 'credit_score' column is greater than 800, it can be assumed that the data were entered with two extra places to the left of the decimal place. For example...

    id    credit_score    column_b    column_c
    0     750             ...         ...
    1     653             ...         ...
    2     741             ...         ...
    3     65100           ...         ...
    4     73500           ...         ...
    5     565             ...         ...
    6     480             ...         ...
    7     78900           ...         ...
    8     699             ...         ...
    9     71500           ...         ...
    

    So I basically want to divide the credit scores for row indexes 3, 4, 7, and 9 by 100, but not the others. I want the new, valid values to replace the old, invalid ones. Alternatively, a new column such as 'credit_score_fixed' would work too. I'm fairly new to Python and Pandas, so any help is much appreciated.

  • ScottP
    ScottP over 7 years
    This worked perfectly, thank you! Could you clarify for me what the "/=" does though (as opposed to using just "/")?
  • ScottP
    ScottP over 7 years
    For me, "mask" is the most intuitive solution and worked perfectly. Thank you!
  • MaxU - stop genocide of UA
    MaxU - stop genocide of UA over 7 years
    @ScottP, a /= 2 is equal to a = a / 2
  • ScottP
    ScottP over 7 years
    Worked like a charm. Thank you!