Conditionally fill column values based on another columns value in pandas

python python-3.x pandas dataframe numpy

98,975

Solution 1

You probably want to do

df['Normalized'] = np.where(df['Currency'] == '$', df['Budget'] * 0.78125, df['Budget'])

Solution 2

Similar results via an alternate style might be to write a function that performs the operation you want on a row, using row['fieldname'] syntax to access individual values/columns, and then perform a DataFrame.apply method upon it

This echoes the answer to the question linked here: pandas create new column based on values from other columns

def normalise_row(row):
    if row['Currency'] == '$'
    ...
    ...
    ...
    return result

df['Normalized'] = df.apply(lambda row : normalise_row(row), axis=1)

Solution 3

An option that doesn't require an additional import for numpy:

df['Normalized'] = df['Budget'].where(df['Currency']=='$', df['Budget'] * 0.78125)

Solution 4

Taking Tom Kimber's suggestion one step further, you could use a Function Dictionary to set various conditions for your functions. This solution is expanding the scope of the question.

I'm using an example from a personal application.

# write the dictionary

def applyCalculateSpend (df_name, cost_method_col, metric_col, rate_col, total_planned_col):
    calculations = {
            'CPMV'  : df_name[metric_col] / 1000 * df_name[rate_col],
            'Free'  : 0
            }
    df_method = df_name[cost_method_col]
    return calculations.get(df_method, "not in dict")

# call the function inside a lambda

test_df['spend'] = test_df.apply(lambda row: applyCalculateSpend(
row,
cost_method_col='cost method',
metric_col='metric',
rate_col='rate',
total_planned_col='total planned'), axis = 1)

  cost method  metric  rate  total planned  spend
0        CPMV    2000   100           1000  200.0
1        CPMV    4000   100           1000  400.0
4        Free       1     2              3    0.0

View more solutions

98,975

Author by

Jan Willem Tulp

Data visualization freelancer

Updated on July 31, 2021

Comments

Jan Willem Tulp almost 3 years

I have a DataFrame with a few columns. One columns contains a symbol for which currency is being used, for instance a euro or a dollar sign. Another column contains a budget value. So for instance in one row it could mean a budget of 5000 in euro and in the next row it could say a budget of 2000 in dollar.

In pandas I would like to add an extra column to my DataFrame, normalizing the budgets in euro. So basically, for each row the value in the new column should be the value from the budget column * 1 if the symbol in the currency column is a euro sign, and the value in the new column should be the value of the budget column * 0.78125 if the symbol in the currency column is a dollar sign.

I know how to add a column, fill it with values, copy values from another column etc. but not how to fill the new column conditionally based on the value of another column.

Any suggestions?