Apply a function to each column in a pandas DataFrame


Solution 1

It seems to me that the iteration over the columns is unnecessary:

def calculate_df_columns_mean(self, df):
    # pass the whole DataFrame; no per-column loop is needed
    cleaned_data = self.remove_outliers(df)
    return cleaned_data.mean()

The above should be enough, assuming that remove_outliers accepts a DataFrame and still returns one.
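As a rough sketch of what a DataFrame-in, DataFrame-out remove_outliers could look like: the question never shows the real implementation, so the 1.5 * IQR rule, the k parameter, and the sample data below are assumptions for illustration only. The functions are written without self so the snippet runs on its own.

```python
import pandas as pd

def remove_outliers(df, k=1.5):
    """Hypothetical stand-in: replace values outside k * IQR of their column with NaN."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    # Series bounds are aligned on column labels when compared with the DataFrame
    within = (df >= q1 - k * iqr) & (df <= q3 + k * iqr)
    return df.where(within)  # shape is preserved; outliers become NaN

def calculate_df_columns_mean(df):
    # DataFrame.mean() skips NaN by default, so masked outliers are ignored
    return remove_outliers(df).mean()

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 100.0],   # 100.0 is an outlier under this rule
    "b": [10.0, 11.0, 12.0, 13.0, 14.0],
})
means = calculate_df_columns_mean(df)
print(means)
```

Masking with NaN rather than dropping rows keeps every column the same length, which is what lets the whole frame be processed in one call.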

EDIT

I think the following should work:

def calculate_df_columns_mean(self, df):
    # remove_outliers takes and returns a list, so use np.mean on its result
    return df.apply(lambda x: np.mean(self.remove_outliers(x.tolist())))
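For a version that can be run as-is, here is a minimal sketch of the apply approach with a list-based remove_outliers. The IQR filter is only a placeholder for whatever the real function does, and the sample DataFrame is made up.

```python
import numpy as np
import pandas as pd

def remove_outliers(values, k=1.5):
    """Hypothetical stand-in: takes a list, returns a list without IQR outliers."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return [v for v in values if q1 - k * iqr <= v <= q3 + k * iqr]

def calculate_df_columns_mean(df):
    # apply() hands each column to the lambda as a Series;
    # one scalar per column comes back, so the result is a Series of means
    return df.apply(lambda col: np.mean(remove_outliers(col.tolist())))

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 100.0],
    "b": [10.0, 11.0, 12.0, 13.0, 14.0],
})
means = calculate_df_columns_mean(df)
means_dict = means.to_dict()  # same shape as the dict the original loop built
print(means_dict)
```

Calling .to_dict() on the result gives back the column-to-mean mapping that the loop in the question returned.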

Solution 2

Use DataFrame.apply(func, axis=0):

# axis=0 applies func to each column; axis=1 applies it to each row
df.apply(numpy.sum, axis=0)  # equivalent to df.sum(axis=0)
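To make the axis argument concrete, here is a small sketch on a made-up two-column frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0: the function receives each column in turn -> one result per column
col_sums = df.apply(np.sum, axis=0)   # a: 6, b: 60

# axis=1: the function receives each row in turn -> one result per row
row_sums = df.apply(np.sum, axis=1)   # 11, 22, 33

print(col_sums)
print(row_sums)
```

For built-in reductions such as sum or mean, calling df.sum(axis=0) directly gives the same numbers without going through apply.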
Author: Night Walker

Updated on June 17, 2022

Comments

  • Night Walker almost 2 years

    How can I write the following function in a more idiomatic pandas way?

        def calculate_df_columns_mean(self, df):
            means = {}
            for column in df.columns.tolist():
                cleaned_data = self.remove_outliers(df[column].tolist())
                means[column] = np.mean(cleaned_data)
            return means

    Thanks for the help.

    • MaxU - stop genocide of UA over 7 years
      What does `remove_outliers` do?
    • EdChum over 7 years
      Question: why iterate over the columns and then do this: cleaned_data = self.remove_outliers(df[column].tolist())? It seems like you're repeatedly removing the outliers for all columns, once for every column.
    • Night Walker over 7 years
      I want to calculate the mean on clean data.
    • MaxU - stop genocide of UA over 7 years
      You can do: clean_df.mean()
    • EdChum over 7 years
      You can remove the outliers in one go and then calculate the mean for all the columns in one go, no? It seems to me that iterating over the columns is unnecessary here, since you're removing the outliers on all columns and can calculate the mean on the entire df.
    • MaxU - stop genocide of UA over 7 years
      I agree with @EdChum: try to avoid the .apply() method if possible, because it's pretty slow and inefficient.
  • Night Walker over 7 years
    remove_outliers takes a list and returns a cleaned list.
  • EdChum over 7 years
    Try my new edit; it should work. But really you should focus on modifying remove_outliers to operate on a NumPy array and, if possible, on a DataFrame in a vectorised manner.