Apply function on each column in a pandas dataframe
Solution 1
It seems to me that the iteration over the columns is unnecessary:
def calculate_df_columns_mean(self, df):
    # pass the whole DataFrame; there is no loop variable `column` here
    cleaned_data = self.remove_outliers(df)
    return cleaned_data.mean()
The above should be enough, assuming that remove_outliers accepts a DataFrame and still returns one.
EDIT
I think the following should work:
def calculate_df_columns_mean(self, df):
    # remove_outliers returns a plain list, so take the mean with np.mean
    return df.apply(lambda x: np.mean(self.remove_outliers(x.tolist())))
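For reference, here is a minimal runnable sketch of the apply-based approach, written as a free function rather than a method. The original remove_outliers is not shown anywhere in the thread, so the version below is a hypothetical stand-in that drops values more than two standard deviations from the mean. It takes a list and returns a list, as described in the comments.

```python
import numpy as np
import pandas as pd

def remove_outliers(values, n_std=2.0):
    """Hypothetical stand-in: keep values within n_std standard deviations of the mean."""
    arr = np.asarray(values, dtype=float)
    mean, std = arr.mean(), arr.std()
    return arr[np.abs(arr - mean) <= n_std * std].tolist()

def calculate_df_columns_mean(df):
    # apply calls the lambda once per column (axis=0 is the default)
    return df.apply(lambda col: np.mean(remove_outliers(col.tolist())))

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000],  # 1000 is the outlier
    "b": [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
})
means = calculate_df_columns_mean(df)  # Series indexed by column name
```

The result is a Series rather than the dict built in the question; means.to_dict() converts it if a dict is needed.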
Solution 2
Use DataFrame.apply(func, axis=0):
# axis=0 means apply to columns; axis=1 to rows
df.apply(numpy.sum, axis=0) # equiv to df.sum(0)
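A small sketch of what the axis argument changes, using a made-up two-column frame (the data is purely illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0: func receives each column as a Series; result is indexed by column name
col_sums = df.apply(np.sum, axis=0)

# axis=1: func receives each row as a Series; result is indexed by row label
row_sums = df.apply(np.sum, axis=1)
```

Here col_sums holds one value per column and row_sums one value per row.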
Author by
Night Walker
Updated on June 17, 2022
Comments
-
Night Walker almost 2 years
How can I write the following function in a more pandas-like way:
def calculate_df_columns_mean(self, df):
    means = {}
    for column in df.columns.tolist():
        cleaned_data = self.remove_outliers(df[column].tolist())
        means[column] = np.mean(cleaned_data)
    return means
Thanks for the help.
-
MaxU - stop genocide of UA over 7 years
What does remove_outliers do?
-
EdChum over 7 years
Question: why iterate over the columns and then do this:
cleaned_data = self.remove_outliers(df[column].tolist())
? This looks as though you're removing the outliers for all columns repeatedly, once per column.
-
Night Walker over 7 years
I want to calculate the mean on clean data.
-
MaxU - stop genocide of UA over 7 years
You can do:
clean_df.mean()
-
EdChum over 7 years
Can't you remove the outliers in one go and then calculate the mean for all the columns in one go? It seems to me that iterating over the columns is unnecessary here, since you're removing the outliers on all columns and you can calculate the mean on the entire df.
-
MaxU - stop genocide of UA over 7 years
I agree with @EdChum - try to avoid the .apply() method if possible, because it's pretty slow and inefficient.
-
Night Walker over 7 years
remove_outliers takes a list and returns a cleaned list.
-
EdChum over 7 years
Try my new edit; it should work, but really you should focus on modifying remove_outliers to operate on a NumPy array and, if possible, a DataFrame in a vectorised manner.
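One way the vectorised suggestion could look, as a rough sketch only: the outlier rule below (mask anything more than two standard deviations from its column mean) is an assumption, since the real remove_outliers is never shown. It takes a DataFrame and returns a DataFrame, so no per-column loop or apply is needed.

```python
import pandas as pd

def remove_outliers(df, n_std=2.0):
    """Hypothetical rule: mask values more than n_std standard deviations from their column mean."""
    deviation = (df - df.mean()).abs()
    # where() keeps values that satisfy the condition and replaces the rest with NaN
    return df.where(deviation <= n_std * df.std(ddof=0))

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000],  # 1000 is the outlier
    "b": [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
})

# mean() skips NaN by default, so masked outliers do not affect the result
means = remove_outliers(df).mean()
```

Masking with NaN instead of dropping rows keeps the frame rectangular, which matters when different columns have outliers in different rows.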