Pandas: How to fill null values with mean of a groupby?
Solution 1
You can use groupby with apply to call fillna with each group's mean. A category that contains only NaN values still has NaN after this step, because its group mean is itself NaN, so fill whatever remains with the mean of the whole column:
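# group_keys=False keeps the original row index, so the result aligns with df on recent pandas versions
df['value'] = df.groupby('category', group_keys=False)['value'].apply(lambda x: x.fillna(x.mean()))
df['value'] = df['value'].fillna(df['value'].mean())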
print(df)
   id category  value
0   1        A   6.25
1   2        B   1.00
2   3        A  10.50
3   4        C   4.15
4   5        A   2.00
5   6        B   1.00
Solution 2
You can also use GroupBy + transform to fill NaN values with group-wise means. This method avoids the less efficient apply + lambda. For example:
df['value'] = df['value'].fillna(df.groupby('category')['value'].transform('mean'))
df['value'] = df['value'].fillna(df['value'].mean())
Comments
-
sfactor almost 2 years
I have a dataset with some missing data that looks like this:
id  category  value
1   A         NaN
2   B         NaN
3   A         10.5
4   C         NaN
5   A         2.0
6   B         1.0
I need to fill in the nulls to use the data in a model. Every time a category occurs for the first time, its value is NULL. For categories like A and B that have more than one value, I want to replace the nulls with the average of that category. For category C, which occurs only once, I want to fill in the average of the rest of the data.
I know that I can simply do this for cases like C to get the average of all the rows:
df['value'] = df['value'].fillna(df['value'].mean())
but I'm stuck trying to compute the category-wise means for A and B and replace the nulls.
I need the final df to be like this
id  category  value
1   A         6.25
2   B         1.0
3   A         10.5
4   C         4.15
5   A         2.0
6   B         1.0
-
mari over 5 years
Great help. Is there any way to do this for many columns in pandas instead of the single column 'value'?
-
jezrael over 5 years
@Mari - Use
df = df.groupby('category').apply(lambda x: x.fillna(x.mean())).reset_index(drop=True)
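As a possible alternative for several columns, the transform approach from Solution 2 can be applied to a list of numeric columns at once. This is only a sketch: the second column ("weight"), its values, and the names df_multi and cols are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "weight" is an invented second numeric column.
df_multi = pd.DataFrame({
    "category": ["A", "B", "A", "C", "A", "B"],
    "value": [np.nan, np.nan, 10.5, np.nan, 2.0, 1.0],
    "weight": [1.0, np.nan, np.nan, 4.0, 3.0, 5.0],
})

# Columns to fill; the grouping column itself is left out.
cols = ["value", "weight"]

# Group means for each selected column, aligned row by row with df_multi.
group_means = df_multi.groupby("category")[cols].transform("mean")

# Fill from the group means, then fall back to each column's overall mean.
df_multi[cols] = df_multi[cols].fillna(group_means)
df_multi[cols] = df_multi[cols].fillna(df_multi[cols].mean())

print(df_multi)
```

Selecting the numeric columns explicitly keeps the string column out of the mean calculation and preserves the original row order.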
-
Umar.H almost 5 years
Thanks for this. I was trying to speed up some of my ETL workflows and this worked a treat.