Pandas dataframe groupby to calculate population standard deviation
Solution 1
You can pass additional args to np.std in the agg function:
In [202]: df.groupby('A').agg(np.std, ddof=0)
Out[202]:
     B  values
A
1  0.5     2.5
2  0.5     2.5

In [203]: df.groupby('A').agg(np.std, ddof=1)
Out[203]:
          B    values
A
1  0.707107  3.535534
2  0.707107  3.535534
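As an alternative sketch, the groupby object's own std method also accepts ddof, so the population standard deviation can be requested without going through agg. The frame below is the sample from the question; the variable names pop_std and pop_std_values are only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample frame as in the question.
df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 1, 2],
                   'values': np.arange(10, 30, 5)})

# GroupBy.std takes ddof directly; ddof=0 requests the population std.
pop_std = df.groupby('A').std(ddof=0)

# The same idea for a single column, spelled with an explicit callable.
pop_std_values = df.groupby('A')['values'].agg(lambda x: x.std(ddof=0))

print(pop_std)
print(pop_std_values)
```

Calling std(ddof=0) on the groupby keeps the intent visible at the call site and does not depend on how agg forwards keyword arguments to NumPy functions.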
Solution 2
For degrees of freedom = 0 (this means that bins with one number will end up with std=0 instead of NaN):
import numpy as np

def std(x):
    # np.std defaults to ddof=0, i.e. the population standard deviation
    return np.std(x)

df.groupby('A').agg(['mean', 'max', std])
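To illustrate the point about single-element bins, here is a minimal sketch on a hypothetical frame in which group 3 has only one row. The helper names pop_std and sample_std are made up for this example and are not part of pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 3 contains a single observation.
df = pd.DataFrame({'A': [1, 1, 2, 2, 3],
                   'values': [10, 15, 20, 25, 30]})

def pop_std(x):
    # Population standard deviation (divides by N).
    return x.std(ddof=0)

def sample_std(x):
    # Sample standard deviation (divides by N - 1).
    return x.std(ddof=1)

# Result columns are named after the functions passed in the list.
result = df.groupby('A')['values'].agg([pop_std, sample_std])
print(result)
```

For the single-row group, pop_std comes out as 0.0 while sample_std is NaN, because the sample formula divides by N - 1 = 0.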
Author: neelshiv
Updated on May 02, 2020
Comments
-
neelshiv about 4 years
I am trying to use groupby and np.std to calculate a standard deviation, but it seems to be calculating a sample standard deviation (with degrees of freedom equal to 1).
Here is a sample.
# create dataframe
>>> df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})
>>> df
   A  B  values
0  1  1      10
1  1  2      15
2  2  1      20
3  2  2      25

# calculate standard deviation using groupby
>>> df.groupby('A').agg(np.std)
          B    values
A
1  0.707107  3.535534
2  0.707107  3.535534

# calculate using numpy (np.std)
>>> np.std([10,15],ddof=0)
2.5
>>> np.std([10,15],ddof=1)
3.5355339059327378
Is there a way to use the population std calculation (ddof=0) with the groupby statement? The records I am using (not the example table above) are not samples, so I am only interested in population standard deviations.
-
neelshiv over 9 years
Thank you! I had tried "df.groupby('A').agg(np.std(ddof=0))", but I did not try adding the ddof in the agg parentheses. I'll mark your reply as the answer once I can in 8 minutes (you responded really quickly).