Median of pandas dataframe column
Solution 1
If you're looking for how to calculate the median absolute deviation (MAD):
In [1]: df['dist'] = abs(df['count'] - df['count'].median())
In [2]: df
Out[2]:
   name  count  dist
0  aaaa   2000  1100
1  bbbb   1900  1000
2  cccc    900     0
3  dddd    500   400
4  eeee    100   800
In [3]: df['dist'].median()
Out[3]: 800.0
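For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same calculation. It assumes the DataFrame from the question; the variable names `median` and `mad` are just illustrative.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

# Median of the column (900 for this data)
median = df["count"].median()

# Absolute distance of each row from the median
df["dist"] = (df["count"] - median).abs()

# Median absolute deviation: the median of those distances
mad = df["dist"].median()

print(df)
print(mad)
```

Using `Series.abs()` instead of the built-in `abs()` gives the same result here; it is only a stylistic choice.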
Solution 2
If you want to see the median, you can use df.describe(). The 50% value is the median.
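As a quick sketch of reading the median out of the summary, assuming the DataFrame from the question (the name `median_from_describe` is illustrative):

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

# describe() returns a Series of summary statistics for a numeric column
summary = df["count"].describe()

# The "50%" entry is the 50th percentile, i.e. the median
median_from_describe = summary["50%"]

print(summary)
print(median_from_describe == df["count"].median())
```

If only the median is needed, calling `df["count"].median()` directly avoids computing the other statistics.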
Solution 3
The MAD for a column can also be calculated using statsmodels.robust.scale.mad, which accepts a normalization constant c. Here c is set to 1, so the result is not rescaled.
>>> from statsmodels.robust.scale import mad
>>> mad(df['count'], c=1)
800.0
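To illustrate what the normalization constant does, here is a rough NumPy-only sketch. It is not the statsmodels implementation; `mad_scaled` is a hypothetical helper, and it assumes the constant simply divides the raw MAD. The value 0.6744897501960817 is the 0.75 quantile of the standard normal distribution, which is commonly used to make the MAD comparable to a standard deviation.

```python
import numpy as np

def mad_scaled(values, c=1.0):
    """Hypothetical helper: median absolute deviation divided by c."""
    arr = np.asarray(values, dtype=float)
    center = np.median(arr)
    return np.median(np.abs(arr - center)) / c

counts = [2000, 1900, 900, 500, 100]

# With c=1 the raw MAD is returned
raw_mad = mad_scaled(counts, c=1.0)

# With the normal-consistency constant the MAD is scaled up by about 1.4826
NORMAL_C = 0.6744897501960817
normalized_mad = mad_scaled(counts, c=NORMAL_C)

print(raw_mad)
print(normalized_mad)
```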
Author: Ssank
Updated on December 31, 2020
Comments
-
Ssank over 3 years
I have a DataFrame df:
name  count
aaaa  2000
bbbb  1900
cccc  900
dddd  500
eeee  100
I would like to look at the rows that are within a factor of 10 of the median of the count column. I tried df['count'].median() and got the median, but I don't know how to proceed further. Can you suggest how I could use pandas/numpy for this?
Expected output:
name  count  distance from median
aaaa  2000   *****
I can use any measure as the distance from the median (absolute deviation from the median, quantiles, etc.).
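One possible reading of "within a factor of 10" is that a value lies between median / 10 and median * 10. Under that assumption, a sketch of the filter could look like this; `within_factor` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

def within_factor(frame, column, factor):
    """Hypothetical helper: keep rows with median/factor <= value <= median*factor."""
    median = frame[column].median()
    mask = frame[column].between(median / factor, median * factor)
    result = frame[mask].copy()
    # Also record the absolute distance from the median
    result["distance_from_median"] = (result[column] - median).abs()
    return result

# Factor of 10: the bounds are 90 and 9000, so every row qualifies
print(within_factor(df, "count", 10))

# A tighter factor of 2 (bounds 450 and 1800) keeps only cccc and dddd
print(within_factor(df, "count", 2))
```

`Series.between` is inclusive on both ends by default, so values exactly on a bound are kept.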
-
Prometheus about 5 years
Can you also mention what the 25% and 75% REALLY mean?
-
Prajeeth Emanuel over 4 years
@Prometheus Those would be the first and third quartiles of the list, respectively.
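To make the quartiles concrete, here is a small sketch that computes them directly for the question's data, assuming pandas' default linear interpolation. The variable names are illustrative.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

# First quartile, median, and third quartile of the count column
quartiles = df["count"].quantile([0.25, 0.5, 0.75])

q1 = quartiles.loc[0.25]   # 25% of values are at or below this point
q2 = quartiles.loc[0.5]    # the median
q3 = quartiles.loc[0.75]   # 75% of values are at or below this point

print(quartiles)
```

These are the same numbers that `df["count"].describe()` reports in its 25%, 50%, and 75% rows.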