Median of pandas dataframe column

python r numpy pandas

52,956

Solution 1

If you're looking for how to calculate the median absolute deviation (MAD):

In [1]: df['dist'] = abs(df['count'] - df['count'].median())

In [2]: df
Out[2]:
   name  count    dist
0  aaaa   2000  1100.0
1  bbbb   1900  1000.0
2  cccc    900     0.0
3  dddd    500   400.0
4  eeee    100   800.0

In [3]: df['dist'].median()
Out[3]: 800.0
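Here is a self-contained version of the session above that can be run end to end. It rebuilds the sample frame from the question and uses the Series `.abs()` method, which gives the same result as the built-in `abs()`. The variable name `mad_value` is only for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

median = df["count"].median()

# Absolute deviation of each row from the column median
df["dist"] = (df["count"] - median).abs()

# Median absolute deviation (MAD) of the column
mad_value = df["dist"].median()

print(df)
print(mad_value)
```

The `dist` column already serves as a "distance from median", so rows can be selected with a boolean mask such as `df[df["dist"] <= mad_value]`.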

Solution 2

If you just want to see the median, you can use df.describe(): the 50% row of its output is the median.
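A minimal sketch, assuming the sample frame from the question. It calls `describe()` on the `count` column and reads off the `50%` entry, which equals `median()`.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

# For a numeric Series, describe() reports count, mean, std, min, 25%, 50%, 75%, max
summary = df["count"].describe()
print(summary)

# The "50%" entry is the median of the column
print(summary["50%"], df["count"].median())
```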

Solution 3

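The median absolute deviation of a column can also be calculated with statsmodels.robust.scale.mad. It accepts a normalization constant c. Passing c=1 gives the raw MAD. The default (about 0.6745) rescales the result so that it estimates the standard deviation of normally distributed data.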

>>> from statsmodels.robust.scale import mad
>>> mad(df['count'], c=1)
800.0
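If statsmodels isn't installed, the same quantity can be computed with NumPy alone. This sketch assumes the definition given in the statsmodels documentation: `median(|a - center|) / c`, where `center` defaults to the median and `c` defaults to the 0.75 quantile of the standard normal distribution. The helper name `mad_numpy` is hypothetical and is not part of statsmodels.

```python
from statistics import NormalDist

import numpy as np

# Default normalization constant: 0.75 quantile of the standard normal (~0.6745)
DEFAULT_C = NormalDist().inv_cdf(0.75)


def mad_numpy(a, c=DEFAULT_C):
    """Median absolute deviation about the median, divided by c (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    return np.median(np.abs(a - np.median(a))) / c


counts = [2000, 1900, 900, 500, 100]
print(mad_numpy(counts, c=1))  # raw MAD
print(mad_numpy(counts))       # rescaled to estimate the std. dev. under normality
```

With c=1 this reproduces the 800.0 shown above.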


Author: Ssank

Updated on December 31, 2020

Comments

Ssank over 3 years
I have a DataFrame df:
```
name   count    
aaaa   2000    
bbbb   1900    
cccc    900    
dddd    500    
eeee    100
```
I would like to look at the rows that are within a factor of 10 of the median of the count column.

I tried df['count'].median() and got the median, but I don't know how to proceed from there. Can you suggest how I could use pandas/numpy for this?

Expected output:
```
name count distance from median

aaaa  2000   *****
```
I can use any measure as the distance from median (absolute deviation from median, quantiles etc.).
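One way to read "within a factor of 10 of the median" is as a ratio test that keeps rows whose count lies between median / 10 and median * 10. The sketch below assumes that interpretation and strictly positive counts. It reports the distance as |log10(count / median)|, so a value of at most 1 means "within a factor of 10". The column name `log10_dist` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

median = df["count"].median()

# Distance from the median on a log10 scale; assumes every count is > 0
df["log10_dist"] = np.log10(df["count"] / median).abs()

# Rows within a factor of 10 of the median (median/10 <= count <= median*10)
within = df[df["log10_dist"] <= 1]
print(within)
```

With this sample every row qualifies, because the smallest ratio is 100 / 900 ≈ 0.11. `df["count"].between(median / 10, median * 10)` produces the same mask without adding a column.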
Prometheus about 5 years

Can you also explain what the 25% and 75% values REALLY mean?
Prajeeth Emanuel over 4 years

@Prometheus Those are the first and third quartiles of the column, respectively.
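You can check this directly: `Series.quantile` returns the same values as the 25% and 75% rows of `describe()`, since both use linear interpolation by default. This is a minimal sketch using the counts from the question.

```python
import pandas as pd

counts = pd.Series([2000, 1900, 900, 500, 100], name="count")

q1 = counts.quantile(0.25)  # first quartile
q3 = counts.quantile(0.75)  # third quartile
summary = counts.describe()

print(q1, summary["25%"])
print(q3, summary["75%"])
```

For these five values the quartiles land exactly on 500 and 1900. That happens because the interpolation positions (0.25 × 4 = 1 and 0.75 × 4 = 3) are whole numbers.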