Median of pandas dataframe column

Solution 1

If you're looking for how to calculate the median absolute deviation (MAD), take each value's absolute distance from the median, then take the median of those distances:

In [1]: df['dist'] = abs(df['count'] - df['count'].median())

In [2]: df
Out[2]:
   name  count    dist
0  aaaa   2000  1100.0
1  bbbb   1900  1000.0
2  cccc    900     0.0
3  dddd    500   400.0
4  eeee    100   800.0

In [3]: df['dist'].median()
Out[3]: 800.0
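To get from the distance column to the row selection the question asks about, here is a minimal sketch. It assumes "within a factor of 10 from the median" means the value lies between median/10 and median*10, and that the median is positive. The helper name within_factor_of_median is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

def within_factor_of_median(frame, column, factor=10):
    """Hypothetical helper: keep rows with median/factor <= value <= median*factor.

    Assumes the column median is positive, so the interval is well ordered.
    """
    med = frame[column].median()
    mask = frame[column].between(med / factor, med * factor)
    out = frame.loc[mask].copy()
    # Absolute distance from the median, as in the answer above.
    out["dist"] = (out[column] - med).abs()
    return out

result = within_factor_of_median(df, "count", factor=10)
tight = within_factor_of_median(df, "count", factor=2)
print(result)
print(tight)
```

With this sample the median is 900, so a factor of 10 keeps every row, while a factor of 2 keeps only the rows between 450 and 1800.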

Solution 2

If you want to see the median, you can use df.describe(). The 50% value is the median.
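As a quick illustration, assuming the same sample data as in the question, the median can be read from the "50%" entry of the describe() output:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["aaaa", "bbbb", "cccc", "dddd", "eeee"],
    "count": [2000, 1900, 900, 500, 100],
})

# describe() on a numeric column returns a Series indexed by
# count, mean, std, min, 25%, 50%, 75%, max.
summary = df["count"].describe()
median_from_describe = summary["50%"]

print(summary)
print(median_from_describe == df["count"].median())
```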

Solution 3

The median absolute deviation of a column, MAD = median(|x_i - median(x)|), can also be calculated with statsmodels.robust.scale.mad. The function accepts a normalization constant c; pass c=1 to get the unnormalized value.

>>> from statsmodels.robust.scale import mad
>>> mad(df['count'], c=1)
800.0
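If statsmodels is not installed, the same quantity can be sketched with pandas alone. The helper mad_pandas below is hypothetical. It divides by c, which is, as far as I know, how statsmodels applies the constant; the default there is about 0.6745 (the 75th percentile of the standard normal), which rescales the MAD so it is comparable to a standard deviation for normally distributed data.

```python
import pandas as pd

counts = pd.Series([2000, 1900, 900, 500, 100], name="count")

def mad_pandas(values, c=1.0):
    """Hypothetical helper: median of |x - median(x)|, divided by the constant c."""
    center = values.median()
    return (values - center).abs().median() / c

# c=1 gives the plain, unnormalized MAD.
raw_mad = mad_pandas(counts, c=1.0)

# Assumed value of the statsmodels default constant.
GAUSSIAN_C = 0.6744897501960817
scaled_mad = mad_pandas(counts, c=GAUSSIAN_C)

print(raw_mad, scaled_mad)
```

Leaving c at a value other than 1 therefore returns a larger number than the 800.0 shown above.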
Author by

Ssank

Updated on December 31, 2020

Comments

  • Ssank, over 3 years ago

    I have a DataFrame df:

    name   count    
    aaaa   2000    
    bbbb   1900    
    cccc    900    
    dddd    500    
    eeee    100
    

    I would like to look at the rows that are within a factor of 10 from the median of the count column.

    I tried df['count'].median() and got the median, but I don't know how to proceed from there. Can you suggest how I could use pandas/numpy for this?

    Expected Output :

    name count distance from median
    
    aaaa  2000   *****
    

    I can use any measure as the distance from median (absolute deviation from median, quantiles etc.).

  • Prometheus, about 5 years ago
    Can you also explain what the 25% and 75% values actually mean?
  • Prajeeth Emanuel, over 4 years ago
    @Prometheus Those are the first and third quartiles of the list, respectively.
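To make the quartile comment concrete, here is a small sketch using the question's sample data. It relies on Series.quantile with its default linear interpolation; the variable names are illustrative only.

```python
import pandas as pd

counts = pd.Series([2000, 1900, 900, 500, 100], name="count")

# 25% = first quartile, 50% = median, 75% = third quartile.
quartiles = counts.quantile([0.25, 0.5, 0.75])

# Interquartile range: the spread of the middle half of the values.
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]

print(quartiles)
print(iqr)
```

Roughly a quarter of the values fall at or below the 25% mark and roughly three quarters fall at or below the 75% mark, so the interquartile range is another robust measure of distance from the median.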