Extract outliers from Seaborn Boxplot


Solution 1

Seaborn uses matplotlib to handle outlier calculations, meaning the key parameter, whis, is passed on to ax.boxplot. The specific function taking care of the calculation is documented here: https://matplotlib.org/api/cbook_api.html#matplotlib.cbook.boxplot_stats. You can use matplotlib.cbook.boxplot_stats to calculate the outliers yourself rather than extract them from the plot. The following code snippet shows the calculation and that its result matches the seaborn plot:

import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats
import pandas as pd
import seaborn as sns

data = [
    ('LA', 1),
    ('Sultan', 128),
    ('ElderCare', 1),
    ('CA', 3),
    ('More', 900),
]
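df = pd.DataFrame(data, columns=('client', 'total'))
# Only the numeric 'total' column is drawn, as a single box at x=0.
ax = sns.boxplot(data=df)
# boxplot_stats returns one dict per dataset; 'fliers' holds the outlier values.
outliers = [y for stat in boxplot_stats(df['total']) for y in stat['fliers']]
print(outliers)
# Draw the calculated outliers at x=1, next to the box, for comparison.
for y in outliers:
    ax.plot(1, y, 'p')
# Widen the x-axis so the markers at x=1 are visible.
ax.set_xlim(right=1.5)
plt.show()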
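
For reference, the rule behind those fliers can be reproduced by hand. The following is a minimal sketch, assuming the default whis=1.5 and the same 'total' values as above; the variable names (lower_fence, upper_fence, manual_outliers) are only illustrative and are not part of matplotlib or seaborn.

```python
import numpy as np

# Same 'total' values as in the example data above.
totals = np.array([1, 128, 1, 3, 900])

whis = 1.5  # assumed: matplotlib's default whisker multiplier
q1, q3 = np.percentile(totals, [25, 75])
iqr = q3 - q1

# Values beyond these fences lie outside the whiskers and are drawn as fliers.
lower_fence = q1 - whis * iqr
upper_fence = q3 + whis * iqr

manual_outliers = totals[(totals < lower_fence) | (totals > upper_fence)]
print(lower_fence, upper_fence, manual_outliers)  # -189.5 318.5 [900]
```

With this data and the default whis, only 900 falls outside the fences; 128 is the third quartile itself, so it is not flagged.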


Solution 2

The code below gives you an array of outliers; use it to extract the matching values from the dataframe (colname stands for the column you are interested in).

from matplotlib.cbook import boxplot_stats  
boxplot_stats(df.colname).pop(0)['fliers']
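
To get whole records back rather than bare values, the flier array can be used to filter the dataframe. This is a rough sketch, assuming the example data from the question and a default integer index; outlier_rows is just an illustrative name.

```python
import pandas as pd
from matplotlib.cbook import boxplot_stats

df = pd.DataFrame(
    [('LA', 1), ('Sultan', 128), ('ElderCare', 1), ('CA', 3), ('More', 900)],
    columns=('client', 'total'),
)

# boxplot_stats returns one dict per dataset; 'fliers' holds the outlier values.
fliers = boxplot_stats(df['total'])[0]['fliers']

# Keep only the rows whose 'total' is among the flier values.
outlier_rows = df[df['total'].isin(fliers)]
print(outlier_rows)
```

Matching on values with isin returns every row that shares a flier value, which is usually what is wanted when looking up the records behind the points on the plot.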
Author: Aaron

Updated on June 29, 2022

Comments

  • Aaron, almost 2 years

    Is there a way to extract all outliers after plotting a Seaborn Boxplot? For example, if I am plotting a boxplot for the below data

          client                total
    1      LA                     1
    2      Sultan                128
    3      ElderCare              1
    4      CA                     3
    5      More                  900
    

    I want to see the below records returned as outliers after the boxplot is plotted.

    2      Sultan                128
    5      More                  900