Labeling outliers on boxplot in R

38,367

Solution 1

In the example given it's a bit boring because they are all the same row. but here is the code:

bxpdat <- boxplot(vv)
text(bxpdat$group,                                              # the x locations 
     bxpdat$out,                                                # the y values
     rownames(vv)[which(vv == bxpdat$out, arr.ind=TRUE)[, 1]],  # the labels
     pos = 4)  

This picks the rownames that have values equal to the "out" list (i.e., the outliers) in the result of boxplot. Boxplot calls and returns the values from boxplot.stats. Take a look at:

 str(bxpdat)

Solution 2

There is a simple way. Note that b in Boxplot in following lines is a capital letter.

library(car)

Boxplot(y ~ x, id.method="y")

Solution 3

Or alternatively, you could use the "Boxplot" function from the {car} package which labels outliers for you.

See the following link: https://CRAN.R-project.org/package=car

Solution 4

@DWin's solution works very well for a single boxplot, but will fail for anything with duplicate values, like the dataset I have created:

#Create data
set.seed(1)
basenums <- c(1,2,3,4,8,15,30)
vv=matrix(c(basenums, sample(basenums), 1-basenums, 
          c(0, 29, 30, 31, 32, 33, 60)),nrow=7,ncol=4,byrow=F)
dimnames(vv)=list(c("one","two","three","four","five","six","seven"), 1:4)

On this dataset, @DWin's solution gives:

enter image description here

Which is false, because in the 4th example, it is not possible for the minimum and maximum to be in the same row.

This solution is monstrous (and I hope can be simplified), but effective.

#Reshape data
vv_dat <- as.data.frame(vv)
vv_dat$row <- row.names(vv_dat)
library(reshape2)
new_vv <- melt(vv_dat, id.vars="row")

#Get boxplot data
bxpdat <- as.data.frame(boxplot(value~variable, data=new_vv)[c("out", "group")])

#Get matches with boxplot data
text_guide <- do.call(rbind, apply(bxpdat, 1, 
    function(x) new_vv[new_vv$value==x[1]&new_vv$variable==x[2], ]))

#Add labels
with(text_guide, text(x=as.numeric(variable)+0.2, y=value, labels=row))

enter image description here

Solution 5

Or you can simply run the code from this blog post:

source("https://raw.githubusercontent.com/talgalili/R-code-snippets/master/boxplot.with.outlier.label.r") # Load the function
set.seed(6484)
y <- rnorm(20)
x1 <- sample(letters[1:2], 20,T)
lab_y <- sample(letters, 20)
# plot a boxplot with interactions:
boxplot.with.outlier.label(y~x1, lab_y)

(which handles multiple outliers which are close to one another)

enter image description here

Share:
38,367
user1836894
Author by

user1836894

Updated on January 22, 2020

Comments

  • user1836894
    user1836894 over 4 years

    I would like to plot each column of a matrix as a boxplot and then label the outliers in each boxplot as the row name they belong to in the matrix. To use an example:

    vv=matrix(c(1,2,3,4,8,15,30),nrow=7,ncol=4,byrow=F)
    rownames(vv)=c("one","two","three","four","five","six","seven")
    boxplot(vv)
    

    I would like to label the outlier in each plot (in this case 30) as the row name it belongs to, so in this case 30 belongs to row 7. Is there an easy way to do this? I have seen similar questions to this asked but none seemed to have worked the way I want it to.

  • sebastian-c
    sebastian-c about 11 years
    Won't this falsely flag outliers if there are two boxplots with different means if there's a data point which is an outlier in one and not the other?
  • IRTFM
    IRTFM about 11 years
    You're welcome to post an example that represents your concerns. I only see one boxplot in the posted question.
  • Tal Galili
    Tal Galili over 7 years