Removing outliers in R

r statistics standard-deviation


Solution 1

There are probably many ways to deal with this, and probably add-on packages too. I'd suggest you try this first:

library(sos); findFn("outlier")

Here's a way to do what you're asking for using the scale function, which standardizes each column (subtracts its mean and divides by its standard deviation).

#create a data set with outliers
set.seed(10)
dat <- data.frame(sapply(seq_len(5), function(i) 
    sample(c(1:50, 100:101), 200, replace=TRUE)))

#standardize each column (we use it in the outdet function)
scale(dat)

#create function that flags values at least 2 SD from the mean (in either direction)
outdet <- function(x) abs(scale(x)) >= 2
#index with the function to remove those values
dat[!apply(sapply(dat, outdet), 1, any), ]
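To see what outdet actually tests: with its defaults, scale() subtracts the mean and divides by the sample standard deviation (sd(), denominator n - 1), so abs(scale(x)) >= 2 is the same test as abs(x - mean(x)) >= 2 * sd(x). A small check, assuming a numeric vector with no missing values (the toy vector is made up for illustration):

```r
# Toy vector with one obvious outlier (illustrative data only)
x <- c(10, 12, 11, 13, 12, 11, 50)

# z-scores via scale(): (x - mean(x)) / sd(x), returned as a 1-column matrix
z_scale <- as.vector(scale(x))

# The same z-scores computed by hand
z_manual <- (x - mean(x)) / sd(x)
stopifnot(isTRUE(all.equal(z_scale, z_manual)))

# Both forms flag the same values as being at least 2 SD from the mean
flag_scale  <- abs(z_scale) >= 2
flag_manual <- abs(x - mean(x)) >= 2 * sd(x)
stopifnot(identical(flag_scale, flag_manual))

print(x[flag_scale])
```

Keep in mind that the outlier itself inflates sd(), so with very small samples a value can look extreme and still fall short of 2 SD.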

So, to answer your question: yes, there is an easy way. The code above boils down to one line:

dat[!apply(sapply(dat, function(x) abs(scale(x)) >= 2), 1, any), ]
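The sapply/apply combination can also be written with rowSums(), because scale() applied to an all-numeric data frame standardizes every column at once. A sketch under the same assumptions as the answer (every column numeric, threshold 2 SD); the na.rm = TRUE choice, which treats a missing value as "not an outlier", is an added assumption:

```r
# Same simulated data as in the answer
set.seed(10)
dat <- data.frame(sapply(seq_len(5), function(i)
    sample(c(1:50, 100:101), 200, replace = TRUE)))

# Logical matrix: is each cell at least 2 SD from its column mean?
is_out <- abs(scale(dat)) >= 2

# Keep rows with no flagged cell (NA cells are ignored, an assumption)
keep  <- rowSums(is_out, na.rm = TRUE) == 0
clean <- dat[keep, ]

# Agrees with the apply()-based one-liner
orig <- dat[!apply(sapply(dat, function(x) abs(scale(x)) >= 2), 1, any), ]
stopifnot(identical(clean, orig))
```

rowSums() counts the TRUE cells in each row, so "== 0" keeps exactly the rows where no column was flagged.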

I'd also guess there's a package that does this and more. The sos package is terrific (IMHO) for finding functions that do what you want.
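Both snippets in this answer assume every column of dat is numeric, which scale() requires. If the data frame also holds factor or character columns, one option is to standardize only the numeric columns and make the threshold a parameter. A sketch; the helper name drop_sd_outliers and its k argument are hypothetical, and missing values are treated as non-outliers:

```r
# Hypothetical helper: drop rows where any numeric column is >= k SDs from its mean
drop_sd_outliers <- function(df, k = 2) {
    num <- vapply(df, is.numeric, logical(1))      # which columns are numeric
    z   <- scale(df[, num, drop = FALSE])          # column-wise z-scores
    bad <- rowSums(abs(z) >= k, na.rm = TRUE) > 0  # any flagged cell in the row
    df[!bad, , drop = FALSE]
}

# Illustrative data with a non-numeric column
df  <- data.frame(id = letters[1:7],
                  v  = c(10, 12, 11, 13, 12, 11, 50))
res <- drop_sd_outliers(df)
```

For data shaped like the question's (500 rows, 15 attributes), the call is the same: drop_sd_outliers(your_data, k = 2).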

Solution 2

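This uses a different rule: values more than 1.5 × IQR below the first quartile or above the third quartile (Tukey's fences) are flagged, rather than values 2 SD from the mean, and they are replaced with NA instead of the rows being dropped. The start of the function definition was cut off in this copy of the answer; the name remove_outliers below is a placeholder.

# remove_outliers is a placeholder name (the original name was lost)
remove_outliers <- function(x, na.rm = TRUE, ...) {
    # 25th and 75th percentiles; extra arguments are passed on to quantile()
    qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
    # fence width: 1.5 times the interquartile range
    H <- 1.5 * IQR(x, na.rm = na.rm)
    y <- x
    # values outside [Q1 - H, Q3 + H] become NA
    y[x < (qnt[1] - H)] <- NA
    y[x > (qnt[2] + H)] <- NA
    y
}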
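Unlike Solution 1, this function does not drop anything itself: it returns the vector with flagged values set to NA. To remove whole rows as the question asks, one option is to apply it to every column and keep only the rows with no NA, via complete.cases(). A sketch; the name remove_outliers is hypothetical (the original was cut off) and dat is made-up data. Note that rows that already contained NA would be dropped too:

```r
# Tukey-fence filter from Solution 2; the name remove_outliers is hypothetical
remove_outliers <- function(x, na.rm = TRUE, ...) {
    qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
    H <- 1.5 * IQR(x, na.rm = na.rm)
    y <- x
    y[x < (qnt[1] - H)] <- NA
    y[x > (qnt[2] + H)] <- NA
    y
}

# Illustrative data: column b holds one extreme value in row 4
dat <- data.frame(a = c(1, 2, 3, 4, 5, 6),
                  b = c(10, 11, 12, 100, 13, 14))

# Mark outliers as NA column by column, then keep the original values
# of every row in which no column was flagged
flagged <- as.data.frame(lapply(dat, remove_outliers))
clean   <- dat[complete.cases(flagged), ]
print(clean)
```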


Author: ThePerson

Updated on June 04, 2022

Question (asked by ThePerson)

I have looked at a set of data and decided it would be good to remove outliers, where an outlier is defined as a value 2 SD or more away from the mean.

If I have a data set of, say, 500 rows with 15 different attributes, how can I remove all the rows in which one or more attributes are 2 standard deviations or more away from the mean?

Is there an easy way to do this in R? Thanks,