Statistical outlier detection in MATLAB
Solution 1
If you want to find values more than 2 standard deviations away from the mean on a per-column basis, I would use bsxfun rather than repmat, like this:
meann = mean(main)
stdd = std(main)
I = bsxfun(@gt, abs(bsxfun(@minus, main, meann)), 2*stdd)
I would stop at I, as this logical mask is all you need to remove the outliers. However, you can call find if you like:
out = find(I)
Although to me it is more intuitive to do this:
I = bsxfun(@lt, meann + 2*stdd, main) | bsxfun(@gt, meann - 2*stdd, main)
I think your repmat solution is missing an abs, btw.
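As a minimal sketch of what "stopping at I" buys you, the mask can be used directly to blank out or drop the outliers. The data below is hypothetical (20 samples per column with one planted outlier), chosen only so that the 2-sigma rule has something to flag; the names cleaned and trimmed are illustrative.

```matlab
% Hypothetical test data: 20 samples per column, one planted outlier.
main = [(1:20)', (21:40)'];
main(5,1) = 1000;

% Per-column statistics and the two-sided 2-sigma mask from above.
meann = mean(main);
stdd  = std(main);
I = bsxfun(@gt, abs(bsxfun(@minus, main, meann)), 2*stdd);

% Option A: keep the matrix shape and mark outliers as NaN.
cleaned = main;
cleaned(I) = NaN;

% Option B: drop every row that contains at least one outlier.
trimmed = main(~any(I, 2), :);
```

Option A preserves the alignment between columns, while Option B discards whole rows. In MATLAB R2016b and later, implicit expansion lets you write the mask as abs(main - meann) > 2*stdd without bsxfun.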
Solution 2
A 2*sigma criterion is certainly simple, but the mean and the standard deviation are themselves very sensitive to outliers. The out variable is therefore affected as well, and in fact your code doesn't find any outlier in the given matrix.
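To see why nothing is flagged, here is a small check on the matrix from the question. It relies on a known bound (Samuelson's inequality): with the default n-1 normalization of std, no sample can lie more than (n-1)/sqrt(n) standard deviations from the mean. The variable names Z, maxZ and bound are illustrative.

```matlab
% Matrix from the question: only 3 samples per column.
main = [10000 5 3 1; 5 5677 0 134; 1 1 456 3];
n = size(main, 1);

% Absolute z-scores, computed per column.
Z = abs(bsxfun(@rdivide, bsxfun(@minus, main, mean(main)), std(main)));

maxZ  = max(Z(:));           % largest deviation, in units of std
bound = (n - 1) / sqrt(n);   % upper limit for any sample; about 1.1547 for n = 3

% With n = 3 no value can ever exceed 2 standard deviations.
anyFlagged = any(Z(:) > 2);
```

With only three rows the bound is about 1.15, so a 2-sigma threshold cannot be reached no matter how extreme a value is.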
To detect the outliers you can simply compare the values appearing in your matrix against the median, or adopt more refined criteria. There is a beautiful lecture explaining this at https://www.mne.psu.edu/me345/Lectures/outliers.pdf
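One way to compare against the median is a scaled median absolute deviation (MAD) rule. This is a sketch of one common robust criterion, not necessarily the one the lecture recommends; the threshold k = 3 is an assumption, and 1.4826 is the usual constant that makes the MAD comparable to the standard deviation for normally distributed data.

```matlab
% Matrix from the question.
main = [10000 5 3 1; 5 5677 0 134; 1 1 456 3];

% Robust per-column center and spread.
med       = median(main);
absDev    = abs(bsxfun(@minus, main, med));
scaledMAD = 1.4826 * median(absDev);

% Flag values more than k scaled MADs from the column median.
k = 3;   % assumed threshold, adjust to taste
I = bsxfun(@gt, absDev, k * scaledMAD);
```

On this matrix the mask flags 10000, 5677, 456 and 134, one per column. Note that a column whose MAD is zero (more than half the values identical) would flag every value that differs from the median, so that case may need special handling. Newer MATLAB releases also provide isoutlier, whose default method is a similar scaled-MAD rule.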
Solution 3
Use a cell array if you want to remove certain elements from different columns.
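% Build test data: uniform noise with a few planted outliers.
main = rand(100,4);
main(10,1) = 10000;
main(40,2) = 4321;
main([10,20,30],3) = [938;10;4];
% Split the per-column statistics and the columns themselves into cells.
mu = num2cell(mean(main));
sig = num2cell(std(main));
m = num2cell(main,1);
% Row indices of the values within 2 standard deviations of the column mean.
ind = cellfun(@(x,m,s) find( abs(x - m) < 2*s ),...
m, mu, sig, 'UniformOutput', false);
% The retained values themselves; the columns may now differ in length.
data = cellfun(@(x,m,s) x( abs(x - m) < 2*s ),...
m, mu, sig, 'UniformOutput', false);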
P.S. Your example is too small, so there may not be enough samples to establish a threshold.
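The reason for the cell array is that each column can lose a different number of elements, so the result is no longer rectangular. A small deterministic sketch (hypothetical data, illustrative names cols and kept) makes that visible:

```matlab
% Hypothetical data: 20 samples per column, one planted outlier in column 1.
main = [(1:20)', (21:40)'];
main(5,1) = 1000;

% One cell per column, then keep only the values within 2 standard deviations.
cols = num2cell(main, 1);
data = cellfun(@(x) x(abs(x - mean(x)) < 2*std(x)), cols, ...
    'UniformOutput', false);

% Number of surviving samples per column; the counts differ between columns.
kept = cellfun(@numel, data);
```

Here the first column loses its planted outlier while the second keeps all of its values, which is why the cleaned columns cannot be stored back into an ordinary matrix without padding.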
Eghbal
Updated on June 04, 2022
Comments
-
Eghbal almost 2 years
Suppose that we have this matrix:
main = [10000 5 3 1; 5 5677 0 134; 1 1 456 3];
The following method is the most widely used in econometrics and statistical problems. X is the data in which we are searching for outliers:
X-mean(X) >= n*std(X)
If this inequality holds, the sample is an outlier; otherwise we keep the sample.
Now my question: I want to find outliers with this code:
meann = mean(main);
stdd = std(main);
out = find(main-repmat(meann,size(main,1),1)>=repmat(2*stdd,size(main,1),1));
We are searching for outliers in every column. out should indicate the indices of the outliers. In the final step we should remove the outliers in every column. Is there any simpler function or method to do this in MATLAB? Thanks.
-
Eghbal over 9 years
That's true, but using X-mean(X) > 2 (or 3, ...)*std is the most widely used method in econometrics and statistical problems.
-
Yvon over 9 years
The lecture suggests using |X-mean| > 1.9x * std, which is roughly 2.
-
Dan over 9 years
@user2991243 You're missing an absolute value there, i.e. the |·| in Yvon's comment. It's very important!
-
Eghbal over 9 years
Yes, that's true. Thank you for your help.
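The point about the absolute value can be checked with a short sketch. The data is hypothetical: a single column with one planted outlier on the low side, which the one-sided test from the question cannot see.

```matlab
% Hypothetical column with an outlier far below the rest of the data.
x = (1:20)';
x(5) = -1000;

mu  = mean(x);
sig = std(x);

% One-sided test, as in the question: only catches large positive deviations.
oneSided = find(x - mu >= 2*sig);

% Two-sided test with abs: catches deviations in both directions.
twoSided = find(abs(x - mu) >= 2*sig);
```

The one-sided version returns no indices here, while the two-sided version returns index 5, the planted low outlier.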