Statistical outlier detection in MATLAB

Solution 1

If you want to find values more than 2 standard deviations away from the mean on a per-column basis, I would use bsxfun rather than repmat, like this:

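meann = mean(main);   % column means (1-by-N row vector)
stdd = std(main);     % column standard deviations

% logical mask, true where |main - mean| > 2*std within each column
I = bsxfun(@gt, abs(bsxfun(@minus, main, meann)), 2*stdd);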

I would stop at I, as the logical mask is all you need to remove the outliers. However, you can call find if you like:

out = find(I)

Although to me it is more intuitive to do this:

I = bsxfun(@lt, meann + 2*stdd, main) | bsxfun(@gt, meann - 2*stdd, main)

By the way, I think your repmat solution is missing an abs.
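As a minimal sketch of how the mask can be used, the snippet below marks the flagged entries as NaN so that the matrix stays rectangular, and compares the two-sided test with a one-sided test that has no abs. The sample data and the names dev, I_two_sided, I_one_sided and cleaned are assumptions made for illustration; they are not from the question.

```matlab
% Illustrative data (an assumption): two columns holding 1..20,
% with one very large and one very negative value planted.
main = repmat((1:20)', 1, 2);
main(5, 1)  = 500;     % high outlier in column 1
main(12, 2) = -500;    % low outlier in column 2

meann = mean(main);    % column means
stdd  = std(main);     % column standard deviations

dev = bsxfun(@minus, main, meann);              % signed deviation from the column mean
I_two_sided = bsxfun(@gt, abs(dev), 2*stdd);    % flags both high and low outliers
I_one_sided = bsxfun(@ge, dev, 2*stdd);         % no abs: flags only high outliers

% Mark outliers as NaN instead of deleting them, so every column keeps its length.
cleaned = main;
cleaned(I_two_sided) = NaN;

% Row and column subscripts of the flagged entries.
[row, col] = find(I_two_sided);
```

Here the one-sided test misses the value -500 in column 2, which is the problem with leaving out abs. In MATLAB R2016b and later, implicit expansion should also let you write the mask as abs(main - meann) > 2*stdd without bsxfun.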

Solution 2

A 2*sigma criterion is certainly simple, but the mean and the standard deviation are themselves very sensitive to outliers. The threshold used to compute out is therefore inflated by the very values it is meant to catch, and in fact your code doesn't find any outliers in the given matrix.
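This can be checked directly on the 3-by-4 matrix from the question. With n samples per column and the default std normalization (n-1), no value can lie more than (n-1)/sqrt(n) standard deviations from the mean (Samuelson's inequality), which is about 1.15 for n = 3, so a 2*sigma test can never fire here. The names z, max_z, bound and any_outlier in this sketch are illustrative.

```matlab
% Matrix from the question: only 3 samples per column.
main = [10000 5 3 1; 5 5677 0 134; 1 1 456 3];
n = size(main, 1);

% Absolute z-score of every entry, computed per column.
z = abs(bsxfun(@rdivide, bsxfun(@minus, main, mean(main)), std(main)));

max_z = max(z(:));             % largest z-score actually observed
bound = (n - 1) / sqrt(n);     % theoretical maximum for n samples (about 1.15 for n = 3)
any_outlier = any(z(:) > 2);   % does the 2*sigma rule flag anything?
```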

To detect the outliers, you can instead compare the values in your matrix against the median, which is far less sensitive to extreme values, or adopt more refined criteria. There is a beautiful lecture explaining this at https://www.mne.psu.edu/me345/Lectures/outliers.pdf
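One possible median-based criterion, sketched here as an assumption rather than as the method from the lecture, is to flag values that lie more than 3 scaled median absolute deviations (MAD) from the column median. The constants 1.4826 and 3 are conventional choices, and the sample data and variable names are illustrative.

```matlab
% Illustrative data (an assumption): two columns holding 1..20,
% with one very large and one very negative value planted.
main = repmat((1:20)', 1, 2);
main(5, 1)  = 500;
main(12, 2) = -500;

med = median(main);                            % column medians
absdev = abs(bsxfun(@minus, main, med));       % absolute deviation from the median
mad_scaled = 1.4826 * median(absdev);          % scaled MAD, comparable to std for normal data
I_mad = bsxfun(@gt, absdev, 3 * mad_scaled);   % flag values beyond 3 scaled MADs

[row, col] = find(I_mad);                      % subscripts of the flagged entries
```

Because the median and the MAD barely move when a few extreme values are added, the threshold is not inflated by the outliers themselves. Newer MATLAB releases also provide an isoutlier function, which may be worth checking before writing this by hand.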

Solution 3

Use a cell array if you want to remove certain elements from different columns, since each column may end up with a different number of elements.

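main = rand(100,4);                  % 100 samples in each of 4 columns
main(10,1) = 10000;                  % plant some outliers
main(40,2) = 4321;
main([10,20,30],3) = [938;10;4];

mu  = num2cell(mean(main));          % per-column means, one cell per column
sig = num2cell(std(main));           % per-column standard deviations

cols = num2cell(main,1);             % one cell per column of main
% row indices of the values kept in each column (within 2 sigma of the mean)
ind  = cellfun(@(x,m,s) find( abs(x-m) < 2*s ), cols, mu, sig, 'UniformOutput', false);
% the kept values themselves; the columns may now differ in length
data = cellfun(@(x,m,s) x( abs(x-m) < 2*s ), cols, mu, sig, 'UniformOutput', false);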

P.S. Your example is too small, so there may not be enough samples to establish a threshold.
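To see how many values such a per-column filter removes, the lengths of the resulting cells can be compared with the original number of rows. The sketch below is a simplified, self-contained variant of the approach above; the sample data and the names cols, keep, kept_counts and removed_counts are assumptions made for illustration.

```matlab
% Illustrative data (an assumption): two columns holding 1..20,
% with one very large and one very negative value planted.
main = repmat((1:20)', 1, 2);
main(5, 1)  = 500;
main(12, 2) = -500;

cols = num2cell(main, 1);   % one cell per column

% Keep only the values within 2 standard deviations of their column mean.
keep = cellfun(@(x) x(abs(x - mean(x)) < 2*std(x)), cols, 'UniformOutput', false);

kept_counts    = cellfun(@numel, keep);       % values remaining in each column
removed_counts = size(main, 1) - kept_counts; % values dropped from each column
```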

Author: Eghbal

Updated on June 04, 2022

Comments

  • Eghbal
    Eghbal almost 2 years

    Suppose we have this matrix:

    main = [10000   5   3   1;
    5   5677    0   134;
    1   1   456 3];
    

    This is the most widely used method in econometrics and statistical problems. X is the data in which we are searching for outliers.

    X-mean(X)>= n*std(X)
    

    If this inequality is true, the sample is an outlier; otherwise we keep the sample.

    Now my question: I want to find outliers with this code:

    meann = mean(main);
    stdd = std(main);
    out = find(main-repmat(meann,size(main,1),1)>=repmat(2*stdd,size(main,1),1));
    

    We are searching for outliers in every column. out should contain the indices of the outliers. In the final step, we should remove the outliers from every column. Is there a simpler function or method to do this in MATLAB?

    Thanks.

  • Eghbal
    Eghbal over 9 years
    That's true, but using X-mean(X) > 2 (or 3, ...)*std is the most widely used method in econometrics and statistical problems.
  • Yvon
    Yvon over 9 years
    The lecture suggests using |X-mean| > 1.9x * std which is roughly 2.
  • Dan
    Dan over 9 years
    @user2991243 You're missing an absolute value there, i.e. the |·| in Yvon's comment. It's very important!
  • Eghbal
    Eghbal over 9 years
    Yes. That's true. Thank you for your help.