How do I determine the standard deviation (stddev) of a set of values?

Solution 1

While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You basically may end up with a negative variance...
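
To make the failure concrete, here is a minimal sketch that applies the textbook sum-of-squares formula to four values near 10^9. The class and method names are made up for this illustration. The exact sample variance of these values is 30, but in double precision the subtraction of two huge, nearly equal sums can leave a result far from 30, possibly a negative one.

```csharp
using System;

public static class NaiveVarianceDemo
{
    // Textbook one-pass formula: (sum(x^2) - sum(x)^2 / n) / (n - 1).
    public static double NaiveSampleVariance(double[] data)
    {
        double sum = 0.0, sumSq = 0.0;
        foreach (double x in data)
        {
            sum += x;
            sumSq += x * x;
        }
        return (sumSq - sum * sum / data.Length) / (data.Length - 1);
    }

    // Same spread as 4, 7, 13, 16 (sample variance 30), shifted by 1e9.
    public static double[] ShiftedSample()
    {
        return new double[] { 1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16 };
    }
}
```

Calling `NaiveVarianceDemo.NaiveSampleVariance(NaiveVarianceDemo.ShiftedSample())` and comparing the result with the unshifted input `{ 4, 7, 13, 16 }` shows the effect.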

Plus, never, ever, ever compute a^2 as pow(a,2); a * a is almost certainly faster.

By far the best way of computing a standard deviation is Welford's method. My C# is very rusty, but it could look something like:

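public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0; // running mean
    double S = 0.0; // running sum of squared deviations from the mean
    int k = 1;      // 1-based index of the current value
    foreach (double value in valueList) 
    {
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
        k++;
    }
    // After the loop k is one more than the number of values, so k-2 equals n-1.
    return Math.Sqrt(S / (k-2));
}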

If you have the whole population (as opposed to a sample population), then use return Math.Sqrt(S / (k-1));.
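
Since the sample/population choice only changes the final divisor, it can be exposed as a parameter. The sketch below is one way to do that. The `sample` flag (a similar flag is suggested in the comments), the zero-based counter, and the NaN return for too few values are assumptions made for this illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;

public static class StdDevOptions
{
    // Welford's method with a switch between the sample (n - 1) and population (n) divisor.
    public static double StandardDeviation(IEnumerable<double> values, bool sample = true)
    {
        double mean = 0.0; // running mean
        double s = 0.0;    // running sum of squared deviations from the mean
        int n = 0;         // number of values seen so far
        foreach (double value in values)
        {
            n++;
            double delta = value - mean;
            mean += delta / n;
            s += delta * (value - mean);
        }
        int divisor = sample ? n - 1 : n;
        // Not enough data for the chosen estimator: report NaN instead of dividing by zero.
        if (divisor <= 0) return double.NaN;
        return Math.Sqrt(s / divisor);
    }
}
```

Returning NaN for too few values is only one possible convention; returning 0 or throwing would be equally defensible.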

EDIT: I've updated the code according to Jason's remarks...

EDIT: I've also updated the code according to Alex's remarks...

Solution 2

This solution is 10 times faster than Jaime's, but be aware that, as Jaime pointed out:

"While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You basically may end up with a negative variance"

If you think you are dealing with very large numbers or a very large quantity of numbers, you should calculate using both methods. If the results are equal, you know that you can use "my" method for your case.

    public static double StandardDeviation(double[] data)
    {
        double stdDev = 0;
        double sumAll = 0;
        double sumAllQ = 0;

        //Sum of x and sum of x²
        for (int i = 0; i < data.Length; i++)
        {
            double x = data[i];
            sumAll += x;
            sumAllQ += x * x;
        }

        //Mean (not used here)
        //double mean = 0;
        //mean = sumAll / (double)data.Length;

        //Standard deviation
        stdDev = System.Math.Sqrt(
            (sumAllQ -
            (sumAll * sumAll) / data.Length) *
            (1.0d / (data.Length - 1))
            );

        return stdDev;
    }
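
The cross-check suggested above can be automated. The following is a rough sketch rather than a guarantee of correctness: it runs the sum-of-squares formula and Welford's method on the same data and reports whether they agree within a relative tolerance. The class name, the method names, and the default tolerance of 1e-9 are arbitrary choices made for this illustration.

```csharp
using System;

public static class StdDevCrossCheck
{
    // Fast sum-of-squares version; may lose precision when values are large relative to their spread.
    public static double SumOfSquares(double[] data)
    {
        double sum = 0.0, sumSq = 0.0;
        foreach (double x in data)
        {
            sum += x;
            sumSq += x * x;
        }
        return Math.Sqrt((sumSq - sum * sum / data.Length) / (data.Length - 1));
    }

    // Welford's method, used here as the reference result.
    public static double Welford(double[] data)
    {
        double mean = 0.0, s = 0.0;
        int n = 0;
        foreach (double x in data)
        {
            n++;
            double delta = x - mean;
            mean += delta / n;
            s += delta * (x - mean);
        }
        return Math.Sqrt(s / (n - 1));
    }

    public static bool MethodsAgree(double[] data, double relTol = 1e-9)
    {
        double fast = SumOfSquares(data);
        double reference = Welford(data);
        // NaN (from a negative variance) never compares as "close", so it fails the check.
        return Math.Abs(fast - reference) <= relTol * Math.Abs(reference);
    }
}
```

Agreement on one data set says nothing about other inputs, so the check is only meaningful when it is run on data representative of what the fast version will see.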

Solution 3

The accepted answer by Jaime is great, except you need to divide by k-2 in the last line (you need to divide by "number_of_elements-1"). Better yet, start k at 0:

public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0;
    double S = 0.0;
    int k = 0;
    foreach (double value in valueList) 
    {
        k++;
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
    }
    return Math.Sqrt(S / (k-1));
}
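
Because Welford's method only keeps three numbers between updates, it also suits a stream of values that is never stored (a point raised in the comments). A minimal sketch of such an accumulator follows; the `RunningStats` class and all of its member names are hypothetical.

```csharp
using System;

public sealed class RunningStats
{
    private long _count;
    private double _mean;
    private double _s; // sum of squared deviations from the current mean

    public long Count => _count;
    public double Mean => _mean;

    // Feed one value; constant time and memory per call.
    public void Push(double value)
    {
        _count++;
        double delta = value - _mean;
        _mean += delta / _count;
        _s += delta * (value - _mean);
    }

    // Sample standard deviation (n - 1 divisor); NaN until two values have been pushed.
    public double SampleStandardDeviation =>
        _count > 1 ? Math.Sqrt(_s / (_count - 1)) : double.NaN;

    // Population standard deviation (n divisor); NaN while empty.
    public double PopulationStandardDeviation =>
        _count > 0 ? Math.Sqrt(_s / _count) : double.NaN;
}
```

Typical use would be to create one instance, call `Push` for each incoming value, and read either property whenever a current estimate is needed.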

Solution 4

The Math.NET library provides this for you out of the box.

PM> Install-Package MathNet.Numerics

using MathNet.Numerics.Statistics; // namespace that holds the extension methods

var populationStdDev = new List<double> { 1d, 2d, 3d, 4d, 5d }.PopulationStandardDeviation();

var sampleStdDev = new List<double> { 2d, 3d, 4d }.StandardDeviation();

See PopulationStandardDeviation for more information.

Solution 5

Code snippet:

public static double StandardDeviation(List<double> valueList)
{
    if (valueList.Count < 2) return 0.0;
    double sumOfSquares = 0.0;
    double average = valueList.Average(); // LINQ: needs using System.Linq (.NET 3.5+)
    foreach (double value in valueList) 
    {
        sumOfSquares += Math.Pow((value - average), 2);
    }
    return Math.Sqrt(sumOfSquares / (valueList.Count - 1));
}
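
The snippet above is the classic two-pass approach. When the whole list is available, the comments mention a corrected two-pass variant (as described in Numerical Recipes) that adds a compensation term for rounding error in the mean. A sketch under that reading, with hypothetical class and method names:

```csharp
using System;
using System.Collections.Generic;

public static class TwoPass
{
    public static double CorrectedStandardDeviation(IReadOnlyList<double> values)
    {
        int n = values.Count;
        if (n < 2) return 0.0; // same convention as the snippet above

        // Pass 1: mean.
        double sum = 0.0;
        foreach (double v in values) sum += v;
        double mean = sum / n;

        // Pass 2: squared deviations, plus the plain sum of deviations.
        // In exact arithmetic the latter is zero; in floating point it captures the error in the mean.
        double sumSq = 0.0, sumDev = 0.0;
        foreach (double v in values)
        {
            double d = v - mean;
            sumSq += d * d;
            sumDev += d;
        }
        return Math.Sqrt((sumSq - sumDev * sumDev / n) / (n - 1));
    }
}
```

The correction term costs one extra addition per value and changes nothing when the computed mean is exact.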

Author by

dead and bloated

Updated on July 15, 2020

Comments

  • dead and bloated
    dead and bloated almost 4 years

    I need to know if a number compared to a set of numbers is outside of 1 stddev from the mean, etc.

    • Jason S
      Jason S almost 15 years
      please please please please don't assume the OP is asking a question for homework purposes, rather than for a "real" project or for self-improvement. Ask them.
    • dead and bloated
      dead and bloated almost 15 years
      I actually am not asking for homework reasons, but if it helps people who are doing homework to find the answer, then please add the tag.
    • vzwick
      vzwick over 11 years
      @overslacked The homework tag is being phased out and must not be used anymore (as I just learned myself) - meta.stackexchange.com/q/147100
  • nTraum
    nTraum almost 15 years
    Your formula is wrong. It should be sigma = sqrt( meansqr - mean^2 ). Read this page carefully to see your mistake: en.wikipedia.org/wiki/Standard_deviation
  • dmckee --- ex-moderator kitten
    dmckee --- ex-moderator kitten almost 15 years
    @leif: Yep. And I should have noticed the dimensional problem, too.
  • Admin
    Admin almost 15 years
    Dividing by Count - 1 or Count depends on whether we're talking about entire population or sample, yes? Looks like OP is talking about a known population but not entirely clear.
  • Demi
    Demi almost 15 years
    That is correct - this is for sample variance. I appreciate the highlight.
  • Demi
    Demi almost 15 years
    For a sample stddev you shouldn't be passing a list with one item. A sample stddev of one item is worthless.
  • Jason S
    Jason S almost 15 years
    +1: I've read Knuth's commentary on this but have never known it was called Welford's method. FYI you can eliminate the k==1 case, it just works.
  • Jason S
    Jason S almost 15 years
    OH: and you're forgetting the divide-by-N or divide-by-N-1 at the end.
  • dmckee --- ex-moderator kitten
    dmckee --- ex-moderator kitten almost 15 years
    @Jason: You are worried about loss of precision effects? Or what? I just don't see it....OK I followed the link on Jaime's answer. Loss of Precision it is. Point taken. ::shrug:: I'm an experimental scientist. We don't get populations with variations confined to the 10^-9 level, and we generally use double precision for everything, so those populations we get with variation confined to the 10^-5 level come out OK anyway.
  • dmckee --- ex-moderator kitten
    dmckee --- ex-moderator kitten almost 15 years
    Now look what you've done. You've given me something new to learn. You beast.
  • Emperor XLII
    Emperor XLII almost 15 years
    John Cook has a good article on standard deviation for large values at johndcook.com/blog/2008/09/26/…, and a followup describing the reasons behind the algorithms at johndcook.com/blog/2008/09/28/….
  • Jaime
    Jaime almost 15 years
    It actually is John's blog that I'm linking for a description of Welford's method...
  • Alexandre C.
    Alexandre C. about 13 years
    Actually, if you have the whole list beforehand, a corrected 2-pass algorithm will do fine (cf. eg. Numerical Recipes). This method is when you have a stream of values that you don't want to store.
  • alex.forencich
    alex.forencich almost 10 years
    Actually, your variable k ends up being 1 larger than the actual number of points since it starts at 1 (e.g. with 2 points, k = 3). This means that you need to subtract that additional 1 in the last step, so k-1 for the whole population and k-2 for the sample population.
  • alex.forencich
    alex.forencich almost 10 years
    Alternatively you can start k at zero, but increment it first instead of last. Then set it back to k-1 at the end.
  • EnemyBagJones
    EnemyBagJones almost 8 years
    This answer was fantastically helpful. My statistics knowledge is very rusty and most of the Welford method guides are pretty math heavy - this gave me what I needed in Python in 5 minutes. Thanks a million.
  • Admin
    Admin almost 5 years
    With the running formula, you had better use decimal instead of double to mitigate rounding and truncation errors, and convert the result back to double.
  • Jay
    Jay almost 4 years
    @Jaime, I think sample should be a bool parameter and the last line should be: return S == 0 ? 0 : Math.Sqrt(S / (k - (sample ? 2 : 1))); to handle cases of 1 value such as [15.8569444444444], based on other answers such as: stackoverflow.com/questions/2253874/standard-deviation-in-linq/… I would also say: return Math.Sqrt(S / (k - (sample ? 1 : 0))); would work
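
For the use case raised in the first comment (checking whether a number lies more than one standard deviation from the mean), one possible sketch is shown below. It uses a plain two-pass computation with the sample (n - 1) divisor. The `IsOutside` name and the `threshold` parameter are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeviationCheck
{
    // True when candidate is more than `threshold` sample standard deviations from the mean of values.
    public static bool IsOutside(IReadOnlyCollection<double> values, double candidate, double threshold = 1.0)
    {
        if (values.Count < 2)
            throw new ArgumentException("At least two values are required.", nameof(values));

        double mean = values.Average();
        double sumSq = values.Sum(v => (v - mean) * (v - mean));
        double stdDev = Math.Sqrt(sumSq / (values.Count - 1));

        return Math.Abs(candidate - mean) > threshold * stdDev;
    }
}
```

If the set is the whole population rather than a sample, dividing by `values.Count` instead would be the matching choice.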