How do I determine the standard deviation (stddev) of a set of values?

c# math statistics numerical

46,735

Solution 1

While the sum-of-squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers: subtracting two huge, nearly equal sums cancels most of the significant digits, and you may even end up with a negative variance...

Also, never, ever compute a^2 as pow(a, 2); a * a is almost certainly faster.
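A quick way to see the cancellation problem is to put a small spread on top of a large offset. The sketch below is illustrative only: the data 1e9 + {4, 7, 13, 16} is made up for the demonstration, and NaiveVariance and WelfordVariance are hypothetical helper names. The exact sample variance of these values is 30 (deviations -6, -3, 3, 6 from the mean 1e9 + 10).

```csharp
using System;

// Sum-of-squares sample variance: (sum(x^2) - sum(x)^2 / n) / (n - 1).
// Both terms are about 4e18 and nearly equal, so their difference loses most digits.
static double NaiveVariance(double[] data)
{
    double sum = 0, sumSq = 0;
    foreach (double x in data)
    {
        sum += x;
        sumSq += x * x;
    }
    int n = data.Length;
    return (sumSq - sum * sum / n) / (n - 1);
}

// Welford's method: update the running mean m and the sum of squared deviations s.
static double WelfordVariance(double[] data)
{
    double m = 0, s = 0;
    int k = 0;
    foreach (double x in data)
    {
        k++;
        double oldM = m;
        m += (x - oldM) / k;
        s += (x - oldM) * (x - m);
    }
    return s / (k - 1);
}

// Hypothetical data: small spread, large offset.
double[] shifted = { 1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16 };

double naive = NaiveVariance(shifted);
double welford = WelfordVariance(shifted);
Console.WriteLine($"naive:   {naive}");   // not 30: cancellation destroys the result
Console.WriteLine($"welford: {welford}"); // 30
```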

By far the best way of computing a standard deviation is Welford's method. My C# is very rusty, but it could look something like this:

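// Sample standard deviation via Welford's method.
public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0;   // running mean
    double S = 0.0;   // running sum of squared deviations from the mean
    int k = 1;        // starts at 1, so after n values k == n + 1
    foreach (double value in valueList) 
    {
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);  // uses both the old and the new mean
        k++;
    }
    return Math.Sqrt(S / (k-2));  // k - 2 == n - 1; needs at least two values
}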

If you have the whole population (as opposed to a sample), then use return Math.Sqrt(S / (k-1)); instead.

EDIT: I've updated the code according to Jason's remarks...

EDIT: I've also updated the code according to Alex's remarks...
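Because Welford's method only carries a count, the running mean and S, it can also consume values one at a time without storing them. A minimal sketch, assuming a value-tuple state and the hypothetical helper names Push and StdDev; the population/sample choice becomes a bool flag.

```csharp
using System;

// Running state: count N, running Mean, and S = sum of squared deviations from the mean.
static (long N, double Mean, double S) Push((long N, double Mean, double S) state, double x)
{
    long n = state.N + 1;
    double delta = x - state.Mean;            // deviation from the old mean
    double mean = state.Mean + delta / n;     // updated mean
    double s = state.S + delta * (x - mean);  // old deviation times new deviation
    return (n, mean, s);
}

// Population: divide S by n. Sample: divide S by n - 1 (needs at least two values).
static double StdDev((long N, double Mean, double S) state, bool sample)
{
    long divisor = sample ? state.N - 1 : state.N;
    if (divisor <= 0) return double.NaN;  // not enough values for this estimator
    return Math.Sqrt(state.S / divisor);
}

var stats = (N: 0L, Mean: 0.0, S: 0.0);
foreach (double x in new[] { 2.0, 4, 4, 4, 5, 5, 7, 9 })
{
    stats = Push(stats, x);  // the values could just as well arrive from a stream
}

Console.WriteLine(StdDev(stats, sample: false));  // population: about 2
Console.WriteLine(StdDev(stats, sample: true));   // sample: sqrt(32/7), about 2.138
```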

Solution 2

This solution is 10 times faster than Jaime's, but be aware that, as Jaime pointed out:

"While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You basically may end up with a negative variance"

If you think you are dealing with very large numbers or a very large quantity of numbers, calculate using both methods. If the results agree, that is good evidence that you can use "my" method for your data.

    public static double StandardDeviation(double[] data)
    {
        double sumAll = 0;   // sum of x
        double sumAllQ = 0;  // sum of x²

        // Single pass: accumulate the sum of x and the sum of x²
        for (int i = 0; i < data.Length; i++)
        {
            double x = data[i];
            sumAll += x;
            sumAllQ += x * x;
        }

        // Sample standard deviation: sqrt((sum of x² - (sum of x)² / n) / (n - 1)).
        // With large values the subtraction can cancel badly; if it goes negative,
        // Math.Sqrt returns NaN. Needs at least two values.
        double stdDev = System.Math.Sqrt(
            (sumAllQ - (sumAll * sumAll) / data.Length) /
            (data.Length - 1));

        return stdDev;
    }
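The "calculate using both methods" check can be automated. A sketch, assuming a relative tolerance of 1e-9 (an arbitrary choice) and the hypothetical names SumOfSquaresStdDev, WelfordStdDev and MethodsAgree; a NaN result (from a negative variance) counts as disagreement. Agreement on one data set is evidence, not proof, that the fast method is safe for similar data.

```csharp
using System;

// Fast single-pass formula: sqrt((sum(x^2) - sum(x)^2 / n) / (n - 1)).
static double SumOfSquaresStdDev(double[] data)
{
    double sum = 0, sumSq = 0;
    foreach (double x in data) { sum += x; sumSq += x * x; }
    int n = data.Length;
    return Math.Sqrt((sumSq - sum * sum / n) / (n - 1));  // NaN if cancellation made it negative
}

// Numerically stable reference: Welford's method, sample standard deviation.
static double WelfordStdDev(double[] data)
{
    double mean = 0, s = 0;
    int k = 0;
    foreach (double x in data)
    {
        k++;
        double oldMean = mean;
        mean += (x - oldMean) / k;
        s += (x - oldMean) * (x - mean);
    }
    return Math.Sqrt(s / (k - 1));
}

// True when both results agree within a relative tolerance; any NaN makes this false.
static bool MethodsAgree(double[] data, double relTol = 1e-9)
{
    double fast = SumOfSquaresStdDev(data);
    double stable = WelfordStdDev(data);
    return Math.Abs(fast - stable) <= relTol * Math.Max(Math.Abs(fast), Math.Abs(stable));
}

Console.WriteLine(MethodsAgree(new[] { 2.0, 4, 4, 4, 5, 5, 7, 9 }));             // True
Console.WriteLine(MethodsAgree(new[] { 1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16 }));  // False
```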

Solution 3

The accepted answer by Jaime is great, except that you need to divide by k-2 in the last line, i.e. by number_of_elements - 1, because k starts at 1 and therefore ends up one larger than the number of elements. Better yet, start k at 0:

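public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0;   // running mean
    double S = 0.0;   // running sum of squared deviations
    int k = 0;        // number of values seen so far
    foreach (double value in valueList) 
    {
        k++;
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
    }
    return Math.Sqrt(S / (k-1));  // sample: divide by n - 1 (divide by k for the whole population)
}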

Solution 4

The Math.NET library provides this for you out of the box.

PM> Install-Package MathNet.Numerics

using MathNet.Numerics.Statistics;

var populationStdDev = new List<double> { 1d, 2d, 3d, 4d, 5d }.PopulationStandardDeviation();

var sampleStdDev = new List<double> { 2d, 3d, 4d }.StandardDeviation();

See PopulationStandardDeviation for more information.

Solution 5

Code snippet:

public static double StandardDeviation(List<double> valueList)
{
    if (valueList.Count < 2) return 0.0;  // a sample needs at least two values
    double sumOfSquares = 0.0;
    double average = valueList.Average(); // LINQ (System.Linq), .NET 3.5+
    foreach (double value in valueList) 
    {
        double deviation = value - average;
        sumOfSquares += deviation * deviation;  // cheaper than Math.Pow(deviation, 2)
    }
    return Math.Sqrt(sumOfSquares / (valueList.Count - 1));
}



dead and bloated

Updated on July 15, 2020

Comments

dead and bloated almost 4 years

I need to know whether a number, compared to a set of numbers, is more than 1 stddev from the mean, etc.
- Jason S almost 15 years
  
  please please please please don't assume the OP is asking a question for homework purposes, rather than for a "real" project or for self-improvement. Ask them.
- dead and bloated almost 15 years
  
  I actually am not asking for homework reasons, but if it helps people who are doing homework find the answer, then please add the tag.
- vzwick over 11 years
  
  @overslacked The homework tag is being phased out and must not be used anymore (as I just learned myself) - meta.stackexchange.com/q/147100
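For the asker's use case (is a number more than one standard deviation from the mean?), a sketch might look like the following. It assumes the sample standard deviation, computed in two passes as in Solution 5, and a hypothetical helper IsOutside with a configurable number of standard deviations; use the population version instead if the set is the whole population.

```csharp
using System;
using System.Linq;

// True if x lies more than numSd sample standard deviations from the mean of values.
static bool IsOutside(double x, double[] values, double numSd = 1.0)
{
    if (values.Length < 2) throw new ArgumentException("Need at least two values.", nameof(values));
    double mean = values.Average();
    double sumSq = values.Sum(v => (v - mean) * (v - mean));  // second pass: squared deviations
    double sd = Math.Sqrt(sumSq / (values.Length - 1));
    return Math.Abs(x - mean) > numSd * sd;
}

double[] data = { 2, 4, 4, 4, 5, 5, 7, 9 };   // mean 5, sample sd about 2.14
Console.WriteLine(IsOutside(8, data));        // True: 3 away from the mean
Console.WriteLine(IsOutside(6, data));        // False: 1 away from the mean
Console.WriteLine(IsOutside(8, data, 2.0));   // False: within 2 sd (about 4.28)
```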
nTraum almost 15 years

Your formula is wrong. It should be sigma = sqrt(meansqr - mean^2). Read this page en.wikipedia.org/wiki/Standard_deviation carefully to see your mistake.
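Written out in full, the population formula in this comment and the sample form used in Solution 2 are:

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)^2},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{N} x_i^2 - \frac{1}{N}\left(\sum_{i=1}^{N} x_i\right)^2}{N-1}}
  = \sigma\sqrt{\frac{N}{N-1}}
```

Both are exact in real arithmetic, but in floating point they share the cancellation problem when the mean is large relative to the spread.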
dmckee --- ex-moderator kitten almost 15 years

@leif: Yep. And I should have noticed the dimensional problem, too.
Admin almost 15 years

Dividing by Count - 1 or by Count depends on whether we're talking about the entire population or a sample, yes? It looks like the OP is talking about a known population, but it's not entirely clear.
Demi almost 15 years

That is correct - this is for sample variance. I appreciate the highlight.
Demi almost 15 years

For a sample stddev, you shouldn't be passing a list with one item. A sample stddev of one item is worthless.
Jason S almost 15 years

+1: I've read Knuth's commentary on this but had never known it was called Welford's method. FYI, you can eliminate the k==1 case; it just works.
Jason S almost 15 years

Oh, and you're forgetting the divide-by-N or divide-by-(N-1) at the end.
dmckee --- ex-moderator kitten almost 15 years

@Jason: You are worried about loss of precision effects? Or what? I just don't see it.... OK, I followed the link on Jaime's answer. Loss of precision it is. Point taken. ::shrug:: I'm an experimental scientist. We don't get populations with variations confined to the 10^-9 level, and we generally use double precision for everything, so those populations we get with variation confined to the 10^-5 level come out OK anyway.
dmckee --- ex-moderator kitten almost 15 years

Now look what you've done. You've given me something new to learn. You beast.
Emperor XLII almost 15 years

John Cook has a good article on standard deviation for large values at johndcook.com/blog/2008/09/26/…, and a followup describing the reasons behind the algorithms at johndcook.com/blog/2008/09/28/….
Jaime almost 15 years

It is actually John's blog that I'm linking to for a description of Welford's method...
Alexandre C. about 13 years

Actually, if you have the whole list beforehand, a corrected two-pass algorithm will do fine (cf., e.g., Numerical Recipes). This method is for when you have a stream of values that you don't want to store.
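A sketch of the corrected two-pass algorithm mentioned here, as we understand the Numerical Recipes formulation: the first pass computes the mean; the second sums the squared deviations and subtracts a correction term, (sum of deviations)² / n, which is zero in exact arithmetic but compensates for rounding error in the computed mean. CorrectedTwoPassStdDev is a hypothetical name.

```csharp
using System;

// Sample standard deviation using the corrected two-pass algorithm.
static double CorrectedTwoPassStdDev(double[] data)
{
    int n = data.Length;
    if (n < 2) throw new ArgumentException("Need at least two values.", nameof(data));

    // Pass 1: the mean.
    double sum = 0;
    foreach (double x in data) sum += x;
    double mean = sum / n;

    // Pass 2: squared deviations, plus the plain sum of deviations for the correction.
    double sumDev = 0, sumDevSq = 0;
    foreach (double x in data)
    {
        double d = x - mean;
        sumDev += d;
        sumDevSq += d * d;
    }

    // sumDev would be exactly 0 with exact arithmetic; subtracting sumDev^2 / n
    // largely compensates for the rounding error in mean.
    return Math.Sqrt((sumDevSq - sumDev * sumDev / n) / (n - 1));
}

double[] shifted = { 1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16 };
Console.WriteLine(CorrectedTwoPassStdDev(shifted));  // sqrt(30), about 5.477
```

Unlike Welford's method, this needs the whole data set in memory (two passes), which is exactly the trade-off described in the comment.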
alex.forencich almost 10 years

Actually, your variable k ends up being 1 larger than the actual number of points since it starts at 1 (e.g. with 2 points, k = 3). This means that you need to subtract that additional 1 in the last step, so k-1 for the whole population and k-2 for the sample population.
alex.forencich almost 10 years

Alternatively, you can start k at zero but increment it first instead of last. Then divide by k-1 at the end.
EnemyBagJones almost 8 years

This answer was fantastically helpful. My statistics knowledge is very rusty and most of the Welford method guides are pretty math heavy - this gave me what I needed in Python in 5 minutes. Thanks a million.
alv almost 5 years

link broken, use numerics.mathdotnet.com/api/MathNet.Numerics.Statistics/…
Admin almost 5 years

With the running formula, you had better use decimal instead of double to mitigate rounding and truncation errors, and convert the result back to double.
Jay almost 4 years

@Jaime, I think sample should be a bool parameter, and the last line should be return S == 0 ? 0 : Math.Sqrt(S / (k - (sample ? 2 : 1))); to handle the case of a single value such as [15.8569444444444], based on other answers such as stackoverflow.com/questions/2253874/standard-deviation-in-linq/…. I would also say that return Math.Sqrt(S / (k - (sample ? 1 : 0))); would work (with k starting at 0).
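Jay's suggestion, applied to the k-starts-at-1 version of Jaime's code, might look like this. A sketch: the bool sample parameter follows the comment, and returning 0 when S is 0 covers a single value (or an empty list), which would otherwise give 0/0 = NaN for the sample divisor.

```csharp
using System;
using System.Collections.Generic;

// Welford's method with k starting at 1 (so k ends at n + 1), as in Jaime's answer.
// sample = true divides by n - 1 (k - 2); sample = false divides by n (k - 1).
static double StandardDeviation(IEnumerable<double> values, bool sample)
{
    double M = 0.0;
    double S = 0.0;
    int k = 1;
    foreach (double value in values)
    {
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
        k++;
    }
    // No spread (no values, one value, or all values equal): report 0 instead of NaN.
    return S == 0 ? 0 : Math.Sqrt(S / (k - (sample ? 2 : 1)));
}

Console.WriteLine(StandardDeviation(new List<double> { 15.8569444444444 }, sample: true));  // 0
Console.WriteLine(StandardDeviation(new[] { 2.0, 4, 4, 4, 5, 5, 7, 9 }, sample: false));    // about 2
```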