How do I determine the standard deviation (stddev) of a set of values?
Solution 1
While the sum-of-squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers (more precisely, values whose mean is large compared to their spread). Subtracting two huge, nearly equal sums cancels most of the significant digits, and you may even end up with a negative variance.
Also, never, ever compute a^2 as pow(a, 2); a * a is almost certainly faster.
By far the best way of computing a standard deviation is Welford's method. My C# is very rusty, but it could look something like:
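To see the cancellation problem in action, here is a minimal C# sketch that compares the naive one-pass formula with Welford's update on four values near 10^9 whose true sample variance is 30. The class and method names are hypothetical, chosen only for this illustration.

```csharp
using System;

// Hypothetical demo class: naive one-pass formula vs. Welford's update.
public static class CancellationDemo
{
    // Naive formula: (sum(x^2) - sum(x)^2 / n) / (n - 1).
    // For large values this subtracts two huge, nearly equal numbers.
    public static double NaiveSampleVariance(double[] data)
    {
        double sum = 0.0, sumSq = 0.0;
        foreach (double x in data)
        {
            sum += x;
            sumSq += x * x; // x * x rather than Math.Pow(x, 2)
        }
        int n = data.Length;
        return (sumSq - sum * sum / n) / (n - 1);
    }

    // Welford's method: updates the running mean and the running sum of
    // squared deviations, so no huge intermediate sums are formed.
    public static double WelfordSampleVariance(double[] data)
    {
        double mean = 0.0, m2 = 0.0;
        int n = 0;
        foreach (double x in data)
        {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }
        return m2 / (n - 1);
    }
}
```

With data = { 1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16 }, the Welford version returns exactly 30, because every intermediate value is exactly representable. The naive version subtracts two quantities near 4·10^18, where adjacent doubles are 512 apart, so its result is dominated by rounding error.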
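public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0;   // running mean
    double S = 0.0;   // running sum of squared deviations from the mean
    int k = 1;        // starts at 1 and is incremented after each value, so it ends at n + 1
    foreach (double value in valueList)
    {
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
        k++;
    }
    // k == n + 1 here, so k - 2 == n - 1: this is the sample standard deviation.
    return Math.Sqrt(S / (k-2));
}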
If you have the whole population (as opposed to a sample), then use return Math.Sqrt(S / (k-1)); instead.
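Following the population/sample distinction and Jay's suggestion in the comments, here is a minimal sketch with a hypothetical bool sample parameter. It counts values from 0, so the divisor is simply n or n - 1, and it rejects inputs too short for the requested divisor instead of dividing by zero. The class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: Welford's method with a sample/population switch.
public static class WelfordStdDev
{
    // sample = true  -> divide by n - 1 (sample standard deviation)
    // sample = false -> divide by n     (population standard deviation)
    public static double StandardDeviation(IEnumerable<double> values, bool sample)
    {
        double mean = 0.0;
        double s = 0.0;   // running sum of squared deviations from the mean
        int n = 0;        // number of values seen so far
        foreach (double value in values)
        {
            n++;
            double oldMean = mean;
            mean += (value - oldMean) / n;
            s += (value - oldMean) * (value - mean);
        }
        int divisor = sample ? n - 1 : n;
        if (divisor <= 0)
            throw new ArgumentException("Not enough values for the requested standard deviation.");
        return Math.Sqrt(s / divisor);
    }
}
```

Throwing is a design choice; returning 0.0 for a single value (as Solution 5 does) is a reasonable alternative if callers prefer not to handle exceptions.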
EDIT: I've updated the code according to Jason's remarks...
EDIT: I've also updated the code according to Alex's remarks...
Solution 2
This solution is about 10 times faster than Jaime's, but be aware that, as Jaime pointed out:
"While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You basically may end up with a negative variance"
If you think you are dealing with very large numbers or a very large quantity of numbers, you should calculate with both methods. If the results agree, you can be reasonably confident that "my" method is safe for your data.
public static double StandardDeviation(double[] data)
{
    double stdDev = 0;
    double sumAll = 0;
    double sumAllQ = 0;

    //Sum of x and sum of x²
    for (int i = 0; i < data.Length; i++)
    {
        double x = data[i];
        sumAll += x;
        sumAllQ += x * x;
    }

    //Mean (not used here)
    //double mean = 0;
    //mean = sumAll / (double)data.Length;

    //Standard deviation
    stdDev = System.Math.Sqrt(
        (sumAllQ -
        (sumAll * sumAll) / data.Length) *
        (1.0d / (data.Length - 1))
    );
    return stdDev;
}
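The cross-check suggested above can be automated. A minimal sketch, assuming a relative tolerance (hypothetical default 1e-9) is an acceptable notion of "equal"; it also clamps a slightly negative variance to zero so the fast path returns 0 rather than NaN. All names are hypothetical.

```csharp
using System;

// Hypothetical helper: runs the fast sum-of-squares formula and Welford's
// method on the same data and reports whether they agree.
public static class StdDevCrossCheck
{
    // One-pass sum-of-squares sample standard deviation (same formula as above).
    public static double SumOfSquaresStdDev(double[] data)
    {
        double sum = 0.0, sumSq = 0.0;
        foreach (double x in data)
        {
            sum += x;
            sumSq += x * x;
        }
        int n = data.Length;
        double variance = (sumSq - sum * sum / n) / (n - 1);
        // Cancellation can make the variance slightly negative; clamp so
        // Math.Sqrt returns 0 instead of NaN.
        return Math.Sqrt(Math.Max(variance, 0.0));
    }

    // Welford's method, used as the numerically stable reference.
    public static double StableStdDev(double[] data)
    {
        double mean = 0.0, s = 0.0;
        int n = 0;
        foreach (double x in data)
        {
            n++;
            double oldMean = mean;
            mean += (x - oldMean) / n;
            s += (x - oldMean) * (x - mean);
        }
        return Math.Sqrt(s / (n - 1));
    }

    // True when the fast result is within relTol of the stable one, relative
    // to the stable result (if that is 0, the fast result must be exactly 0).
    public static bool ResultsAgree(double[] data, double relTol = 1e-9)
    {
        double fast = SumOfSquaresStdDev(data);
        double stable = StableStdDev(data);
        return Math.Abs(fast - stable) <= relTol * Math.Abs(stable);
    }
}
```

Agreement on one data set only suggests the fast formula is adequate for data like it; it is not a guarantee for other inputs. If you run the stable method anyway, you may as well keep its result.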
Solution 3
The accepted answer by Jaime is great, except you need to divide by k-2 in the last line (you need to divide by "number_of_elements-1"). Better yet, start k at 0:
public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0;
    double S = 0.0;
    int k = 0;
    foreach (double value in valueList)
    {
        k++;
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
    }
    return Math.Sqrt(S / (k-1));
}
Solution 4
The Math.NET library provides this for you out of the box.
PM> Install-Package MathNet.Numerics
using MathNet.Numerics.Statistics; // the extension methods live in this namespace
var populationStdDev = new List<double> { 1d, 2d, 3d, 4d, 5d }.PopulationStandardDeviation();
var sampleStdDev = new List<double> { 2d, 3d, 4d }.StandardDeviation();
See PopulationStandardDeviation for more information.
Solution 5
Code snippet:
public static double StandardDeviation(List<double> valueList)
{
    if (valueList.Count < 2) return 0.0;
    double sumOfSquares = 0.0;
    double average = valueList.Average(); // Enumerable.Average from System.Linq (.NET Framework 3.5+)
    foreach (double value in valueList)
    {
        double deviation = value - average;
        sumOfSquares += deviation * deviation; // cheaper than Math.Pow(deviation, 2)
    }
    return Math.Sqrt(sumOfSquares / (valueList.Count - 1));
}
dead and bloated
Updated on July 15, 2020
Comments
-
dead and bloated almost 4 years
I need to know whether a number is more than 1 stddev away from the mean of a set of numbers, etc.
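For the use case in the question, a minimal sketch of the check itself: compute the mean and sample standard deviation of the reference set (two passes, like Solution 5), then compare the new value's distance from the mean with k standard deviations. The class name, method name, and parameter k are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for "is this value outside k stddevs of the mean?"
public static class OutlierCheck
{
    // Returns true when |value - mean| > k * stddev of the reference set.
    // Uses the sample standard deviation (divide by n - 1); use n for a population.
    public static bool IsOutsideStdDevs(double value, IReadOnlyList<double> data, double k = 1.0)
    {
        if (data.Count < 2)
            throw new ArgumentException("Need at least two values for a sample standard deviation.");
        double mean = data.Average();
        double sumSq = 0.0;
        foreach (double x in data)
        {
            double d = x - mean;
            sumSq += d * d;
        }
        double stdDev = Math.Sqrt(sumSq / (data.Count - 1));
        return Math.Abs(value - mean) > k * stdDev;
    }
}
```

For { 2, 4, 4, 4, 5, 5, 7, 9 } the mean is 5 and the sample standard deviation is about 2.14, so 8 is outside 1 stddev but inside 2, while 6 is inside 1.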
-
Jason S almost 15 years
Please, please, please, please don't assume the OP is asking a question for homework purposes rather than for a "real" project or for self-improvement. Ask them.
-
dead and bloated almost 15 years
I'm actually not asking for homework reasons, but if it helps people who are doing homework find the answer, then please add the tag.
-
vzwick over 11 years
@overslacked The homework tag is being phased out and must not be used anymore (as I just learned myself): meta.stackexchange.com/q/147100
-
nTraum almost 15 years
Your formula is wrong. It should be sigma = sqrt(meansqr - mean^2). Read this page carefully to see your mistake: en.wikipedia.org/wiki/Standard_deviation
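nTraum's formula is the population standard deviation written as "mean of the squares minus the square of the mean". Expanding the square shows it matches the definition, with \mu the mean of the N values:

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2
         = \frac{1}{N}\sum_{i=1}^{N}x_i^2 - 2\mu\cdot\frac{1}{N}\sum_{i=1}^{N}x_i + \mu^2
         = \frac{1}{N}\sum_{i=1}^{N}x_i^2 - \mu^2,
\qquad \mu = \frac{1}{N}\sum_{i=1}^{N}x_i
```

The two forms are algebraically identical, but evaluating the right-hand side directly is the sum-of-squares approach, so it inherits the cancellation problem Jaime describes.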
-
dmckee --- ex-moderator kitten almost 15 years
@leif: Yep. And I should have noticed the dimensional problem, too.
-
Admin almost 15 years
Dividing by Count - 1 or by Count depends on whether we're talking about the entire population or a sample, yes? It looks like the OP is talking about a known population, but it's not entirely clear.
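For reference, the two divisors give the population and the (Bessel-corrected) sample standard deviation of N values with mean \bar{x}; they differ only by a constant factor:

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2},
\qquad
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2},
\qquad
s = \sigma\sqrt{\frac{N}{N-1}}
```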
-
Demi almost 15 years
That is correct; this is for sample variance. I appreciate the highlight.
-
Demi almost 15 years
For a sample stddev you shouldn't be passing a list with one item. A sample stddev of one item is worthless.
-
Jason S almost 15 years
+1: I've read Knuth's commentary on this but never knew it was called Welford's method. FYI, you can eliminate the k==1 case; it just works.
-
Jason S almost 15 years
Oh, and you're forgetting the divide-by-N or divide-by-(N-1) at the end.
-
dmckee --- ex-moderator kitten almost 15 years
@Jason: You are worried about loss-of-precision effects? Or what? I just don't see it... OK, I followed the link in Jaime's answer. Loss of precision it is. Point taken. ::shrug:: I'm an experimental scientist. We don't get populations with variations confined to the 10^-9 level, and we generally use double precision for everything, so the populations we get with variation confined to the 10^-5 level come out OK anyway.
-
dmckee --- ex-moderator kitten almost 15 years
Now look what you've done. You've given me something new to learn. You beast.
-
Emperor XLII almost 15 years
John Cook has a good article on standard deviation for large values at johndcook.com/blog/2008/09/26/…, and a follow-up describing the reasons behind the algorithms at johndcook.com/blog/2008/09/28/….
-
Jaime almost 15 years
It is actually John's blog that I'm linking to for a description of Welford's method.
-
Alexandre C. about 13 years
Actually, if you have the whole list beforehand, a corrected two-pass algorithm will do fine (cf., e.g., Numerical Recipes). This method is for when you have a stream of values that you don't want to store.
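A minimal sketch of the corrected two-pass algorithm Alexandre C. mentions: compute the mean first, then sum the deviations and their squares. The correction term is zero in exact arithmetic and compensates for rounding error in the computed mean. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: corrected two-pass sample variance.
public static class TwoPassVariance
{
    //   var = (sum((x - mean)^2) - (sum(x - mean))^2 / n) / (n - 1)
    public static double SampleVariance(IReadOnlyList<double> data)
    {
        int n = data.Count;
        if (n < 2) throw new ArgumentException("Need at least two values.");

        // First pass: the mean.
        double sum = 0.0;
        for (int i = 0; i < n; i++) sum += data[i];
        double mean = sum / n;

        // Second pass: deviations from the mean and their squares.
        double sumDev = 0.0, sumDevSq = 0.0;
        for (int i = 0; i < n; i++)
        {
            double d = data[i] - mean;
            sumDev += d;
            sumDevSq += d * d;
        }
        return (sumDevSq - sumDev * sumDev / n) / (n - 1);
    }
}
```

Take Math.Sqrt of the result for the standard deviation. Unlike Welford's method, this needs the whole list in memory, which is exactly the trade-off the comment describes.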
-
alex.forencich almost 10 years
Actually, your variable k ends up being 1 larger than the actual number of points since it starts at 1 (e.g. with 2 points, k = 3). This means that you need to subtract that additional 1 in the last step, so k-1 for the whole population and k-2 for the sample population.
-
alex.forencich almost 10 years
Alternatively, you can start k at zero but increment it first instead of last. Then use k-1 at the end (for the sample).
-
EnemyBagJones almost 8 years
This answer was fantastically helpful. My statistics knowledge is very rusty and most of the Welford method guides are pretty math-heavy; this gave me what I needed in Python in 5 minutes. Thanks a million.
-
alv almost 5 years
Link broken; use numerics.mathdotnet.com/api/MathNet.Numerics.Statistics/…
-
Admin almost 5 years
With the running formula, you'd better use decimal instead of double to mitigate rounding and truncation errors, and then convert the result back to double.
-
Jay almost 4 years
@Jaime, I think sample should be a bool parameter and the last line should be:
return S == 0 ? 0 : Math.Sqrt(S / (k - (sample ? 2 : 1)));
to handle cases of 1 value, such as [15.8569444444444]. Based on other answers such as stackoverflow.com/questions/2253874/standard-deviation-in-linq/…, I would also say that
return Math.Sqrt(S / (k - (sample ? 1 : 0)));
would work.