Standard deviation of generic list?
Solution 1
This article should help you. It creates a function that computes the standard deviation of a sequence of double
values. All you have to do is supply a sequence of appropriate data elements.
The resulting function is:
private double CalculateStandardDeviation(IEnumerable<double> values)
{
    double standardDeviation = 0;
    if (values.Any())
    {
        // Compute the average.
        double avg = values.Average();
        // Compute the sum of (value - avg)^2.
        double sum = values.Sum(d => Math.Pow(d - avg, 2));
        // Put it all together.
        standardDeviation = Math.Sqrt((sum) / (values.Count() - 1));
    }
    return standardDeviation;
}
This is easy enough to adapt for any generic type, so long as we provide a selector for the value being computed. LINQ is great for that: the Select
function lets you project, from your generic list of custom types, a sequence of numeric values for which to compute the standard deviation:
List<ValveData> list = ...
var result = CalculateStandardDeviation(list.Select(v => (double)v.SomeField));
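The selector idea can also be folded into a single generic extension method. The following is only a sketch: the class name StdDevSketch and the method name SampleStdDev are made up for illustration, and returning 0 for fewer than two values is an assumption rather than something the answer specifies. It divides by n - 1, like the function above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StdDevSketch
{
    // Hypothetical helper: sample standard deviation (divides by n - 1)
    // of the numbers that `selector` extracts from each element.
    public static double SampleStdDev<T>(this IEnumerable<T> source, Func<T, double> selector)
    {
        // Project and copy once, so the source is enumerated a single time.
        double[] values = source.Select(selector).ToArray();

        // Assumption: with fewer than two values there is no spread to report.
        if (values.Length < 2)
        {
            return 0;
        }

        double avg = values.Average();
        double sum = values.Sum(d => (d - avg) * (d - avg));
        return Math.Sqrt(sum / (values.Length - 1));
    }
}
```

With the list from the question, a call might look like list.SampleStdDev(v => (double)v.LatchTime).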
Solution 2
The example above is slightly incorrect and could divide by zero if your sequence contains only one value. The following code is somewhat simpler and gives the "population standard deviation" result (http://en.wikipedia.org/wiki/Standard_deviation):
using System;
using System.Linq;
using System.Collections.Generic;
public static class Extend
{
    public static double StandardDeviation(this IEnumerable<double> values)
    {
        double avg = values.Average();
        return Math.Sqrt(values.Average(v => Math.Pow(v - avg, 2)));
    }
}
Solution 3
Even though the accepted answer seems mathematically correct, it is wrong from the programming perspective: it enumerates the same sequence 4 times. This might be OK if the underlying object is a list or an array, but if the input is a filtered or aggregated LINQ expression, or if the data is coming directly from a database or network stream, performance will be much worse.
I would highly recommend not reinventing the wheel and using one of the better open source math libraries, Math.NET. We have been using that library in our company and are very happy with the performance.
PM> Install-Package MathNet.Numerics
using MathNet.Numerics.Statistics;
var populationStdDev = new List<double> { 1d, 2d, 3d, 4d, 5d }.PopulationStandardDeviation();
var sampleStdDev = new List<double> { 2d, 3d, 4d }.StandardDeviation();
See http://numerics.mathdotnet.com/docs/DescriptiveStatistics.html for more information.
Lastly, for those who want the fastest possible result and are willing to sacrifice some precision, read about the "one-pass" algorithm: https://en.wikipedia.org/wiki/Standard_deviation#Rapid_calculation_methods
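As a rough illustration of a single-pass calculation, here is a sketch of one well-known variant, Welford's running update of the mean and the sum of squared differences. It is not the Math.NET implementation; the names OnePassSketch and PopulationStdDevOnePass are invented for this example, and throwing on an empty sequence is an assumption. Dividing by n - 1 instead of n would give the sample standard deviation.

```csharp
using System;
using System.Collections.Generic;

public static class OnePassSketch
{
    // Hypothetical helper: population standard deviation computed while
    // enumerating the sequence exactly once (Welford's update).
    public static double PopulationStdDevOnePass(this IEnumerable<double> values)
    {
        long n = 0;       // number of values seen so far
        double mean = 0;  // running mean
        double m2 = 0;    // running sum of squared differences from the mean

        foreach (double x in values)
        {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }

        // Assumption: treat an empty sequence as an error, as Average() does.
        if (n == 0)
        {
            throw new InvalidOperationException("Sequence contains no elements");
        }

        return Math.Sqrt(m2 / n);
    }
}
```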
Tom Hangler
Updated on July 05, 2022
Comments
-
Tom Hangler almost 2 years
I need to calculate the standard deviation of a generic list. I will try to include my code. It's a generic list with data in it. The data is mostly floats and ints. Here is the code that is relevant to it, without getting into too much detail:
namespace ValveTesterInterface
{
    public class ValveDataResults
    {
        private List<ValveData> m_ValveResults;
        public ValveDataResults()
        {
            if (m_ValveResults == null)
            {
                m_ValveResults = new List<ValveData>();
            }
        }
        public void AddValveData(ValveData valve)
        {
            m_ValveResults.Add(valve);
        }
Here is the function where the standard deviation needs to be calculated:
        public float LatchStdev()
        {
            float sumOfSqrs = 0;
            float meanValue = 0;
            foreach (ValveData value in m_ValveResults)
            {
                meanValue += value.LatchTime;
            }
            meanValue = (meanValue / m_ValveResults.Count) * 0.02f;
            for (int i = 0; i <= m_ValveResults.Count; i++)
            {
                sumOfSqrs += Math.Pow((m_ValveResults - meanValue), 2);
            }
            return Math.Sqrt(sumOfSqrs / (m_ValveResults.Count - 1));
        }
    }
}
Ignore what's inside the LatchStdev() function because I'm sure it's not right. It's just my poor attempt to calculate the standard deviation. I know how to do it for a list of doubles, but not for a generic list of data objects. If someone has experience with this, please help.
-
Tom Hangler almost 14 years
My C# doesn't have an Average. It doesn't show up. That's one of my problems. Also, I cannot pass a generic list to my function as a parameter. The mean needs to be implemented inside the standard deviation method like my code above. My standard deviation is off, though.
-
Tom Hangler almost 14 years
Also guys, C# doesn't have the average (Math.Average), so I calculate the mean myself like in my code above. It's the standard deviation that I have the most trouble with. Thanks
-
LBushkin almost 14 years
@Tom Hangler, make sure you add using System.Linq; at the top of your file to include the library of LINQ functions. These include both Average() and Select().
-
Tom Hangler almost 14 years
Oh OK, thanks. I'm sorry, I'm a noob. I don't think that Visual Studio recognizes system.ling. Also, what do the v=> and the d=> stand for? And should all the code you gave me be in my one standard deviation function? Thanks
-
LBushkin almost 14 years
It's a 'Q' not a 'G' at the end of System.Linq. I assumed you're using .NET 3.5; if not, then you will not have access to LINQ, and a slightly different solution would be appropriate.
-
LBushkin almost 14 years
The v=> and d=> syntax (and what follows) creates a lambda expression: essentially an anonymous function that accepts a parameter v or d (respectively) and uses that to compute some result. You can read more about them here: msdn.microsoft.com/en-us/library/bb397687.aspx
-
Jesse C. Slicer almost 14 years
Take note that this algorithm implements Sample Standard Deviation as opposed to "plain" Standard Deviation.
-
tenpn almost 13 years
The if(values.Count()>0) line should probably check for > 1, since you're dividing by values.Count() - 1.
-
Wouter almost 12 years
This one should be the answer; it calculates Standard Deviation, as opposed to the answer by LBushkin, which really calculates Sample Standard Deviation.
-
Levitikon about 8 years
+1 This is the actual Standard Deviation (aka population standard deviation) as opposed to Sample Standard Deviation in LBushkin's answer.
-
BlueSky over 5 years
For much faster performance (3.37x on my machine), multiply the terms instead of using Math.Pow: (d - avg) * (d - avg) instead of Math.Pow(d - avg, 2).
-
BlueSky over 5 years
double sum = values.Sum(d => (d - avg) * (d - avg));
-
BlueSky almost 5 years
return Math.Sqrt(values.Average(v => (v - avg) * (v - avg))); is 3.37x faster on my machine. Math.Pow() is much slower than normal multiplication.
-
Jonathan DeMarks almost 5 years
@BlueSky Thanks for doing the benchmark! I love having both options available to see clearly. Math.Pow() might be a bit more readable but your code is more performant, so folks can choose what is right for their scenario.
-
Aric almost 5 years
When all values are equal to the mean, the standard deviation will be zero. In this case, shouldn't ret be assigned an invalid value such as -1 at first to indicate when the standard deviation could not be calculated? Otherwise, there is the (admittedly very rare) possibility of returning a false negative since zero is a valid result.
-
Aric almost 5 years
After more thought, returning zero for an empty population could work, but it may be useful to indicate that there was no data in the return value.
-
Steven.Xi over 3 years
Mathematically, this is the right answer. However, you should definitely avoid using this code in production: the parameter is IEnumerable<double>, and with this code the IEnumerable will be enumerated twice. As an example, what if this function is invoked on an EF query? The best way is to check whether the IEnumerable can be cast to a collection and, if not, call .ToList() first.
-
Steven.Xi over 3 years
Same as my comment below: avoid iterating an IEnumerable<T> multiple times in a helper/extension function, as you never know where the IEnumerable is coming from. It could come from a DB query, in which case iterating multiple times will result in duplicated DB reads. Please cast or convert it to a collection before iterating.
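
The advice in the last two comments could be sketched roughly as follows. This is only an illustration: the names MaterializeSketch and PopulationStdDevBuffered are hypothetical, and returning 0 for an empty input is an assumption. The source is reused when it is already an in-memory collection and copied once otherwise, so a lazy query only runs one time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializeSketch
{
    // Hypothetical helper: population standard deviation that never
    // enumerates a lazy source more than once.
    public static double PopulationStdDevBuffered(this IEnumerable<double> values)
    {
        // Reuse lists and arrays as they are; copy anything else once.
        ICollection<double> buffer = values as ICollection<double> ?? values.ToList();

        // Assumption: report 0 when there is no data.
        if (buffer.Count == 0)
        {
            return 0;
        }

        double avg = buffer.Average();
        return Math.Sqrt(buffer.Average(v => (v - avg) * (v - avg)));
    }
}
```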