How to calculate the sum of two normal distributions


Solution 1

The sum of two independent normally distributed random variables is itself normally distributed:

N(mean1, variance1) + N(mean2, variance2) ~ N(mean1 + mean2, variance1 + variance2)

This is all covered on the Wikipedia page on sums of normally distributed random variables.

Be careful that these really are variances and not standard deviations.

// X + Y
public static Gauss operator + (Gauss a, Gauss b) {
    //NOTE: this is valid if X,Y are independent normal random variables
    return new Gauss(a.mean + b.mean, a.variance + b.variance);
}

// X*b
public static Gauss operator * (Gauss a, double b) {
    return new Gauss(a.mean*b, a.variance*b*b);
}
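
A quick usage sketch of the operators above (this assumes the question's Gauss type with public mean/variance fields and a (mean, variance) constructor):

// X ~ N(1, 4), Y ~ N(2, 9), X and Y independent
Gauss x = new Gauss(1.0, 4.0);
Gauss y = new Gauss(2.0, 9.0);

Gauss sum = x + y;        // N(3, 13): means and variances add
Gauss scaled = x * 0.5;   // N(0.5, 1): variance scales by 0.5 * 0.5

Console.WriteLine($"sum:    mean={sum.mean}, variance={sum.variance}");
Console.WriteLine($"scaled: mean={scaled.mean}, variance={scaled.variance}");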

Solution 2

To be more precise:

If a random variable Z is defined as the linear combination of two uncorrelated Gaussian random variables X and Y, then Z is itself a Gaussian random variable, i.e.:

if Z = aX + bY, then mean(Z) = a * mean(X) + b * mean(Y), and variance(Z) = a² * variance(X) + b² * variance(Y).

If the random variables are correlated, then you have to account for that. Variance(X) is defined by the expected value E([X - mean(X)]²). Working this through for Z = aX + bY, we get:

variance(Z) = a² * variance(X) + b² * variance(Y) + 2ab * covariance(X,Y)
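
A sketch of that formula in the same style as the operators in Solution 1 (the helper name and the explicit covariance parameter are illustrative, not part of the original code):

// Z = a*X + b*Y for jointly Gaussian X and Y with covariance covXY.
// With covXY = 0 this reduces to the uncorrelated case above.
public static Gauss LinearCombination(double a, Gauss x, double b, Gauss y, double covXY) {
    double mean = a * x.mean + b * y.mean;
    double variance = a * a * x.variance
                    + b * b * y.variance
                    + 2.0 * a * b * covXY;
    return new Gauss(mean, variance);
}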

If you are summing two uncorrelated random variables which do not have Gaussian distributions, then the distribution of the sum is the convolution of the two component distributions.
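
For that non-Gaussian case, here is a minimal sketch of a discrete convolution, assuming both variables take integer values 0, 1, 2, … with probabilities given by p and q:

// Probability mass function of X + Y for independent X and Y,
// where p[i] = P(X = i) and q[j] = P(Y = j).
public static double[] Convolve(double[] p, double[] q) {
    double[] sum = new double[p.Length + q.Length - 1];
    for (int i = 0; i < p.Length; i++) {
        for (int j = 0; j < q.Length; j++) {
            sum[i + j] += p[i] * q[j];   // P(X = i) * P(Y = j) contributes to P(X + Y = i + j)
        }
    }
    return sum;
}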

If you are summing two correlated non-Gaussian random variables, you have to work through the appropriate integrals yourself.

Solution 3

Hah, I thought you couldn't add gaussian distributions together, but you can!

http://mathworld.wolfram.com/NormalSumDistribution.html

In fact, the mean of the sum is the sum of the individual means, and the variance of the sum is the sum of the individual variances.

Solution 4

Can anyone help define a statistically correct - or at least "reasonable" - version of the + operator?

Arguably not, since adding two distributions means different things. Having worked in reliability and maintainability, my first reaction from the title would be the distribution of a system's MTBF, if the MTBF of each part is normally distributed and the system has no redundancy. You are talking about the distribution of the sum of two normally distributed independent variates, not the (logical) sum of the two normal distributions' effect. Very often, operator overloading has surprising semantics. I'd leave it as a function and call it 'normalSumDistribution' unless your code has a very specific target audience.
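
A sketch of that suggestion, reusing the Gauss type from the question (the method name follows the 'normalSumDistribution' idea above, adapted to C# naming):

// Distribution of the sum of two independent, normally distributed variates.
// A named method keeps the semantics explicit at the call site.
public static Gauss NormalSumDistribution(Gauss a, Gauss b) {
    return new Gauss(a.mean + b.mean, a.variance + b.variance);
}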

Solution 5

I would have thought it depends on what type of addition you are doing. If you just want to get a normal distribution whose properties (mean, standard deviation, etc.) come from summing the two component distributions, then adding the properties as given in the other answers is fine. This is the assumption used in something like PERT, where, if a large number of normal probability distributions are added up, the resulting probability distribution is another normal probability distribution.

The problem comes when the two distributions being added are not similar. Take, for instance, adding a probability distribution with a mean of 2 and a standard deviation of 1 to a probability distribution with a mean of 10 and a standard deviation of 2. If you add these two distributions up, you get a probability distribution with two peaks, one at 2ish and one at 10ish. The result is therefore not a normal distribution. The assumption about adding distributions is only really valid if the original distributions are either very similar, or you have a lot of original distributions so that the peaks and troughs can be evened out.
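
The two-peaked picture described here is what you see if "adding" the distributions means pooling samples from both populations (a mixture). A minimal Monte Carlo sketch of that pooling, with the sampler and the figures above as illustrative assumptions:

using System;

class MixtureSketch {
    static Random rng = new Random(42);

    // Box-Muller transform for a single normal draw.
    static double NextGaussian(double mean, double stdDev) {
        double u1 = 1.0 - rng.NextDouble();
        double u2 = rng.NextDouble();
        return mean + stdDev * Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Sin(2.0 * Math.PI * u2);
    }

    static void Main() {
        // Pool (mix) samples from N(2, 1^2) and N(10, 2^2); the histogram shows two peaks.
        // Summing one draw from each per trial would instead give a single peak near 12,
        // which is the case the other answers cover.
        int[] bins = new int[20];                     // 1-unit bins covering [-2, 18)
        for (int i = 0; i < 100000; i++) {
            double s = (i % 2 == 0) ? NextGaussian(2, 1) : NextGaussian(10, 2);
            int bin = (int)Math.Floor(s + 2);
            if (bin >= 0 && bin < bins.Length) bins[bin]++;
        }
        for (int b = 0; b < bins.Length; b++)
            Console.WriteLine($"{b - 2,3}: {new string('*', bins[b] / 1000)}");
    }
}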

Author by

Frank Krueger

I am an engineer living in Seattle. I have been programming for about 15 years. I started out with video game hacking with the Code Alliance. Moved on to embedded systems development in an R&D group at GM. Did way too much graphics (3D) programming. Then did a lot of network programming for large data centers. Was forced to get my Master's in Electrical Engineering. Got into compiler and interpreter development. Spent some time coding at Microsoft. Moved on a year later to start my own company creating control systems and web apps. I love programming and have spent way too much time learning too many languages, frameworks, APIs, paradigms, and operating systems. Super Secret Code: pL95Tr3

Updated on July 04, 2020

Comments

  • Frank Krueger, almost 4 years

    I have a value type that represents a gaussian distribution:

    struct Gauss {
        double mean;
        double variance;
    }
    

    I would like to perform an integral over a series of these values:

    Gauss eulerIntegrate(double dt, Gauss iv, Gauss[] values) {
        Gauss r = iv;
        foreach (Gauss v in values) {
            r += v*dt;
        }
        return r;
    }
    

    My question is how to implement addition for these normal distributions.

    The multiplication by a scalar (dt) seemed simple enough. But it wasn't simple! Thanks FOOSHNICK for the help:

    public static Gauss operator * (Gauss g, double d) {
        return new Gauss(g.mean * d, g.variance * d * d);
    }
    

    However, addition eludes me. I assume I can just add the means; it's the variance that's causing me trouble. Either of these definitions seems "logical" to me.

    public static Gauss operator + (Gauss a, Gauss b) {
        double mean = a.mean + b.mean;
        // Is it this? (Yes, it is!)
        return new Gauss(mean, a.variance + b.variance);        
        // Or this? (nope)
        //return new Gauss(mean, Math.Max(a.variance, b.variance));
        // Or how about this? (nope)
        //return new Gauss(mean, (a.variance + b.variance)/2);
    }
    

    Can anyone help define a statistically correct - or at least "reasonable" - version of the + operator?

    I suppose I could switch the code to use interval arithmetic instead, but I was hoping to stay in the world of prob and stats.
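
    Putting the accepted pieces together, here is a self-contained sketch (the constructor and the example values in Main are additions for completeness; it assumes the values being summed are independent, as the answers note):

    using System;

    struct Gauss {
        public double mean;
        public double variance;

        public Gauss(double mean, double variance) {
            this.mean = mean;
            this.variance = variance;
        }

        // Valid when the two random variables are independent.
        public static Gauss operator + (Gauss a, Gauss b) {
            return new Gauss(a.mean + b.mean, a.variance + b.variance);
        }

        public static Gauss operator * (Gauss g, double d) {
            return new Gauss(g.mean * d, g.variance * d * d);
        }
    }

    class Program {
        static Gauss eulerIntegrate(double dt, Gauss iv, Gauss[] values) {
            Gauss r = iv;
            foreach (Gauss v in values) {
                r += v * dt;
            }
            return r;
        }

        static void Main() {
            Gauss iv = new Gauss(0.0, 0.0);
            Gauss[] samples = { new Gauss(1.0, 0.1), new Gauss(2.0, 0.2), new Gauss(1.5, 0.1) };
            Gauss result = eulerIntegrate(0.5, iv, samples);
            Console.WriteLine($"mean={result.mean}, variance={result.variance}");
        }
    }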