Calculating weighted mean and standard deviation

44,804

Solution 1

R provides weighted mean. In fact, ?weighted.mean shows this example:

 ## GPA from Siegel 1994
 wt <- c(5,  5,  4,  1)/15
 x <- c(3.7,3.3,3.5,2.8)
 xm <- weighted.mean(x, wt)

One more step:

v <- sum(wt * (x - xm)^2)

Solution 2

The Hmisc package contains the functions you need.

Thus:

x <- c(3.7,3.3,3.5,2.8)

wt <- c(5,  5,  4,  1)/15

xm <- wtd.mean(x, wt)

var <- wtd.var(x, wt)

sd <- sqrt(var)

Unfortunately the author of the Hmisc package did not include an explicit wtd.sd function. You have to square root wtd.var.

Charles Kangai

Solution 3

I too get errors from Hmisc when using the wtd.var() function. Fortunately, SDMTools has comparable functionality, and even calculates SD directly for you, without needing to take sqrt of variance.

library(SDMTools)

x <- c(3.7,3.3,3.5,2.8)
wt <- c(5,  5,  4,  1)/15  ## Note: no actual need to normalize weights to sum to 1, this will be done automatically.

wt.mean(x, wt)
wt.sd(x,wt)

wt.var(x, wt)

Solution 4

Package Hmisc has function wt.var(), as noted by others.

Note that you need to understand whether you want frequency or reliability weights. In your case, I believe you are interested in reliability weights, so will need to set explicitely normwt=TRUE. In that case you can give your weights in any format (sum to 1, sum to N, etc). If you were to use frequency weights, you would need to be careful about how you specify weights.

library(Hmisc)

n <- 3
x <- seq_len(n)
w <- c(0.1, 0.2, 0.6)
w2 <- w / min(w)
w3 <- w / sum(w)

## reliability weights?
wtd.var(x = x, weights = w, normwt=TRUE)
#> [1] 0.95
wtd.var(x = x, weights = w2, normwt=TRUE)
#> [1] 0.95
wtd.var(x = x, weights = w3, normwt=TRUE)
#> [1] 0.95

## frequency weights?
wtd.var(x = x, weights = w)
#> Warning in wtd.var(x = x, weights = w): only one effective observation; variance
#> estimate undefined
#> [1] -4.222222
wtd.var(x = x, weights = w2)
#> [1] 0.5277778
wtd.var(x = x, weights = w3)
#> Warning in wtd.var(x = x, weights = w3): only one effective observation;
#> variance estimate undefined
#> [1] Inf

Created on 2020-08-26 by the reprex package (v0.3.0)

Share:
44,804
Alex
Author by

Alex

Updated on August 27, 2020

Comments

  • Alex
    Alex over 3 years

    I have a time series x_0 ... x_t. I would like to compute the exponentially weighted variance of the data. That is:

    V = SUM{w_i*(x_i - x_bar)^2, i=1 to T} where SUM{w_i} = 1 and x_bar=SUM{w_i*x_i}
    

    ref: http://en.wikipedia.org/wiki/Weighted_mean#Weighted_sample_variance

    The goal is to basically weight observations that are further back in time less. This is very simple to implement but I would like to use as much built in funcitonality as possible. Does anyone know what this corresponds to in R?

    Thanks

  • Alex
    Alex about 12 years
    Hmisc, as it turned out, does just this.
  • Matthew Lundberg
    Matthew Lundberg about 12 years
    Note the last line in the answer. That is the weighted variance.
  • tharen
    tharen almost 12 years
    Just a hint... If you are thick skulled like me, 15 is the sum of the individual weights. Weights are then normalized in this case. I didn't catch that at first.
  • Torvon
    Torvon almost 9 years
    wtd.mean works, but wtd.var in your example gives 'INF'. why is that?
  • skan
    skan almost 8 years
    you forgot to divide v by n or n-1
  • Matthew Lundberg
    Matthew Lundberg almost 8 years
    @skan The formula above represents the population variance for the set. Note that sum(wt) == 1 so multiplying by wt in the expression is that division.
  • skan
    skan almost 8 years
    it doesn't produce exactly the same value than the function wt.var(x, wt) from the library SMDtools, maybe they use some correction?
  • David J. Harris
    David J. Harris about 7 years
    @Torvon This is now fixed in the development version of Hmisc. github.com/harrelfe/Hmisc/issues/69
  • vdesai
    vdesai over 6 years
    sum(wt) need not be 1.
  • Benjamin Ziepert
    Benjamin Ziepert over 4 years
    SDMTools is not maintained anymore.
  • Michael Lachmann
    Michael Lachmann over 3 years
    It seems this answer currently works best. I think more consistent would be to have v <- weighted.mean( (x-xm)^2, wt ) Since that also works when weights are not normalized.
  • Alex
    Alex over 2 years
    The difference between frequency/reliability weights isn't given enough attention. +1
  • Alex
    Alex over 2 years
    You should also consider whether your weights are frequency or reliability weights. See @Matifou below.