Calculating weighted mean and standard deviation
Solution 1
R provides weighted mean. In fact, ?weighted.mean shows this example:
## GPA from Siegel 1994
wt <- c(5, 5, 4, 1)/15
x <- c(3.7,3.3,3.5,2.8)
xm <- weighted.mean(x, wt)
One more step:
v <- sum(wt * (x - xm)^2)
Solution 2
The Hmisc package contains the functions you need.
Thus:
x <- c(3.7,3.3,3.5,2.8)
wt <- c(5, 5, 4, 1)/15
xm <- wtd.mean(x, wt)
var <- wtd.var(x, wt)
sd <- sqrt(var)
Unfortunately the author of the Hmisc package did not include an explicit wtd.sd
function. You have to square root wtd.var.
Charles Kangai
Solution 3
I too get errors from Hmisc
when using the wtd.var()
function. Fortunately, SDMTools
has comparable functionality, and even calculates SD directly for you, without needing to take sqrt of variance.
library(SDMTools)
x <- c(3.7,3.3,3.5,2.8)
wt <- c(5, 5, 4, 1)/15 ## Note: no actual need to normalize weights to sum to 1, this will be done automatically.
wt.mean(x, wt)
wt.sd(x,wt)
wt.var(x, wt)
Solution 4
Package Hmisc
has function wt.var()
, as noted by others.
Note that you need to understand whether you want frequency or reliability weights. In your case, I believe you are interested in reliability weights, so will need to set explicitely normwt=TRUE
. In that case you can give your weights in any format (sum to 1, sum to N, etc). If you were to use frequency weights, you would need to be careful about how you specify weights.
library(Hmisc)
n <- 3
x <- seq_len(n)
w <- c(0.1, 0.2, 0.6)
w2 <- w / min(w)
w3 <- w / sum(w)
## reliability weights?
wtd.var(x = x, weights = w, normwt=TRUE)
#> [1] 0.95
wtd.var(x = x, weights = w2, normwt=TRUE)
#> [1] 0.95
wtd.var(x = x, weights = w3, normwt=TRUE)
#> [1] 0.95
## frequency weights?
wtd.var(x = x, weights = w)
#> Warning in wtd.var(x = x, weights = w): only one effective observation; variance
#> estimate undefined
#> [1] -4.222222
wtd.var(x = x, weights = w2)
#> [1] 0.5277778
wtd.var(x = x, weights = w3)
#> Warning in wtd.var(x = x, weights = w3): only one effective observation;
#> variance estimate undefined
#> [1] Inf
Created on 2020-08-26 by the reprex package (v0.3.0)
Alex
Updated on August 27, 2020Comments
-
Alex over 3 years
I have a time series
x_0 ... x_t
. I would like to compute the exponentially weighted variance of the data. That is:V = SUM{w_i*(x_i - x_bar)^2, i=1 to T} where SUM{w_i} = 1 and x_bar=SUM{w_i*x_i}
ref: http://en.wikipedia.org/wiki/Weighted_mean#Weighted_sample_variance
The goal is to basically weight observations that are further back in time less. This is very simple to implement but I would like to use as much built in funcitonality as possible. Does anyone know what this corresponds to in R?
Thanks
-
Alex about 12 yearsHmisc, as it turned out, does just this.
-
Matthew Lundberg about 12 yearsNote the last line in the answer. That is the weighted variance.
-
tharen almost 12 yearsJust a hint... If you are thick skulled like me, 15 is the sum of the individual weights. Weights are then normalized in this case. I didn't catch that at first.
-
Torvon almost 9 yearswtd.mean works, but wtd.var in your example gives 'INF'. why is that?
-
skan almost 8 yearsyou forgot to divide v by n or n-1
-
Matthew Lundberg almost 8 years@skan The formula above represents the population variance for the set. Note that
sum(wt) == 1
so multiplying bywt
in the expression is that division. -
skan almost 8 yearsit doesn't produce exactly the same value than the function wt.var(x, wt) from the library SMDtools, maybe they use some correction?
-
David J. Harris about 7 years@Torvon This is now fixed in the development version of Hmisc. github.com/harrelfe/Hmisc/issues/69
-
vdesai over 6 yearssum(wt) need not be 1.
-
Benjamin Ziepert over 4 yearsSDMTools is not maintained anymore.
-
Michael Lachmann over 3 yearsIt seems this answer currently works best. I think more consistent would be to have
v <- weighted.mean( (x-xm)^2, wt )
Since that also works when weights are not normalized. -
Alex over 2 yearsThe difference between frequency/reliability weights isn't given enough attention. +1
-
Alex over 2 yearsYou should also consider whether your weights are frequency or reliability weights. See @Matifou below.