Weighted sampling in R
Use the prob argument of sample():
samp_idx <- sample(seq_len(nrow(data)), nsample, prob=data$ww)
new_data <- data[samp_idx, ]
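A self-contained sketch of the same idea follows. Two things in it are assumptions rather than part of the answer: the column w, which holds the per-row weight exp(var1) instead of the cumulative sum stored in ww, and the use of replace = TRUE, which allows a row to be drawn more than once, as the loop in the question can. sample() draws without replacement by default, and the weights passed to prob need not sum to one.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Toy data mirroring the question's setup
data <- data.frame(var1 = log(sample(1:5000)))
nsample <- 4000

# Assumption: the per-row weight is exp(var1), not its cumulative sum
data$w <- exp(data$var1)

# Without replacement (the default): each row appears at most once
idx_norep <- sample(seq_len(nrow(data)), nsample, prob = data$w)

# With replacement: rows may repeat, heavier rows more often
idx_rep <- sample(seq_len(nrow(data)), nsample, replace = TRUE, prob = data$w)

new_data_norep <- data[idx_norep, , drop = FALSE]
new_data_rep   <- data[idx_rep, , drop = FALSE]
```

Which variant is appropriate depends on whether repeated rows are acceptable in new_data.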
Something like this. Running time is
# user system elapsed
# 0.015 0.000 0.014
versus your version:
# user system elapsed
# 4.278 0.007 4.290
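Timings of this kind can be collected with system.time(). The sketch below rebuilds the data as in the question and times only the sample() approach. The seed is an assumption added for reproducibility, and the numbers printed will differ from machine to machine.

```r
set.seed(1)  # assumption: fixed seed, not part of the original post

# Data as constructed in the question
data <- data.frame(var1 = log(sample(1:5000)))
data$ww <- cumsum(exp(data$var1))
nsample <- 4000

# Time the weighted draw and the row subsetting together
timing <- system.time({
  samp_idx <- sample(seq_len(nrow(data)), nsample, prob = data$ww)
  new_data <- data[samp_idx, ]
})

print(timing)  # user / system / elapsed, in seconds
```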
Author: emanuele
Updated on June 16, 2022
Comments
- emanuele (about 2 years)
  I have a data frame data. To each row I have assigned a weight, stored in data$ww. Now I would like to draw a sample new_data of data, weighted by data$ww. I have tried with subset, but it is very slow.

      # sample data
      data <- data.frame(var1 = log(sample(1:5000)))
      ndata <- nrow(data)
      maxW <- max(data$var1)
      nsample <- 4000
      rr <- runif(ndata)
      data$ww <- cumsum(exp(data$var1))
      new_data <- data[0, ]
      i <- 1
      while(nrow(new_data) < nsample) {
        new_data[i, ] <- subset(data, data$ww > rr[i] * maxW)[1,]
        i <- i + 1
      }

  Is there a faster way?
- isomorphismes (almost 9 years)
  Why is this question heavily downvoted?