Weighted sampling in R


Use the prob argument of sample():

samp_idx <- sample(seq_len(nrow(data)), nsample, prob=data$ww)
new_data <- data[samp_idx, ]
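
Two details of sample() are worth checking against your own data. First, prob expects one non-negative weight per row (the weights need not sum to one), so if data$ww holds a running total, as in the sample data from the question, the per-row weights should be passed instead. Second, sample() draws without replacement by default, so pass replace = TRUE if a row may be picked more than once. A minimal sketch on made-up data (the toy data frame and its columns id and w are hypothetical):

```r
set.seed(42)  # only to make this sketch reproducible

# Hypothetical toy data: `w` holds one non-negative weight per row
toy <- data.frame(id = 1:10, w = c(rep(1, 9), 91))

# Default (replace = FALSE): each row can be picked at most once
idx_unique <- sample(seq_len(nrow(toy)), size = 5, prob = toy$w)

# replace = TRUE: rows may repeat, in proportion to their weight
idx_repeat <- sample(seq_len(nrow(toy)), size = 1000,
                     replace = TRUE, prob = toy$w)
toy_sample <- toy[idx_repeat, ]

# Row 10 carries 91 of the 100 weight units, so its share of the
# resample should be near 0.91
mean(toy_sample$id == 10)
```

Which of the two modes is appropriate depends on whether repeated rows are acceptable in new_data.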

Something like this should work. On the sample data from the question, the sample() call runs in

# user  system elapsed 
# 0.015   0.000   0.014 

versus your version:

# user  system elapsed 
# 4.278   0.007   4.290 
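
If you want to keep the cumulative-sum idea from the loop in the question (draw a uniform number, then find the first row whose running total exceeds it), it can also be vectorized. The following is only a sketch of that inverse-CDF approach using findInterval(); the weights w are made up for illustration, and the draws are with replacement:

```r
set.seed(1)  # only to make this sketch reproducible

w <- c(1, 2, 3, 4)        # hypothetical per-row weights
cum_w <- cumsum(w)        # running totals: 1 3 6 10
n_draws <- 10000

# Scale uniform draws to the total weight, then locate each draw
# among the running totals. findInterval() returns how many totals
# are <= the draw, so adding 1 gives the index of the selected row.
u <- runif(n_draws) * cum_w[length(cum_w)]
idx <- findInterval(u, cum_w) + 1L

# Empirical shares should be roughly proportional to w
table(idx) / n_draws
```

This avoids growing a data frame one row at a time, which is likely the main cost of the loop; the sampled rows would then be data[idx, ].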


Author: emanuele

Updated on June 16, 2022

Comments

  • emanuele about 2 years

    I have a data frame data. Each row has a weight assigned to it, stored in data$ww. Now I would like to draw a sample new_data from data, weighted by data$ww. I have tried with subset, but it is very slow.

    # sample data
    data <- data.frame(var1 = log(sample(1:5000)))
    ndata <- nrow(data)
    maxW <- max(data$var1)
    
    nsample <- 4000
    rr <- runif(ndata)
    data$ww <- cumsum(exp(data$var1))
    new_data <- data[0, ]
    i <- 1
    while(nrow(new_data) < nsample) {
      new_data[i, ] <- subset(data, data$ww > rr[i] * maxW)[1,]
      i <- i + 1
    }
    

    Is there a faster way?

    • isomorphismes almost 9 years
      Why is this question heavily downvoted?
