Building an R loop for random sampling

Solution 1

An alternative solution (keep in mind what @Tyler Rinker said about replicate):

Data <- read.table(text='
312313.34
243444
12334.92
321312
353532', header=FALSE)

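# flatten the one-column data frame into a numeric vector
Data <- as.numeric(as.matrix(Data))

set.seed(7)
# 1000 means, each from a sample of 50 values drawn with replacement
Means <- replicate(1000, mean(sample(Data, 50, replace = TRUE)))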

Means contains 1000 means, one for each subsample of size 50. If you want the mean of the means, do this:

mean(Means) 

What you're trying to do sounds like bootstrapping, or a similar resampling technique for bias reduction (I guess).
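
If the goal is indeed a bootstrap, the same replicate() call can be reused to look at the spread of the resampled means, not only their average. This is a minimal sketch, assuming the five example values from above and keeping the subsample size of 50 from the question (a textbook bootstrap would usually resample length(Data) values instead). The names boot_means, boot_se, and boot_ci are illustrative only.

```r
# Five example values from the question
Data <- c(312313.34, 243444, 12334.92, 321312, 353532)

set.seed(7)
# 1000 resamples of size 50, drawn with replacement; keep the mean of each
boot_means <- replicate(1000, mean(sample(Data, 50, replace = TRUE)))

# Standard deviation of the resampled means (a bootstrap-style standard error)
boot_se <- sd(boot_means)

# Percentile interval covering the central 95% of the resampled means
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))

boot_se
boot_ci
```

With only five distinct input values the interval says little about the underlying population; the sketch is meant to show the mechanics, not a recommended analysis.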

Solution 2

I'd make a function out of the sampling and then repeat it with lapply (replicate would likely work too, though in my experience it has been much slower).

I'd recommend not assigning to an object named list, as list is an important function in R.

So it would look something like this:

#make a data set that may look like yours
LIST <- rnorm(1000)

#take your code and make a function
mean.find <- function(dat) {
    out <- sample(dat, 50, replace=TRUE)
    mean(out)
}

#a single use to check it out
mean.find(LIST)

#repeat it 1000 times with lapply; the index i is ignored so every call samples from LIST
reps <- unlist(lapply(seq_len(1000), function(i) mean.find(LIST)))

#take the mean of that
mean(reps)
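
Since the question asked for a loop, the same computation can also be written as an explicit for loop or with vapply. This is a sketch under the same assumptions as Solution 2 (simulated data in LIST, samples of size 50 drawn with replacement). The names mean_find, n_reps, reps_loop, and reps_vapply are illustrative and not part of the original answer.

```r
set.seed(1)
LIST <- rnorm(1000)   # simulated stand-in for the real data

# Mean of one sample of size n drawn with replacement from dat
mean_find <- function(dat, n = 50) {
    mean(sample(dat, n, replace = TRUE))
}

n_reps <- 1000

# Explicit loop: preallocate the result vector, then fill it element by element
reps_loop <- numeric(n_reps)
for (i in seq_len(n_reps)) {
    reps_loop[i] <- mean_find(LIST)
}

# Same idea with vapply, which checks that every result is a single number
reps_vapply <- vapply(seq_len(n_reps), function(i) mean_find(LIST), numeric(1))

# Mean of the 1000 means, for each variant
mean(reps_loop)
mean(reps_vapply)
```

Both variants pass the data explicitly to mean_find, so the index i only counts repetitions and is never used as the data to sample from.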
Author by

user964689

Updated on June 05, 2022

Comments

  • user964689
    user964689 almost 2 years

    I am sampling from a file containing a list of many values, e.g.:

    312313.34
    243444
    12334.92
    321312
    353532
    

    and using R to randomly sample from this list:

    list = read.table("data")
    out <-sample(list,50,replace=TRUE)
    out.mean<-mean(out)
    out.mean
    

    Could somebody please show me how to put this into a loop, so that I can perform this procedure 1000 times and take the mean of the 1000 means that this will generate?

    Thank you very much in advance!

    Rubal

  • Paul Hiemstra
    Paul Hiemstra over 11 years
    +1 and some nitpicking ;). The function mean.find has an input argument that is not used: dat. It would be best to either omit it or use it, i.e. out <- sample(dat, 50, replace=TRUE). This makes the function more general: you can draw 50 samples from any dataset, and the code does not depend on scoping to find a global variable called LIST.
  • Tyler Rinker
    Tyler Rinker over 11 years
    @Paul good catch, that was my intention but I never followed through. I made the edit as you suggested. nitpicking = growth :D
  • Paul Hiemstra
    Paul Hiemstra over 11 years
    Using global objects from within functions makes me a bit uneasy, although I do it quite a lot. In short scripts the risk is probably small, but in larger projects it can lead to unpredictable behavior.