R: Confusion with apply() vs for loop


Solution 1

Using an apply function to do your regression is mostly a matter of preference in this case; it can handle some of the bookkeeping for you (and so possibly prevent errors) but won't speed up the code.

I would suggest using vectorized functions to compute your first and last values, though, perhaps something like:

window <- 5
ng <- 15 #or ncol(g)
xy <- data.frame(first = pmax( (1:ng) - window, 1 ), 
                  last = pmin( (1:ng) + window, ng) )
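As a quick sanity check, the sketch below compares these vectorized bounds with an explicit loop that clamps one window at a time using max() and min(), the same idea as the if statements in the question. It assumes the symmetric window i - window to i + window used above, with ng = 15 standing in for ncol(g).

```r
window <- 5
ng <- 15  # stand-in for ncol(g)

# Vectorized bounds: pmax/pmin clamp element-wise to the range [1, ng]
xy <- data.frame(first = pmax((1:ng) - window, 1),
                 last  = pmin((1:ng) + window, ng))

# Scalar version: clamp each window separately, as the question's if blocks do
first_loop <- numeric(ng)
last_loop  <- numeric(ng)
for (i in seq_len(ng)) {
  first_loop[i] <- max(i - window, 1)
  last_loop[i]  <- min(i + window, ng)
}

# Both approaches should give the same bounds
all.equal(xy$first, first_loop)
all.equal(xy$last, last_loop)
```

Both all.equal() calls should return TRUE; the vectorized form simply does all the clamping in one call instead of ng separate ones.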

Or be even smarter with

xy <- data.frame(first= c(rep(1, window), 1:(ng-window) ), 
                 last = c((window+1):ng, rep(ng, window)) )
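The two constructions should produce the same bounds, which is easy to confirm. This sketch reuses window = 5 and ng = 15 from above. Note that the second form assumes ng > window: otherwise 1:(ng - window) counts downward and the columns end up with the wrong length, whereas the pmax/pmin form has no such restriction.

```r
window <- 5
ng <- 15  # stand-in for ncol(g)

# pmax/pmin version
xy1 <- data.frame(first = pmax((1:ng) - window, 1),
                  last  = pmin((1:ng) + window, ng))

# rep/concatenate version (requires ng > window)
xy2 <- data.frame(first = c(rep(1, window), 1:(ng - window)),
                  last  = c((window + 1):ng, rep(ng, window)))

# Compare values row by row; all.equal tolerates integer vs double storage
all.equal(xy1, xy2)
```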

Then you could use this in a for loop like this:

results <- list()
for(i in 1:nrow(xy)) {
  results[[i]] <- xy$first[i] : xy$last[i]
}
results

or with lapply like this:

results <- lapply(1:nrow(xy), function(i) {
  xy$first[i] : xy$last[i]
})

where in both cases I just return the sequence between first and last; you would substitute your actual regression code.
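To make that substitution concrete, here is a minimal sketch that runs one regression per window inside lapply. The simulated matrix g and response y are hypothetical stand-ins for the question's data, lm() stands in for the unspecified "regression stuff", and the R-squared is just one example of a per-window result.

```r
set.seed(1)
n <- 100    # number of observations (hypothetical)
ng <- 15    # number of predictor columns, i.e. ncol(g)
window <- 5

# Hypothetical data: a predictor matrix g and a response vector y
g <- matrix(rnorm(n * ng), nrow = n, ncol = ng)
y <- rnorm(n)

# Window bounds for every column index, computed once with vectorized functions
xy <- data.frame(first = pmax((1:ng) - window, 1),
                 last  = pmin((1:ng) + window, ng))

# lapply only supplies the index i; the function body does the actual work
results <- lapply(seq_len(nrow(xy)), function(i) {
  predictors <- g[, xy$first[i]:xy$last[i], drop = FALSE]
  fit <- lm(y ~ predictors)     # regress y on the columns in this window
  summary(fit)$r.squared        # example per-window result
})

r2 <- unlist(results)
```

Because each window's result here is a single number, sapply() or vapply(..., numeric(1)) would return a numeric vector directly instead of a list.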

Solution 2

This question touches several points that are made in 'The R Inferno' http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

There are some loops you should avoid, but not all of them. And using an apply function hides the loop more than it avoids it. This example seems like a good candidate to leave as a 'for' loop.

Growing objects is generally bad form; it can be extremely inefficient in some cases. If you are going to have a blanket rule, then "don't grow objects" is a better one than "avoid loops".

You can create a list with the final length by:

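result <- vector("list", ncol(g))  # one slot per column, allocated up front
for(i in 1:ncol(g)) {
    # stuff: compute the window and run the regression for column i
    result[[i]] <- NA  # placeholder: replace NA with the result for column i
}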
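A rough way to see the cost of growing an object is to build the same vector both ways and time it. This is only a sketch: n is an arbitrary size, the squared index is a placeholder for real work, and the timings will vary by machine and R version, so no particular numbers are claimed here.

```r
n <- 1e5  # arbitrary size for the comparison

# Growing: the vector is extended by one element on every iteration
grow <- function(n) {
  out <- NULL
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Pre-allocating: create the full-length vector once, then fill it in place
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

system.time(a <- grow(n))
system.time(b <- prealloc(n))
identical(a, b)  # same values either way; only the timing differs
```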

In some circumstances you might think the command:

window<-5

means give me a logical vector stating which values of 'window' are less than -5.

Spaces are good to use, mostly so as not to confuse humans, but to get the meaning directly above you also need them so as not to confuse R: window < -5 is a comparison, while window<-5 is an assignment.
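The parsing difference is easy to see at the console. In this sketch the values in window are made up purely to illustrate the comparison.

```r
window <- c(-10, -3, 8)  # made-up values for illustration

# With spaces: a comparison against -5, giving a logical vector
is_small <- window < -5
print(is_small)   # TRUE FALSE FALSE

# Without spaces, R reads <- as assignment, so window is overwritten with 5
window<-5
print(window)     # 5
```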


Author: JoshDG

Updated on June 15, 2022

Comments

  • JoshDG
    JoshDG almost 2 years

    I know that I should avoid for-loops, but I'm not exactly sure how to do what I want to do with an apply function.

    Here is a slightly simplified model of what I'm trying to do. So, essentially I have a big matrix of predictors and I want to run a regression using a window of 5 predictors on each side of the indexed predictor (i in the case of a for loop). With a for loop, I can just say something like:

    results<-NULL
    window<-5
    for(i in 1:ncol(g))
    {
        first<-i-window #Set window boundaries
        if(first<1){
            1->first
        }
        last<-i+window-1
        if(last>ncol(g)){
            ncol(g)->last
        }
        predictors<-g[,first:last]
    
        #Do regression stuff and return some result
        results[i]<-NA  # placeholder: replace NA with the regression result
    }
    

    Is there a good way to do this with an apply function? My problem is that the vector that apply would be shoving into the function really doesn't matter. All that matters is the index.

    • John
      John over 12 years
      Sacha... not entirely true. Notably, lapply can sometimes give terrific speedups. Furthermore, the syntactic sugar is there to get you to break up complicated loops and functions so that you apply only to the components that need it.