Nested function environment selection


Solution 1

To illustrate lexical scoping, consider the following:

First let's create a sandbox environment, only to avoid the oh-so-common R_GlobalEnv:

sandbox <- new.env()

Now we put two functions inside it: f, which looks for a variable named x; and g, which defines a local x and calls f:

sandbox$f <- function()
{
    value <- if(exists("x")) x else "not found."
    cat("This is function f looking for symbol x:", value, "\n")
}

sandbox$g <- function()
{
    x <- 123
    cat("This is function g. ")
    f()
}

Technicality: functions defined at the console get R_GlobalEnv as their enclosing environment, because that is where the function definitions are evaluated. So we manually set the enclosures of f and g to the environment where they "belong":

environment(sandbox$f) <- sandbox
environment(sandbox$g) <- sandbox
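The answer fixes the enclosures after the fact with environment<-. A different route to the same setup (a sketch, not part of the original answer) is to evaluate the definitions inside sandbox with evalq(), so the closures are created there and get sandbox as their enclosure directly:

```r
sandbox <- new.env()

# Evaluate the definitions inside sandbox: the assignments land in sandbox,
# and each function's enclosure is sandbox, with no fix-up needed afterwards.
evalq({
  f <- function() {
    value <- if (exists("x")) x else "not found."
    cat("This is function f looking for symbol x:", value, "\n")
  }
  g <- function() {
    x <- 123
    cat("This is function g. ")
    f()  # f is found through g's enclosure, i.e. sandbox
  }
}, envir = sandbox)

identical(environment(sandbox$f), sandbox)  # TRUE
identical(environment(sandbox$g), sandbox)  # TRUE
```

Calling sandbox$g() then behaves exactly as in the transcript that follows.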

Calling g: the local variable x = 123 is not found by f, because f's enclosure is sandbox, not the evaluation frame of g:

> sandbox$g()
This is function g. This is function f looking for symbol x: not found. 

Now we create an x in the global environment and call g again. The function f looks for x first in its own frame, then in its enclosure sandbox, and then in the parent of sandbox, which happens to be R_GlobalEnv because new.env() was called at the top level:

> x <- 456
> sandbox$g()
This is function g. This is function f looking for symbol x: 456 
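Whether the global x is reachable depends only on the parent chain of the enclosure. new.env() defaults to parent = parent.frame(), which at the top level is R_GlobalEnv. A minimal sketch contrasting that default with a hypothetical sandbox whose parent is baseenv(), so its chain skips the global environment:

```r
x <- 456

# Default parent: parent.frame(), i.e. R_GlobalEnv at the top level.
sb_default <- new.env()
# Hypothetical variant: chain is sb_isolated -> base -> empty, no R_GlobalEnv.
sb_isolated <- new.env(parent = baseenv())

f <- function() if (exists("x")) x else "not found."

# Two copies of f that differ only in their enclosure.
f_default <- f
environment(f_default) <- sb_default
f_isolated <- f
environment(f_isolated) <- sb_isolated

f_default()   # 456: found in R_GlobalEnv via sb_default's parent
f_isolated()  # "not found.": the global x is never on this chain
```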

Just to check that f searches its enclosure before R_GlobalEnv, we can put an x in sandbox and call g:

> sandbox$x <- 789
> sandbox$g()
This is function g. This is function f looking for symbol x: 789 

Conclusion: symbol lookup in R follows the chain of enclosing environments, not the evaluation frames created during execution of nested function calls.

EDIT: Just adding a link to this very interesting answer from Martin Morgan on the related subject of parent.frame() vs parent.env()
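The conclusion above can be checked directly by comparing a function's enclosure (parent.env(environment()) inside the call) with its caller's frame (parent.frame()). A minimal sketch; inner_fn, outer_fn and y_outer are hypothetical names, and it assumes no global y_outer exists:

```r
# Reports its own enclosure, its caller's frame, and whether ordinary
# lookup can see a variable that exists only in the caller.
inner_fn <- function() {
  list(
    enclosure = parent.env(environment()),  # where inner_fn was defined
    caller    = parent.frame(),             # the frame that called inner_fn
    sees_y    = exists("y_outer")           # own frame, then enclosures
  )
}

outer_fn <- function() {
  y_outer <- "local to outer_fn"
  inner_fn()
}

res <- outer_fn()
identical(res$enclosure, globalenv())                     # TRUE
exists("y_outer", envir = res$caller, inherits = FALSE)   # TRUE
res$sees_y                                                # FALSE
```

The caller's frame holds y_outer, but symbol lookup starts from the enclosure, so inner_fn never sees it.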

Solution 2

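You could use closures: define f1 inside f2, so that f1's enclosing environment is the frame of the f2 call, where var1 lives: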

f2 <- function(...){
  f1 <- function(...){
    print(var1)
  }
  var1 <- "hello"
  f1(...)
}
f2()
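If f1 must also remain usable on its own, as the comments below require, one possible variant (not part of this answer) keeps f1 at the top level and has f2 create a local copy whose enclosure is re-pointed at f2's frame with environment<-. A minimal sketch:

```r
# f1 stays an ordinary top-level function that can be called on its own.
f1 <- function(...) {
  print(var1)
}

f2 <- function(...) {
  var1 <- "hello"
  # Assigning environment(f1) here creates a *local* copy of f1 in this
  # frame whose enclosure is this frame; the top-level f1 is not modified.
  environment(f1) <- environment()
  f1(...)
}

f2()                                      # prints "hello"
identical(environment(f1), globalenv())   # TRUE: original f1 untouched
```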
Author: dayne

Updated on June 13, 2022

Comments

  • dayne (almost 2 years ago)

    I am writing some functions for repeated tasks, and I am trying to minimize the number of times I load the data. Basically, I have one function that takes some information and makes a plot. Then I have a second function that loops and outputs multiple plots to a .pdf. Both functions contain the following line of code:

    if(load.dat) load("myworkspace.RData")
    

    where load.dat is a logical and the data I need is stored in myworkspace.RData. When I call the wrapper function that loops and outputs multiple plots, I do not want to reload the workspace in every call to the inner function. I thought I could load the workspace once in the wrapper function and let the inner function access that data, but I got an error saying the object could not be found.

    My understanding was that when a function cannot find a variable in its local environment (created when the function is called), it looks for the variable in the parent environment.

    I assumed the parent environment of the inner function call would be the environment of the outer function call. Evidently this is not true:

    func1 <- function(...){
      print(var1)
    }
    
    func2 <- function(...){
      var1 <- "hello"
      func1(...)
    }
    
    > func2()
    Error in print(var1) : object 'var1' not found
    

    After reading numerous questions, the language manual, and this really helpful blog post, I came up with the following:

    var1 <- "hello"
    save(list="var1",file="test.RData")
    rm(var1)
    
    func3 <- function(...){
      attach("test.RData")
      func1(...)
      detach("file:test.RData")
    }
    
    > func3()
    [1] "hello"
    

    Is there a better way to do this? Why doesn't func1 look for undefined variables in the local environment created by func2, when it was func2 that called func1?

    Note: I did not know how to name this question. If anyone has better suggestions I will change it and edit this line out.

  • dayne (over 10 years ago)
    Right, but I need to be able to use the inner function as a stand-alone function. I did not want to have to redefine the inner function every time I call the outer function (not to mention duplicate a bunch of code).
  • Karl Forner (over 10 years ago)
    Then the cleanest setup, in my opinion: put all your data in a list (my_data) and pass it as an argument to your function. Inside the function you can use with(my_data, { }) to avoid extra typing.
  • dayne (over 10 years ago)
    This is the best illustration I have seen. Thank you so much! I did not really understand the difference between environments and frames.
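Karl Forner's suggestion can be sketched as follows: load the workspace once into a fresh environment, convert it to a list, and pass that list to a stand-alone inner function that uses with(). This replaces the load.dat flag with an explicit data argument. The names load_data, inner_task and outer_task are hypothetical, and a temporary file stands in for myworkspace.RData:

```r
# Stand-in for myworkspace.RData: a temporary file holding one saved object.
ws_file <- tempfile(fileext = ".RData")
var1 <- "hello"
save(var1, file = ws_file)
rm(var1)

# Load the file into a fresh environment (not the global one) and return a list.
load_data <- function(file) {
  e <- new.env()
  load(file, envir = e)
  as.list(e)
}

# Stand-alone inner function: everything it needs arrives in my_data.
inner_task <- function(my_data) {
  with(my_data, {
    paste(var1, "world")  # var1 is looked up in my_data first
  })
}

# Wrapper: loads once, then reuses the same list for every call.
outer_task <- function(n, file) {
  my_data <- load_data(file)
  vapply(seq_len(n), function(i) inner_task(my_data), character(1))
}

inner_task(load_data(ws_file))  # "hello world"
outer_task(3, ws_file)          # "hello world" three times, one load
```

Because inner_task never relies on whatever happens to be in its caller's frame, it works the same whether it is called directly or from the wrapper.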