R: how does a foreach loop find a function that should be invoked?

Solution 1

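They behave differently because registerDoParallel registers an mclapply backend on Linux, while it registers a clusterApplyLB backend on Windows. The mclapply backend forks the R session, so the workers inherit its functions and data and there are essentially no data exporting issues, which is why it works on Linux. With clusterApplyLB, the workers are separate R processes, and you can run into problems if foreach doesn't auto-export the functions and data that are needed. That is what happens in FUN3: foreach only auto-exports variables it finds in the environment where the loop is called, and FUN is defined in the global environment rather than inside FUN3.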

You can solve this problem by modifying FUN3 to export FUN via the .export option:

FUN3 <- function(a, b) {
  foreach(i=1:3, .export='FUN') %dopar% FUN(i, a, b)
}

This solution works on both Linux and Windows, since .export is ignored by the mclapply backend.
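For reference, here is a minimal end-to-end sketch of that fix. It assumes doParallel is installed, and the explicit two-worker cluster is an arbitrary choice made here so that the clusterApplyLB-style backend is used on any platform, not just Windows:

```r
library(doParallel)  # also attaches foreach and parallel

FUN <- function(x, y, z) { x + y + z }

FUN3 <- function(a, b) {
  # a and b are local to FUN3, so foreach exports them automatically.
  # FUN lives in the global environment, so it is named explicitly.
  foreach(i = 1:3, .export = "FUN") %dopar% FUN(i, a, b)
}

cl <- makeCluster(2)      # assumption: 2 workers, chosen for illustration
registerDoParallel(cl)    # registers the cluster-based backend
result <- FUN3(2, 3)
stopCluster(cl)

print(unlist(result))     # 6 7 8
```

Without the .export argument, the same script should reproduce the "could not find function" error from the question.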

As pointed out by Hong Ooi, you have an error in your use of clusterExport, but I wouldn't use clusterExport to solve the problem since it is backend specific.

Solution 2

In your clusterExport call, remove the env=environment() part. What you're doing is telling clusterExport to look for your objects in a brand new environment, so naturally it doesn't find them.
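A small sketch of the call without that argument, assuming it is run at the top level of a script where FUN, a and b live in the global environment (the two-worker cluster and the clusterEvalQ check are illustrative additions, not part of the original code):

```r
library(parallel)

FUN <- function(x, y, z) { x + y + z }
a <- 2
b <- 3

cl <- makeCluster(2)  # assumption: 2 workers, chosen for illustration

# With no envir argument, clusterExport looks the names up in .GlobalEnv
# and assigns them in each worker's global environment.
clusterExport(cl, varlist = c("FUN", "a", "b"))

# Check that every worker can now see FUN, a and b.
worker_results <- clusterEvalQ(cl, FUN(1, a, b))
stopCluster(cl)

print(unlist(worker_results))  # 6 6
```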

Author: user7417

Updated on June 06, 2022

Comments

  • user7417
    user7417 almost 2 years

    I have problems when I use a foreach loop (using %dopar%) which invokes a self-defined function. There is not really a problem when I work with Linux, but when I use Windows the self-defined function cannot be found. It is hard to explain the problem in words, so I composed a small example to show it. Assume I have a collection of three simple functions, where FUN2 (using %do%) and FUN3 (using %dopar%) invoke the first one (FUN):

    FUN <- function(x,y,z) { x + y + z }
    FUN2 <- function(a, b) {
      foreach(i=1:3) %do% FUN(i, a, b)
    }
    FUN3 <- function(a, b) {
      foreach(i=1:3) %dopar% FUN(i, a, b)
    }
    

    The functions are stored in a script called foreach_testfunctions.R. In another script (foreach.test) I source these functions, use library(doParallel) and try to use the functions. First I do it with Linux and all works fine:

    source("foreach_testfunctions.R")
    a <- 2
    b <- 3
    library(doParallel)
    registerDoParallel()
    
    foreach(i=1:3) %do% FUN(i, a, b)    ## works fine
    FUN2(a, b)                          ## works fine
    foreach(i=1:3) %dopar% FUN(i, a, b) ## works fine
    FUN3(a, b)                          ## works fine 
    

    Then I do it in Windows:

    source("foreach_testfunctions.R")
    a <- 2
    b <- 3
    library(doParallel)
    cl <- makeCluster(3)
    registerDoParallel(cl)
    
    foreach(i=1:3) %do% FUN(i, a, b)    ## works fine
    FUN2(a, b)                          ## works fine
    foreach(i=1:3) %dopar% FUN(i, a, b) ## works fine
    FUN3(a, b)                          ## does not work
    Error in FUN(i, a, b) : task 1 failed - "Could not find function "FUN""
    

    Conclusion: (1) No problems with %do%. (2) Problems with %dopar% when using Windows. I tried inserting the line clusterExport(cl, varlist=c("FUN", "a", "b"), env=environment()) before the line that invokes FUN3 to make sure that the function FUN and the variables a and b are found in the proper environment, but the error remains.

    My questions: Why does Windows behave differently from Linux although the code is identical (apart from the different registerDoParallel syntax)? How can I make sure that Windows finds the function FUN when it is invoked via FUN3?

  • Antoine
    Antoine about 8 years
    +1 nice answer. What if FUN has not been previously sourced? (i.e., is not present in the working environment)? Is it possible to make it available inside foreach by exporting the corresponding R file? (e.g., .export="path/to/FUN.R"). In other words, does .export work for files in addition to R objects?
  • Antoine
    Antoine about 8 years
    @Steve Weston the reason is that in my case each worker needs to write to a .txt file, call an executable taking the file as input, and read back the output of the .exe (which is another .txt file). I was wondering whether .export could make the .exe file available (copy it) to each worker's temporary working directory
  • Steve Weston
    Steve Weston about 8 years
    @Antoine No, you can only specify the name of one or more variables with the foreach .export argument. Beyond that, I usually use backend-specific methods of initializing workers, such as clusterExport or clusterEvalQ when using doParallel, for example.
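
As a rough illustration of the backend-specific initialization mentioned in the last comment, the sketch below sources a file of function definitions on every worker with clusterEvalQ. The temporary script stands in for a real file such as path/to/FUN.R, and the approach assumes the workers run on the same machine (or share a file system) so they can read that path:

```r
library(parallel)

# Hypothetical stand-in for a script that defines FUN.
script_path <- tempfile(fileext = ".R")
writeLines("FUN <- function(x, y, z) { x + y + z }", script_path)

cl <- makeCluster(2)  # assumption: 2 local workers

# Send the path to the workers, then have each one source the file.
# source() evaluates in the worker's global environment by default.
clusterExport(cl, "script_path")
invisible(clusterEvalQ(cl, source(script_path)))

# FUN is now defined on every worker.
res <- clusterEvalQ(cl, FUN(1, 2, 3))
stopCluster(cl)
unlink(script_path)

print(unlist(res))  # 6 6
```

Once the workers are initialized this way, a %dopar% loop registered on the same cluster should be able to call FUN without .export, at the cost of tying the code to the cluster backend.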