R: how does a foreach loop find a function that should be invoked?
Solution 1
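They behave differently because registerDoParallel registers an mclapply backend on Linux, while it registers a clusterApplyLB backend on Windows. With an mclapply backend the workers are forked copies of the master R session, so there are essentially no data exporting issues, and it works on Linux. But with clusterApplyLB the workers are separate R processes, and you can run into problems if foreach doesn't auto-export the functions and data that are needed. Here FUN is called from inside FUN3 but is not defined in FUN3's local environment, so it is not auto-exported.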
You can solve this problem by modifying FUN3 to export FUN via the .export option:
FUN3 <- function(a, b) {
  foreach(i=1:3, .export='FUN') %dopar% FUN(i, a, b)
}
This solution works on both Linux and Windows, since .export is ignored by the mclapply backend.
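For anyone who wants to try this on a single machine, here is a minimal sketch that uses an explicit cluster, so the clusterApplyLB-style backend is used on Linux as well as Windows. The worker count of 2 and the name `res_export` are arbitrary choices for illustration, not something from the question.

```r
library(doParallel)  # also attaches foreach and parallel

FUN <- function(x, y, z) {
  x + y + z
}

# Same FUN3 as above: FUN is named explicitly in .export so that
# it is shipped to the cluster workers.
FUN3 <- function(a, b) {
  foreach(i = 1:3, .export = "FUN") %dopar% FUN(i, a, b)
}

# An explicit cluster (2 workers is an arbitrary choice) makes the
# behaviour the same on every platform.
cl <- makeCluster(2)
registerDoParallel(cl)

res_export <- FUN3(2, 3)  # list of 6, 7, 8

stopCluster(cl)
```

Note that a and b need no export: they are local to FUN3, so foreach picks them up automatically.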
As pointed out by Hong Ooi, you have an error in your use of clusterExport, but I wouldn't use clusterExport to solve the problem since it is backend specific.
Solution 2
In your clusterExport call, remove the env=environment() part. What you're doing is telling clusterExport to look for your objects in a brand new environment, so naturally it doesn't find them.
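As a rough sketch of that approach, assuming the code is run at top level so that FUN lives in the global environment (which is where clusterExport looks by default), the call could look like this. The worker count and the name `res_cluster` are illustrative only.

```r
library(doParallel)  # also attaches foreach and parallel

FUN <- function(x, y, z) {
  x + y + z
}

# FUN3 exactly as in the question, without .export.
FUN3 <- function(a, b) {
  foreach(i = 1:3) %dopar% FUN(i, a, b)
}

cl <- makeCluster(2)  # 2 workers is an arbitrary choice
registerDoParallel(cl)

# No envir argument: clusterExport reads FUN from the global
# environment and defines it in each worker's global environment.
clusterExport(cl, varlist = "FUN")

res_cluster <- FUN3(2, 3)

stopCluster(cl)
```

This only helps with cluster backends; with the forked mclapply backend there is no cluster object to export to.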
user7417
Updated on June 06, 2022
Comments
-
user7417 almost 2 years
I have problems when I use a foreach loop (using %dopar%) which invokes a self-defined function. There is not really a problem when I work with Linux, but when I use Windows the self-defined function cannot be found. It is hard to explain the problem in words, so I composed a small example to show it. Assume I have a collection of three simple functions, where FUN2 (using %do%) and FUN3 (using %dopar%) invoke the first one (FUN):
FUN <- function(x,y,z) { x + y + z }
FUN2 <- function(a, b) { foreach(i=1:3) %do% FUN(i, a, b) }
FUN3 <- function(a, b) { foreach(i=1:3) %dopar% FUN(i, a, b) }
The functions are stored in a script called foreach_testfunctions.R. In another script (foreach.test) I source these functions, use library(doParallel) and try to use the functions. First I do it with Linux and all works fine:
source("foreach_testfunctions.R")
a <- 2
b <- 3
library(doParallel)
registerDoParallel()
foreach(i=1:3) %do% FUN(i, a, b) ## works fine
FUN2(a, b) ## works fine
foreach(i=1:3) %dopar% FUN(i, a, b) ## works fine
FUN3(a, b) ## works fine
Then I do it in Windows:
source("foreach_testfunctions.R")
a <- 2
b <- 3
library(doParallel)
cl <- makeCluster(3)
registerDoParallel(cl)
foreach(i=1:3) %do% FUN(i, a, b) ## works fine
FUN2(a, b) ## works fine
foreach(i=1:3) %dopar% FUN(i, a, b) ## works fine
FUN3(a, b) ## does not work
Error in FUN(i, a, b) : task 1 failed - "Could not find function "FUN""
Conclusion: (1) No problems with %do%. (2) Problems with %dopar% when using Windows. I tried inserting the line
clusterExport(cl, varlist=c("FUN", "a", "b"), env=environment())
before the line that invokes FUN3 to make sure that the function FUN and the variables a and b are found in the proper environment, but the error remains.
My questions: Why does Windows behave differently from Linux although the code is identical (apart from the different registerDoParallel syntax)? How can I make sure that Windows does find function FUN when invoked via function FUN3?
-
Antoine about 8 years
+1 nice answer. What if FUN has not been previously sourced (i.e., is not present in the working environment)? Is it possible to make it available inside foreach by exporting the corresponding R file (e.g., .export="path/to/FUN.R")? In other words, does .export work for files in addition to R objects?
-
Antoine about 8 years
@Steve Weston the reason is that in my case each worker needs to write to a .txt file, call an executable taking the file as input, and read back the output of the .exe (which is another .txt file). I was wondering whether .export could make available (copy) the .exe file to each worker's temporary working directory.
-
Steve Weston about 8 years
@Antoine No, you can only specify the name of one or more variables with the foreach .export argument. Beyond that, I usually use backend-specific methods of initializing workers, such as clusterExport or clusterEvalQ when using doParallel, for example.
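One possible way to do that kind of backend-specific initialization for the "FUN lives in a file" case is sketched below. It assumes every worker can read the same path (true for a local cluster); the temporary file, the variable name `fun_file`, and `res_sourced` are hypothetical stand-ins, not anything from the thread.

```r
library(doParallel)  # also attaches foreach and parallel

# Hypothetical stand-in for "path/to/FUN.R": a temp file defining FUN.
fun_file <- tempfile(fileext = ".R")
writeLines("FUN <- function(x, y, z) x + y + z", fun_file)

cl <- makeCluster(2)  # 2 workers is an arbitrary choice
registerDoParallel(cl)

# Send the path to each worker, then have each worker source the file,
# which defines FUN in that worker's global environment.
clusterExport(cl, varlist = "fun_file")
invisible(clusterEvalQ(cl, source(fun_file)))

FUN3 <- function(a, b) {
  foreach(i = 1:3) %dopar% FUN(i, a, b)
}

res_sourced <- FUN3(2, 3)

stopCluster(cl)
unlink(fun_file)
```

FUN is never defined in the master session here; it exists only on the workers, which is enough for %dopar%.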