Run multiple R-scripts simultaneously

Solution 1

EDIT: Given enhancements to RStudio, this method is no longer the best way to do this - see Tom Kelly's answer below :)


Assuming that the results do not need to end up in the same environment, you can achieve this using RStudio projects: https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects

First create two separate projects. You can open both simultaneously, which will result in two separate R sessions. You can then open each script in its own project and execute them independently. Your OS will then manage the core allocation.

Solution 2

In RStudio

If you right-click on the RStudio icon, you should be able to open several separate "sessions" of RStudio (whether or not you use Projects). By default these will use one core each.

Update (July 2018): RStudio v1.2.830-1, which is available as a Preview Release, supports a "jobs" pane. This is dedicated to running R scripts in the background, separate from the interactive R session:

  • Run any R script as a background job in a clean R session
  • Monitor progress and see script output in real time
  • Optionally give jobs your global environment when started, and export values back when complete

This will be available in RStudio version 1.2.
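If you would rather launch such a background job from code than from the IDE, the rstudioapi package exposes the same feature. A minimal sketch, assuming RStudio 1.2+ with rstudioapi installed and a script.R in the working directory:

library(rstudioapi)

# launch script.R as a background job from the current RStudio session
jobRunScript(
  path = "script.R",        # placeholder: path to the script to run
  importEnv = TRUE,         # copy the global environment into the job
  exportEnv = "R_GlobalEnv" # export the job's results back when it completes
)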

Running Scripts in the Terminal

If you have several scripts that you know run without errors, I'd recommend running them with different parameters from the command line:

R CMD BATCH script.R
Rscript script.R
R --vanilla < script.R

Running in the background:

nohup Rscript script.R &

Here "&" runs the script in the background (it can be retrieved with fg, monitored with htop, and killed with kill <pid> or pkill rsession) and nohup saves the output in a file and continues to run if the terminal is closed.

Passing arguments to a script:

Rscript script.R 1 2 3

This will pass "1", "2", and "3" to the script as character strings, retrievable with commandArgs(), so a bash loop can run multiple instances of Rscript:

for ii in 1 2 3
do
  nohup Rscript script.R $ii &
done
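
For reference, here is a minimal sketch of how script.R could pick up that argument (the variable names are illustrative):

# inside script.R: read the arguments passed on the command line
args <- commandArgs(trailingOnly = TRUE) # e.g., "1" for `Rscript script.R 1`
ii <- as.numeric(args[1])                # arguments arrive as character strings
cat("Running simulation for parameter", ii, "\n")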

Running parallel code within R

You will often find that a particular step in your R script is the bottleneck, so may I suggest running parallel code within your R script rather than running scripts separately? I'd recommend the [snow package][1] for running loops in parallel in R. Generally, use:

library(snow)

cl <- makeCluster(n) # n = number of cores (I'd recommend one less than machine capacity)
clusterExport(cl, list = ls()) # export input data to all cores
output_list <- parLapply(cl, input_list, function(x) ...)
stopCluster(cl) # close the cluster when complete (particularly on shared machines)

Use this anywhere you would normally use lapply in R to run it in parallel.

[1]: https://www.r-bloggers.com/quick-guide-to-parallel-r-with-snow/
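For instance, a self-contained toy example of this pattern (the inputs and function here are made up purely for illustration):

library(snow)

input_list <- as.list(1:100)    # toy inputs
cl <- makeCluster(3)            # start 3 worker processes
clusterExport(cl, "input_list") # not strictly needed for parLapply; shown for the pattern
output_list <- parLapply(cl, input_list, function(x) x^2)
stopCluster(cl)

head(unlist(output_list)) # 1 4 9 16 25 36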

Solution 3

You can achieve multicore parallelism (as explained here: https://cran.r-project.org/web/packages/doMC/vignettes/gettingstartedMC.pdf) within the same session with the following code:

library(foreach)

numberOfCores <- 2 # set to however many cores you want to use

if (Sys.info()["sysname"] == "Windows") {
  library(doParallel) # doMC is not available on Windows
  cl <- makeCluster(numberOfCores)
  registerDoParallel(cl)
} else {
  library(doMC)
  registerDoMC(numberOfCores)
}

someList <- list("file1", "file2") # paths to the scripts to run
returnComputation <-
  foreach(x = someList) %dopar% {
    source(x)
  }

if (Sys.info()["sysname"] == "Windows") stopCluster(cl)

You will still need to adapt how you collect your output: %dopar% returns a list with one element per input, each holding the result of the corresponding source() call.
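For instance, since source() returns a list with elements $value and $visible, you could collect the final value of each script like this (a sketch, assuming each file ends with an expression you care about):

# each element of returnComputation is the result of one source() call
results <- lapply(returnComputation, function(res) res$value)
names(results) <- unlist(someList)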

Solution 4

All you need to do (assuming you use Unix/Linux) is to run an R batch command and put it in the background. The OS will automatically allocate it to a CPU core.

At the shell, do:

/your/path/$ nohup R CMD BATCH --no-restore my_model1.R &
/your/path/$ nohup R CMD BATCH --no-restore my_model2.R &
/your/path/$ nohup R CMD BATCH --no-restore my_model3.R &
/your/path/$ nohup R CMD BATCH --no-restore my_model4.R &

This executes the commands in the background, saves the printout of each in a file (my_model1.Rout and so on), and saves all created R objects in the file .RData. Each model will run on a different CPU core.

If you are doing this over a remote connection (e.g., via SSH), you will need the nohup command; otherwise the processes will terminate when you exit the session.

/your/path/$ nohup R CMD BATCH --no-restore my_model1.R &

If you want to give the processes low priority, use nice:

/your/path/$ nohup nice -n 19 R CMD BATCH --no-restore my_model.R &

You'd do best to include some code at the beginning of the script to load and attach the relevant data file.
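For example, something like this at the top of the script makes its inputs explicit (the file names here are placeholders):

# load inputs explicitly instead of relying on a restored workspace
load("my_inputs.RData")          # placeholder: previously saved R objects
my_data <- read.csv("data.csv")  # placeholder: raw data file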

NEVER simply do

/your/path/$ nohup R CMD BATCH my_model1.R &

This will load the existing .RData file (including whatever stray objects it contains) into the session and can seriously compromise reproducibility. That is to say,

--no-restore

or

--vanilla

are your dear friends.

If you have too many models, I suggest doing the computation on a cloud account, since you can get more CPUs and RAM. Depending on what you are doing and which R packages you use, models can take hours on current hardware.

I've learned this the hard way, but there's a nice document here:

http://users.stat.umn.edu/~geyer/parallel/parallel.pdf

HTH.

Solution 5

If your task is embarrassingly parallel, you can open as many terminals as you want in the Terminal tab (located just after the Console tab) and run your code in each one using Rscript yourcode.R. Each run will use a separate core by default. You can also use command-line arguments (as @Tom Kelly mentioned) if needed.

Comments

  • Acarbalacar
    Acarbalacar over 2 years

In my thesis I need to perform a lot of simulation studies, which all take quite a while. My computer has 4 cores, so I have been wondering if it is possible to run, for example, two R scripts in RStudio at the same time by letting them use two different cores. If this could be done, I could save a lot of time by just leaving the computer running all these scripts overnight.

  • michalrudko
    michalrudko about 6 years
Unfortunately, this does not seem to be supported on Windows (according to the cited docs)
  • Tom Kelly
    Tom Kelly about 5 years
This has gotten a lot of votes and edit suggestions, so to clarify: I recommend doing this with the command line, either running with nohup or in parallel. If you must use RStudio, you should update it and use the jobs feature rather than opening separate sessions. RStudio is great for interactively running and developing scripts, but scripts do not need to be run this way unless they expect interactive inputs.
  • Tom Kelly
    Tom Kelly about 5 years
    It's not necessary to open a separate terminal session. You can submit multiple jobs to the background by ending a line with & or using Ctrl+z and then running bg to resume the job and then disown %1. I recommend using nohup so that background jobs will continue to run if the terminal is closed (or remote session is dropped).
  • Antonio
    Antonio over 4 years
I have to run 7 different loops like this one: d2 <- sapply(1:dim(m)[1], function(k) {z <- t(apply(m, 1, function(x) m[k,]-x)); diag(z) <- 0; z[z<0] <- 0; apply(t(apply(z, 1, function(x) x*b)),1,function(x) mean(x[x>0]))}). They have just given me a powerful computer with 10 cores but I can't get them to work at the same time. I tried running different sessions of R but I keep having 70% unused RAM. Do you think I can run each of these loops on a different core? Any help would be much appreciated
  • Tom Kelly
    Tom Kelly over 4 years
@Antonio I presume if they're "embarrassingly parallel" then each "job" will use a separate thread (i.e., run on a separate core if available). I prefer to run parallel jobs on the command line but I've included this here for completeness and because the question specifically asked about RStudio (not R). For your question I think "snow" is better, but you can get more info on "jobs" here: blog.rstudio.com/2019/03/14/rstudio-1-2-jobs
  • API
    API about 4 years
Among all the answers to the question, this one is the best explained, IMHO. Just want to add one small point: after each command above, I need to type 'exit' in the terminal to get back to the prompt to run the next 'nohup R CMD BATCH'. And the command 'jobs -l' can be helpful whenever I want to check the processes that are running.