Risks of using setwd() in a script?

Solution 1

It's an issue of reproducible code. If you specify a directory that doesn't exist on someone else's computer, then they can't use your code. This is particularly bad with absolute file paths, and particularly bad with Windows file paths (which are absolutely impossible to replicate on a Unix system).

My preferred solution is to specify that the user should be in the relevant directory on their own system before starting to run the code. If for your own convenience you want to put a setwd(...) right at the top of your code, where other people can notice it and comment it out as appropriate, but the rest of your code assumes only relative paths from that starting directory, that's OK with me.
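As a rough sketch of that layout (the project folder, file names, and data here are made up for illustration and created in a temporary directory), a script with a single setwd() at the top and only relative paths afterwards might look like this:

```r
# Hypothetical project layout, created in a temp dir just for this demo.
project <- file.path(tempdir(), "demo_project")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(x = 1:3),
          file.path(project, "data", "input.csv"),
          row.names = FALSE)

# The one line a collaborator would edit or comment out.
# setwd() invisibly returns the previous working directory.
old_wd <- setwd(project)

# From here on, the script uses only paths relative to the project root.
input <- read.csv(file.path("data", "input.csv"))
n_rows <- nrow(input)

# Restore the caller's working directory when done.
setwd(old_wd)
```

Everything below the setwd() line keeps working on another machine as long as the user starts in (or switches to) their own copy of the project directory.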

Yihui Xie (author of knitr) feels particularly strongly about this:

https://groups.google.com/forum/?fromgroups=#!topic/knitr/knM0VWoexT0

Whenever you want to manipulate files, they are assumed to be under the same directory of your source (e.g. Rnw documents). Then you can always use relative paths and you will never need to setwd(). Using setwd() contradicts with the principle of reproducibility, e.g. you use setwd('foo/bar/') and the directory may not exist in other people's computers. See FAQ 7: https://github.com/yihui/knitr/blob/master/FAQ.md

And from the aforementioned FAQ 7:

You'd better not do this [change working directory inside knitr code chunks]. Your working directory is always getwd() (all output files will be written here), but the code chunks are evaluated under the directory where your input document comes from. Changing working directories while running R code is a bad practice in general. See #38 for a discussion. You should also try to avoid absolute directories whenever possible (use relative directories instead), because it makes things less reproducible.

See also: https://github.com/yihui/knitr/issues/38

Solution 2

I can't think of any particular issues with using setwd() in a script that runs on a server I manage: if the directory can't be changed, setwd() raises an error, which you can trap with try() and handle yourself. I have used setwd() when being lazy about paths - see below!

I use file.path() extensively in scripts, production or otherwise, typically to work through the files in an input directory and put the output graphics and reports somewhere else. Doing the same thing with setwd() would be a bit tedious. So something along the lines of this (untested):
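A minimal sketch of trapping that error with try(); the helper name safe_setwd and the nonexistent directory are made up for illustration:

```r
# Hypothetical helper: attempt to change directory and report failure
# instead of stopping the whole script.
safe_setwd <- function(path) {
  result <- try(setwd(path), silent = TRUE)
  if (inherits(result, "try-error")) {
    message("Could not change working directory to: ", path)
    return(FALSE)
  }
  TRUE
}

old_wd <- getwd()

# A directory that should not exist, so setwd() fails and is trapped.
missing_ok <- safe_setwd(file.path(tempdir(), "no_such_directory_here"))

# tempdir() always exists during the session, so this one succeeds.
temp_ok <- safe_setwd(tempdir())

# Put the working directory back where it was.
setwd(old_wd)
```

The script can then branch on the returned TRUE/FALSE rather than dying halfway through.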

kInDir <- '~/Indir'
kOutDir <- '~/Outdir'
flist <- dir(path=kInDir, pattern='^[a-z]{2,5}\\.csv$')
# note I could have used full.names=TRUE - but it's easier not to...
for (fnam in flist) {
  # full path to the report file that will be created
  # (sub() with an escaped, anchored pattern only touches the extension)
  sfnam <- file.path(kOutDir, sub('\\.csv$', '_report.txt', fnam))
  # full path to the csv file that will be created
  ofnam <- file.path(kOutDir, sub('\\.csv$', '_b.csv', fnam))
  #
  # ok... we're going to process this CSV file...
  r1 <- read.csv(file.path(kInDir, fnam))
  #
  # we'll put the output from the analysis into this report file
  sink(sfnam, split=TRUE)
  # process it... into a new data.frame k1
  # (placeholder: replace this line with the real analysis)
  k1 <- r1
  #
  write.csv(k1, file=ofnam, row.names=FALSE)
  sink() # turn off this particular report file
}

Solution 3

Toward the better alternatives question:

I mainly use R for individual projects (meaning I'm the primary analyst). However, we do also use R in projects that sometimes need to be shared with others.

RStudio - Projects

I have found that RStudio's Projects functionality goes a long way toward keeping your files organized. If other users also adopt RStudio, they will have the nice feeling of being able to open a single file ("*.Rproj") and have the project load in the same state in which you last saved it.

ProjectTemplate

On top of this, I've found a new tool, ProjectTemplate, that goes a step further. The technique its author developed provides a structure for what you are doing. Please go over to the website for more detail.

Solution 4

Though the problems with setwd() have already been covered, I would like to add one more answer to the "what are the alternatives" part of the question. We often work with git, where relative paths are very convenient:

setrelwd <- function(rel_path) {
  # build the target path relative to the current working directory
  curr_dir <- getwd()
  abs_path <- file.path(curr_dir, rel_path)
  if (dir.exists(abs_path)) {
    setwd(abs_path)
  } else {
    warning('Directory does not exist. Please create it first.')
  }
}

> setrelwd("Summer2016")
Warning message:
In setrelwd("Summer2016") : Directory does not exist. Please create it first.

Also, if you would rather create the folder right away than see the warning message, see the question "Check existence of directory and create if doesn't exist".
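A minimal sketch of that create-if-missing approach, using base R's dir.exists() and dir.create(); the helper name ensure_dir and the demo folder are made up for illustration:

```r
# Hypothetical helper: create the directory if it is not there yet.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    # recursive = TRUE also creates any missing parent directories
    dir.create(path, recursive = TRUE)
  }
  invisible(path)
}

# Demo in a temp dir so nothing is created in the real project.
target <- file.path(tempdir(), "Summer2016_demo")
ensure_dir(target)
created <- dir.exists(target)
```

Calling ensure_dir() a second time on the same path does nothing, so it is safe to leave at the top of a script.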

Solution 5

I personally add the following code to my scripts. It uses Sys.info() and any() to match a value that is unique to each computer.

The first step is to run Sys.info() and find a value that uniquely identifies your computer (for example, its nodename).

if(any(Sys.info() == "COMPUTER1")) {
  setwd("c:/Users/user1/repos/project/")
}

if(any(Sys.info() == "COMPUTER2")) {
  setwd("/home/user1/repos/project/")
}

Put the name of each computer in the if condition together with the correct path for that machine, adding a new if block for each one.

For reproducibility, this does not change anyone's working directory unless they are running on one of the listed machines.
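A variation on the same idea, as a sketch only: compare just the nodename field of Sys.info() rather than every field, and keep the machines in one named vector. The machine names, paths, and the helper lookup_project_dir are hypothetical.

```r
# Hypothetical machine names and project paths.
project_dirs <- c(
  COMPUTER1 = "c:/Users/user1/repos/project/",
  COMPUTER2 = "/home/user1/repos/project/"
)

# Return the path registered for a machine, or NA if it is not listed.
lookup_project_dir <- function(node, dirs) {
  if (node %in% names(dirs)) dirs[[node]] else NA_character_
}

this_machine <- Sys.info()[["nodename"]]
project_dir <- lookup_project_dir(this_machine, project_dirs)

# Only change directory on a known machine where the path really exists.
if (!is.na(project_dir) && dir.exists(project_dir)) {
  setwd(project_dir)
}
```

On any machine that is not in the vector, the lookup returns NA and the working directory is left alone.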

Author: Ricardo Saporta

Updated on June 06, 2022

Comments

  • Ricardo Saporta almost 2 years

    I've heard it said that it is bad practice to use setwd() in a script.

    • What are the risks/dangers associated with it?
    • What are better alternatives?
    • Anthony Damico over 11 years
      maybe in a script you share with others.. because it might not exist depending on the OS. storing files in a tempdir() would be an alternative
  • Ben Bolker over 11 years
    that's fine if you only ever want to exchange R code with the same group of people ...
  • Ben Bolker over 11 years
    +1 for having your base directories defined at the top of the script.
  • hadley over 11 years
    One of the things that makes working in this style is source(chdir = T).
  • Sean over 11 years
    According to the R documentation for file.path() "The implementation is designed to be fast (faster than paste) as this function is used extensively in R itself."
  • hadley over 11 years
    @Sean for me, paste takes 2 µs and file.path takes 1.5 µs in this case. So if you're running this script a million times, you'll save half a second.
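
For anyone unfamiliar with the source(chdir = T) option mentioned in the comments: it temporarily switches the working directory to the sourced file's own directory while that file runs, then switches back. A small sketch, with a made-up script and data file written to a temporary directory:

```r
# Hypothetical script and data file, created in a temp dir for this demo.
script_dir <- file.path(tempdir(), "chdir_demo")
dir.create(script_dir, showWarnings = FALSE)
writeLines("hello", file.path(script_dir, "note.txt"))

# The script refers to its neighbour file with a relative path only.
writeLines('note <- readLines("note.txt")',
           file.path(script_dir, "script.R"))

wd_before <- getwd()

# chdir = TRUE: run the script from its own directory, then restore.
source(file.path(script_dir, "script.R"), chdir = TRUE)

wd_after <- getwd()
```

The sourced script finds note.txt without any setwd() call, and the caller's working directory is unchanged afterwards.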