Subset dataframe with list of columns in R

21,721

Solution 1

This is basic R syntax; perhaps you need to read the introductory manual. Store the column names as a character vector, one name per element:

select.me <- c('v1','v2')
df[,select.me]
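For completeness, here is a minimal self-contained sketch of the same idea, rebuilding the data frame from the question. The seed and the names `sub` and `one` are assumptions added for illustration; they are not part of the original answer.

```r
# Rebuild the question's example data (the seed is an assumption, for reproducibility)
set.seed(1)
df <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100))

# Store the column names as a character vector, one name per element
select.me <- c("v1", "v2")
sub <- df[, select.me]
names(sub)

# With a single name, df[, "v1"] drops to a plain numeric vector;
# drop = FALSE keeps the result a data frame
one <- df[, "v1", drop = FALSE]
class(one)
```

If the vector might hold only one name, adding drop = FALSE is probably the safer habit, since code that expects a data frame would otherwise receive a vector.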

Solution 2

The great irony here is that when you said "I want to do this" the first expression should have succeeded,

df[,c('v1','v2')]
> str( df[,c('v1','v2')] )
'data.frame':   100 obs. of  2 variables:
 $ v1: num  -0.3347 0.2113 0.9775 -0.0151 -1.8544 ...
 $ v2: num  -1.396 -0.95 -1.254 0.822 0.141 ...

whereas all the later attempts would fail, because "'v1','v2'" is a single string rather than a vector of two column names. I later realized that you didn't know that you could use select.me <- c('v1','v2'); df[ , select.me]. You could also use these forms, which might be safer in some instances:

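pattern <- paste(select.me, collapse="|")   # grep() takes one regex, not a vector of names
df[ , names(df) %in% select.me]             # logical indexing
df[ , grep(pattern, names(df)) ]            # numeric indexing (regex match, so 'v1' also matches 'v10')
df[ , grepl(pattern, names(df)) ]           # logical indexing (same caveat)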
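As a rough check, the sketch below compares the index forms against plain character indexing on the question's example data. grep() and grepl() use only a single pattern, so the names are collapsed into one alternation; the anchoring with ^ and $ is an assumption added here so that 'v1' cannot match a longer name such as 'v10'. The seed and the by_* names are likewise illustrative.

```r
# Rebuild the question's example data (the seed is an assumption)
set.seed(1)
df <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100))
select.me <- c("v1", "v2")

# One anchored regular expression built from the vector of names: "^(v1|v2)$"
pattern <- paste0("^(", paste(select.me, collapse = "|"), ")$")

by_name  <- df[, select.me]                    # character indexing
by_in    <- df[, names(df) %in% select.me]     # logical indexing
by_grep  <- df[, grep(pattern, names(df))]     # numeric indexing
by_grepl <- df[, grepl(pattern, names(df))]    # logical indexing

# Each comparison checks that the same two columns came back
all.equal(by_name, by_in)
all.equal(by_name, by_grep)
all.equal(by_name, by_grepl)
```

The %in% form is likely the simplest choice when the names are known exactly; the regex forms are mainly useful when columns are selected by a naming pattern.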

Any of those can be used with negation (!logical) or minus (-numeric) to retrieve the complement, whereas you cannot use character indexing with negation. If you wanted to go down one level in understandability and were willing to change the select.me value to a valid R expression, you could do this:

select.me <- "c('v1','v2')"
df[ , eval(parse(text=select.me)) ]

Not that I recommend this... just to let you know that such is possible after you "learn to walk". It would also have been possible (although rather baroque) using your original quoted string to pull out the information (although I think this just illustrates why your first version is superior):

select.me <- "'v1','v2'"
df [ , scan(textConnection(select.me), what="", sep=",") ]
> str( df [ , scan(textConnection(select.me), what="", sep=",") ] )
Read 2 items
'data.frame':   100 obs. of  2 variables:
 $ v1: num  -0.3347 0.2113 0.9775 -0.0151 -1.8544 ...
 $ v2: num  -1.396 -0.95 -1.254 0.822 0.141 ...
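
Returning to the complement selection mentioned earlier in this answer, a minimal sketch might look like the following. The seed, `idx`, and the rest_* names are assumptions for illustration only.

```r
# Rebuild the question's example data (the seed is an assumption)
set.seed(1)
df <- data.frame(v1 = rnorm(100), v2 = rnorm(100), v3 = rnorm(100))
select.me <- c("v1", "v2")

# Logical index: negate with !
rest_logical <- df[, !(names(df) %in% select.me), drop = FALSE]

# Numeric index: negate with minus
idx <- which(names(df) %in% select.me)
rest_numeric <- df[, -idx, drop = FALSE]

names(rest_logical)
names(rest_numeric)

# Character indexing cannot be negated: df[, -select.me] is an error,
# because unary minus is not defined for character vectors
```

One caveat with the numeric form: if nothing matches, idx is integer(0) and -idx selects no columns at all rather than every column, so the logical form is probably the more robust of the two.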
Author by

mike

Updated on July 09, 2022

Comments

  • mike, almost 2 years

    I want to select all columns in my dataframe which I have stored in a string variable. For example:

    v1 <- rnorm(100)
    v2 <- rnorm(100)
    v3 <- rnorm(100)
    df <- data.frame(v1,v2,v3)
    

    I want to accomplish the following:

    df[,c('v1','v2')]
    

    But I want to use a variable instead of c('v1','v2') (these all fail):

    select.me <- "'v1','v2'"
    df[,select.me]
    df[,c(select.me)]
    df[,c(paste(select.me,sep=''))]
    

    Thanks for help with a simple question,