How to pass dynamic column names in dplyr into custom function?
Solution 1
With dplyr >= 0.7, you can use rlang's !! (bang-bang) operator.
library(tidyverse)
from <- "Stand1971"
to <- "Stand1987"
data %>%
  mutate(diff = (!!as.name(from)) - (!!as.name(to)))
You just need to convert the strings to names with as.name and then insert them into the expression. Unfortunately I seem to need a few more parentheses than I would like: in R's grammar ! binds more loosely than arithmetic operators, so without the parentheses each !! would apply to the whole rest of the expression.
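To see where the extra parentheses come from, here is a small base R sketch (no dplyr needed) that inspects how R parses the expression. The symbols a and b are placeholders, not columns from the question.

```r
# Inspect how base R parses the unparenthesised form.
e <- quote(!!a - b)
outer_fn <- as.character(e[[1]])   # function at the top of the call tree
innermost <- e[[2]][[2]]           # what the two `!` calls end up wrapping
inner_fn <- as.character(innermost[[1]])

# With explicit parentheses, subtraction becomes the top-level call.
p <- quote((!!a) - (!!b))
paren_fn <- as.character(p[[1]])

print(c(outer_fn, inner_fn, paren_fn))
```

The first form is parsed as !(!(a - b)), so the subtraction sits inside both negations; the parenthesised form puts the subtraction on top. As far as I understand, newer rlang releases re-associate !! so that it binds like a unary minus, which would explain why an unparenthesised form can work there, but treat that as an assumption and check it against your installed version.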
Original answer, dplyr (0.3-<0.7):
From that vignette (vignette("nse","dplyr")), use lazyeval's interp() function:
library(lazyeval)
from <- "Stand1971"
to <- "Stand1987"
data %>%
  mutate_(diff = interp(~from - to, from = as.name(from), to = as.name(to)))
Solution 2
You can now use the .data pronoun inside a dplyr chain.
library(dplyr)
from <- "Stand1971"
to <- "Stand1987"
data %>% mutate(diff = .data[[from]] - .data[[to]])
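Since the question asks for a custom function, the .data approach can be wrapped as in the following minimal sketch. The function name add_diff and the toy tibble are made up for illustration; only the .data[[...]] lookup comes from the answer above.

```r
library(dplyr)

# Hypothetical helper: `from` and `to` are column names passed as strings.
add_diff <- function(df, from, to) {
  # .data[[name]] looks the column up by its string name inside mutate()
  df %>% mutate(diff = .data[[from]] - .data[[to]])
}

# Toy data standing in for the real dataset (values are illustrative).
toy <- tibble(Stand1971 = c(4.9, 3.2), Stand1987 = c(3.5, 5.7))
result <- add_diff(toy, "Stand1971", "Stand1987")
print(result)
```

Because the column names travel as plain strings, the caller does not need to know anything about quoting or unquoting.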
Another option is to use sym with bang-bang (!!):
data %>% mutate(diff = !!sym(from) - !!sym(to))
In base R, we can use:
data$diff <- data[[from]] - data[[to]]
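The base R version also fits into a function. This is only a sketch: the name add_diff_base and the column check are my own additions, not part of the original answer.

```r
# Hypothetical helper; the name and the validation step are illustrative.
add_diff_base <- function(df, from, to) {
  # Fail early with a readable message if a column name is misspelled.
  missing_cols <- setdiff(c(from, to), names(df))
  if (length(missing_cols) > 0) {
    stop("Unknown column(s): ", paste(missing_cols, collapse = ", "))
  }
  df[["diff"]] <- df[[from]] - df[[to]]
  df
}

# Toy data standing in for the real dataset (values are illustrative).
toy <- data.frame(Stand1971 = c(4.9, 3.2), Stand1987 = c(3.5, 5.7))
result_base <- add_diff_base(toy, "Stand1971", "Stand1987")
print(result_base)
```

The explicit check is there because df[[name]] returns NULL for an unknown column on a plain data.frame, and NULL - x silently yields a zero-length result, which leads to a less helpful error on assignment.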
grssnbchr
Updated on April 02, 2021
I have a dataset with the following structure:
Classes ‘tbl_df’ and 'data.frame': 10 obs. of 7 variables:
 $ GdeName  : chr "Aeugst am Albis" "Aeugst am Albis" "Aeugst am Albis" "Aeugst am Albis" ...
 $ Partei   : chr "BDP" "CSP" "CVP" "EDU" ...
 $ Stand1971: num NA NA 4.91 NA 3.21 ...
 $ Stand1975: num NA NA 5.389 0.438 4.536 ...
 $ Stand1979: num NA NA 6.2774 0.0195 3.4355 ...
 $ Stand1983: num NA NA 4.66 1.41 3.76 ...
 $ Stand1987: num NA NA 3.48 1.65 5.75 ...
I want to provide a function that computes the difference between any two of these columns, and I would like to do this using dplyr's mutate function like so (assume the parameters from and to are passed as arguments):
from <- "Stand1971"
to <- "Stand1987"
data %>% mutate(diff = from - to)
Of course, this doesn't work, as dplyr uses non-standard evaluation. I know there's now an elegant solution to the problem using mutate_, and I've read this vignette, but I still can't get my head around it. What should I do?
Here are the first few rows of the dataset for a reproducible example:
structure(list(GdeName = c("Aeugst am Albis", "Aeugst am Albis", "Aeugst am Albis", "Aeugst am Albis", "Aeugst am Albis", "Aeugst am Albis", "Aeugst am Albis", "Aeugst am Albis", "Aeugst am Albis", "Aeugst am Albis" ), Partei = c("BDP", "CSP", "CVP", "EDU", "EVP", "FDP", "FGA", "FPS", "GLP", "GPS"), Stand1971 = c(NA, NA, 4.907306434, NA, 3.2109535926, 18.272143463, NA, NA, NA, NA), Stand1975 = c(NA, NA, 5.389079711, 0.4382328556, 4.5363022622, 18.749259742, NA, NA, NA, NA), Stand1979 = c(NA, NA, 6.2773722628, 0.0194647202, 3.4355231144, 25.294403893, NA, NA, NA, 2.7055961071), Stand1983 = c(NA, NA, 4.6609804428, 1.412940467, 3.7563539244, 26.277246489, 0.8529335746, NA, NA, 2.601878177), Stand1987 = c(NA, NA, 3.4767860929, 1.6535933856, 5.7451770193, 22.146844746, NA, 3.7453183521, NA, 13.702211858 )), .Names = c("GdeName", "Partei", "Stand1971", "Stand1975", "Stand1979", "Stand1983", "Stand1987"), class = c("tbl_df", "data.frame" ), row.names = c(NA, -10L))