R function to filter / subset (programmatically) multiple values over one variable
We can use %in% if the number of elements to check is more than one.
df[df$v2 %in% c('a', 'b'),]
# v1 v2
#1 1 a
#2 2 b
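One practical difference between %in% and chained == comparisons shows up when the column contains missing values. The sketch below uses a hypothetical data frame df_na with an NA in v2: %in% only ever returns TRUE or FALSE, whereas == propagates the NA, and indexing a data frame with an NA in the logical mask adds an all-NA row.

```r
# Hypothetical data with a missing value in v2
df_na <- data.frame(v1 = 1:4, v2 = c("a", "b", NA, "c"))

# %in% gives TRUE/FALSE only; the NA element simply does not match
in_mask <- df_na$v2 %in% c("a", "b")   # TRUE TRUE FALSE FALSE

# == propagates the NA through |
eq_mask <- df_na$v2 == "a" | df_na$v2 == "b"   # TRUE TRUE NA FALSE

# The %in% mask keeps 2 rows; the == mask also returns an all-NA row
nrow(df_na[in_mask, ])
nrow(df_na[eq_mask, ])
```

So with NA-containing columns, the %in% form is usually the safer of the two for row selection.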
Or, if we use subset, the df$ prefix can be removed:
subset(df, v2 %in% c('a', 'b'))
Or with filter() from dplyr (the package must be loaded; otherwise stats::filter is called):
library(dplyr)
filter(df, v2 %in% c('a', 'b'))
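This can be wrapped in a function. Because filter() evaluates its arguments inside the data, the bare column name has to be forwarded with the {{ }} ("embrace") operator from rlang:
f1 <- function(dat, col, val){
  filter(dat, {{ col }} %in% val)
}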
f1(df, v2, c('a', 'b'))
# v1 v2
#1 1 a
#2 2 b
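If the column name itself comes from a variable (for example a loop over several columns), a base R variant that takes the column as a character string avoids non-standard evaluation altogether. This is a minimal sketch; the name filter_in is hypothetical, not an existing function.

```r
# Hypothetical helper: column given as a string (or position), values as a vector
filter_in <- function(dat, col, val) {
  # dat[[col]] extracts the column by name or position;
  # drop = FALSE keeps a data frame even if dat has only one column
  dat[dat[[col]] %in% val, , drop = FALSE]
}

df <- data.frame(v1 = 1:3, v2 = c("a", "b", "c"))
filter_in(df, "v2", c("a", "b"))   # rows 1 and 2
filter_in(df, 2, "c")              # row 3, column chosen by position
```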
If we need to use ==, we could loop over the vector of values, collect the comparisons in a list, and combine them with Reduce and |:
df[Reduce(`|`, lapply(letters[1:2], `==`, df$v2)),]
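To make the Reduce step concrete: lapply builds one logical vector per value, and Reduce folds them pairwise with |, so any number of values can be passed. The sketch below (vals is an illustrative name) also checks that, on a column without NA, the result matches the %in% version.

```r
df <- data.frame(v1 = 1:3, v2 = c("a", "b", "c"))
vals <- c("a", "b")   # any number of values

# One logical vector per value: "a" == df$v2, "b" == df$v2, ...
masks <- lapply(vals, `==`, df$v2)

# Reduce combines them as (m1 | m2) | m3 | ...
any_match <- Reduce(`|`, masks)

# Same selection as the %in% approach when v2 has no NA
identical(any_match, df$v2 %in% vals)
df[any_match, ]
```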
jpinelo
Updated on August 07, 2022
Comments
jpinelo over 1 year
Is there a function that takes one dataset, one column, one operator, but several values to evaluate a condition?
v1 <- c(1:3)
v2 <- c("a", "b", "c")
df <- data.frame(v1, v2)
Options to subset (programmatically):
result <- df[df$v2 == "a" | df$v2 == "b", ]
result
# v1 v2
#1 1 a
#2 2 b
Or, for more robustness:
result1 <- df[ df[[2]] == "a" | df[[2]] == "b", ]
result1
# v1 v2
#1 1 a
#2 2 b
Alternatively, for easier syntax:
library(dplyr)
result2 <- filter(df, v2 == "a" | v2 == "b")
result2
# v1 v2
#1 1 a
#2 2 b
(Am I right to assume that I can safely use dplyr's filter() inside a function?)
I did not include subset() above because its documentation describes it as a convenience function intended for interactive use.
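On the question of using filter() inside a function: it works, but because filter() uses data masking, the column has to be forwarded explicitly. One option, sketched below under the assumption of a reasonably recent dplyr, is the .data pronoun with the column name as a string; the helper name filter_in_chr is hypothetical.

```r
library(dplyr)

# Hypothetical helper: .data[[col]] looks the column up in the data frame only,
# so a stray global variable with the same name cannot be picked up by mistake
filter_in_chr <- function(dat, col, val) {
  filter(dat, .data[[col]] %in% val)
}

df <- data.frame(v1 = 1:3, v2 = c("a", "b", "c"))
filter_in_chr(df, "v2", c("a", "b"))
```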
In all the cases above, one has to repeat the condition (v2 == "a" | v2 == "b"). I'm looking for a function to which I could pass a vector as the argument, like c("a", "b"), because I would like to pass a large number of values and automate the process. Such a function could perhaps look something like:
fun(df, col = v2, operator = "|", value = c("a", "b"))
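A minimal base R sketch of the requested fun() is shown below. It is an illustration, not an existing function: it assumes col is passed as a string (not a bare name), and that operator names a two-argument logical function such as "|" or "&", which match.fun looks up. NA comparisons are treated as "no match".

```r
# Hypothetical implementation of the requested fun()
fun <- function(df, col, operator = "|", value) {
  x <- df[[col]]
  # one comparison per value: x == value[1], x == value[2], ...
  conds <- lapply(value, function(v) x == v)
  # fold them together with the chosen operator, e.g. `|`
  keep <- Reduce(match.fun(operator), conds)
  # drop rows where the comparison was NA
  df[!is.na(keep) & keep, , drop = FALSE]
}

v1 <- c(1:3)
v2 <- c("a", "b", "c")
df <- data.frame(v1, v2)
fun(df, col = "v2", operator = "|", value = c("a", "b"))
```

With operator = "|" this selects the same rows as df$v2 %in% value, which is why %in% is usually the simpler choice for this particular case.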
Thank you
jpinelo over 8 years
Thanks for that. It does solve the issue, as it takes one or more elements. Isn't the opposite of %in% written !%in%? I think I've used it before. Any idea why it would throw an error inside a function? Thanks
akrun over 8 years
@jpinelo You may have to try
filter(df, !v2 %in% c('a', 'b'))
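On the negation question: !%in% is not an operator in R and gives a syntax error. !v2 %in% x works because %in% (like every %any% operator) binds more tightly than !, so it parses as !(v2 %in% x). The sketch below writes the parentheses out and also shows an optional user-defined "not in" operator; the name %notin% is a common convention, not part of base R.

```r
df <- data.frame(v1 = 1:3, v2 = c("a", "b", "c"))

# Explicit parentheses; equivalent to !df$v2 %in% c("a", "b")
df[!(df$v2 %in% c("a", "b")), ]

# Optional helper (hypothetical name): Negate() wraps %in% with !
`%notin%` <- Negate(`%in%`)
df[df$v2 %notin% c("a", "b"), ]
```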