R function to filter / subset (programmatically) multiple values over one variable

We can use %in% if the number of elements to check is more than 1.

df[df$v2 %in% c('a', 'b'),]
#   v1 v2
#1  1  a
#2  2  b
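As a minimal sketch of why %in% fits this case, assuming the sample df from the question: %in% returns one TRUE/FALSE per row of v2 no matter how many values are checked, whereas == against a vector of values compares position by position with recycling. The names keep, idx and wrong are illustrative only.

```r
# Sample data from the question
df <- data.frame(v1 = 1:3, v2 = c("a", "b", "c"))
keep <- c("b", "a")  # illustrative name; order does not matter for %in%

# One logical value per row: is this row's v2 among the values in keep?
idx <- df$v2 %in% keep
idx
# [1]  TRUE  TRUE FALSE

# == compares position by position (recycling keep), so it misses both matches here
wrong <- suppressWarnings(df$v2 == keep)
wrong
# [1] FALSE FALSE FALSE

df[idx, ]
```

This is why == needs one comparison per value joined with |, while %in% accepts the whole vector at once.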

Or, if we use subset, the df$ prefix can be dropped

subset(df, v2 %in% c('a', 'b'))

Or with dplyr::filter

filter(df, v2 %in% c('a', 'b'))

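This can be wrapped in a function. Because the column is passed unquoted, it has to be forwarded to filter with {{ }} (available in recent versions of dplyr/rlang); a bare col would be evaluated as a variable in the calling environment rather than as a column of dat

f1 <- function(dat, col, val){
  # {{ col }} forwards the unquoted column name to filter()
  filter(dat, {{ col }} %in% val)
}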

f1(df, v2, c('a', 'b'))
#  v1 v2
#1  1  a
#2  2  b
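For programmatic use without dplyr, a base R version is also possible. The following is a sketch and not part of the original answer: filter_values is a hypothetical helper that takes the column name as a string, which avoids non-standard evaluation altogether.

```r
# Hypothetical helper: keep rows of dat whose column `col` (a string) matches any of val
filter_values <- function(dat, col, val) {
  stopifnot(is.character(col), length(col) == 1, col %in% names(dat))
  # [[ ]] extracts the column by name; drop = FALSE keeps a data frame
  # even when dat has a single column
  dat[dat[[col]] %in% val, , drop = FALSE]
}

df <- data.frame(v1 = 1:3, v2 = c("a", "b", "c"))
filter_values(df, "v2", c("a", "b"))
#   v1 v2
# 1  1  a
# 2  2  b
```

Passing the column as a string makes it easy to loop over column names stored in a character vector.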

If we need to use ==, we can compare df$v2 against each value with lapply, which gives a list of logical vectors, and combine them with Reduce and |

df[Reduce(`|`, lapply(letters[1:2], `==`, df$v2)),]
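To make the Reduce approach easier to follow, here is the same idea split into steps, assuming the sample df from the question. The names vals, tests and any_match are illustrative.

```r
df <- data.frame(v1 = 1:3, v2 = c("a", "b", "c"))
vals <- letters[1:2]  # "a" "b"

# One logical vector per value: "a" == df$v2, then "b" == df$v2
tests <- lapply(vals, `==`, df$v2)

# Combine the vectors pairwise with |, so a row is TRUE if it matches any value
any_match <- Reduce(`|`, tests)
any_match
# [1]  TRUE  TRUE FALSE

# For this data the result agrees with the %in% test
identical(any_match, df$v2 %in% vals)
# [1] TRUE
```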
Author by

jpinelo

Updated on August 07, 2022

Comments

  • jpinelo over 1 year

    Is there a function that takes one dataset, one col, one operator, but several values to evaluate a condition?

    v1 <- c(1:3)
    v2 <- c("a", "b", "c")
    df <- data.frame(v1, v2)
    

    Options to subset (programmatically)

    result <- df[df$v2 == "a" | df$v2 == "b", ]
    result
      v1 v2
    1  1  a
    2  2  b
    

    Or, for more robustness

    result1 <- df[ df[[2]] == "a" | df[[2]] == "b", ]
    result1
      v1 v2
    1  1  a
    2  2  b
    

    Alternatively, for easier syntax:

    library(dplyr)
    result2 <- filter(df, v2 == "a" | v2 == "b")
    result2
      v1 v2
    1  1  a
    2  2  b
    

    (Am I right to assume that I can safely use dplyr's filter() inside a function?)

    I did not include subset() above as it is known to be for interactive use only.

    In all the cases above, one has to repeat the condition (v2 == "a" | v2 == "b").

    I'm looking for a function to which I could pass a vector of values as an argument, such as c("a", "b"), because I would like to pass a large number of values and automate the process.

    Such a function could perhaps look something like:

    fun(df, col = v2, operator = "|", value = c("a", "b"))

    Thank you

  • jpinelo over 8 years
    Thanks for that. It does solve the issue, as it takes one or more elements. Isn't the opposite of %in% written !%in%? I think I've used it before. Any idea why it would throw an error inside a function? Thanks
  • akrun over 8 years
    @jpinelo You may have to try filter(df, !v2 %in% c('a', 'b'))
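
On the negation asked about in the comments: !%in% is not valid R syntax, because ! is a separate operator that applies to the result of the whole %in% test. A minimal base R sketch, assuming the sample df from the question; the %notin% operator is a hypothetical convenience defined here, not something base R provides.

```r
df <- data.frame(v1 = 1:3, v2 = c("a", "b", "c"))

# ! negates the result of the whole %in% test
df[!(df$v2 %in% c("a", "b")), ]
#   v1 v2
# 3  3  c

# Hypothetical convenience operator built with Negate()
`%notin%` <- Negate(`%in%`)
df[df$v2 %notin% c("a", "b"), ]
```

Since %in% binds more tightly than !, the parentheses are optional, which is why !v2 %in% c('a', 'b') also works inside filter.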