Regular expressions (RegEx) and dplyr::filter()

38,006

Solution 1

You need to double-check the documentation for grepl and filter.

For grep/grepl you also have to supply the vector you want to search (y in this case), and filter takes a logical vector (i.e. you need to use grepl). If you want to supply an index vector (from grep), you can use slice instead.

df %>% filter(!grepl("^1", y))

Or with an index derived from grep:

df %>% slice(grep("^1", y, invert = TRUE))
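To see why grepl pairs with filter and grep pairs with slice, it can help to look at what each call returns on its own. This is a minimal base R sketch using the y values from the question; the variable names starts_with_1 and keep_rows are only illustrative.

```r
# Same y values as in the question's example data frame
y <- c(101, 102, 113, 201, 202, 344, 407)

# grepl() returns one TRUE/FALSE per element (numeric y is coerced to
# character first); filter() expects this kind of logical vector
starts_with_1 <- grepl("^1", y)
print(starts_with_1)
# TRUE TRUE TRUE FALSE FALSE FALSE FALSE

# grep() returns the positions of the matches; invert = TRUE returns the
# positions of the non-matches, which is the kind of index slice() expects
keep_rows <- grep("^1", y, invert = TRUE)
print(keep_rows)
# 4 5 6 7
```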

But you can also just use substr because you are only interested in the first character:

df %>% filter(substr(y, 1, 1) != "1")
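The substr approach relies on y being coerced to character before the first character is extracted. A small base R sketch of that step, again using the y values from the question (first_char and keep are illustrative names):

```r
# Same y values as in the question's example data frame
y <- c(101, 102, 113, 201, 202, 344, 407)

# substr() converts the numeric vector to character, then takes
# characters 1 through 1 of each element
first_char <- substr(y, 1, 1)
print(first_char)
# "1" "1" "1" "2" "2" "3" "4"

# Comparing against the string "1" gives the logical vector used for filtering
keep <- first_char != "1"
print(y[keep])
# 201 202 344 407
```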

Solution 2

With a combination of dplyr and stringr (to stay within the tidyverse), you could do:

df %>% filter(!str_detect(y, "^1"))

This works because str_detect returns a logical vector.
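For reference, here is a self-contained sketch of the stringr version run against the data frame from the question. It assumes the dplyr and stringr packages are installed; the as.character() call is an addition here that makes the numeric-to-character conversion explicit, and result is just an illustrative name.

```r
library(dplyr)
library(stringr)

# Example data frame from the question
x <- c("aa", "aa", "aa", "bb", "cc", "cc", "cc")
y <- c(101, 102, 113, 201, 202, 344, 407)
df <- data.frame(x, y)

# str_detect() returns TRUE where the pattern matches; negating it keeps
# only the rows whose y value does not start with "1"
result <- df %>% filter(!str_detect(as.character(y), "^1"))
print(result)
```

With this data, the rows that remain are the ones with y equal to 201, 202, 344 and 407.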

Author by

emehex

Updated on March 01, 2020

Comments

  • emehex
    emehex about 4 years

    I have a simple data frame that looks like this:

    x <- c("aa", "aa", "aa", "bb", "cc", "cc", "cc")
    y <- c(101, 102, 113, 201, 202, 344, 407)
    df = data.frame(x, y)    
    
        x   y
    1   aa  101
    2   aa  102
    3   aa  113
    4   bb  201
    5   cc  202
    6   cc  344
    7   cc  407
    

    I would like to use dplyr::filter() and a RegEx to filter out all the y observations that start with the number 1.

    I'm imagining that the code will look something like this:

    df %>%
      filter(y != grep("^1")) 
    

    But I am getting this error: Error in grep("^1") : argument "x" is missing, with no default