Regular expressions (RegEx) and dplyr::filter()

regex r dplyr

38,006

Solution 1

You need to double-check the documentation for grepl and filter.

grep and grepl both need the vector you want to search (y in this case) as an argument, and filter takes a logical vector, so you need grepl rather than grep. If you want to supply an index vector (from grep), you can use slice instead.

df %>% filter(!grepl("^1", y))

Or with an index derived from grep:

df %>% slice(grep("^1", y, invert = TRUE))

But you can also just use substr because you are only interested in the first character:

df %>% filter(substr(y, 1, 1) != "1")
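
To see why filter pairs with grepl and slice pairs with grep, it can help to look at what each function returns on its own. The following is a minimal base-R sketch using the y vector from the question; the variable names starts_with_1, keep_idx and first_char are made up for illustration.

```r
# y from the question's data frame
y <- c(101, 102, 113, 201, 202, 344, 407)

# grepl() returns one TRUE/FALSE per element, the form filter() expects.
# The numeric y is coerced to character before matching.
starts_with_1 <- grepl("^1", y)

# grep() returns positions instead, the form slice() expects.
# invert = TRUE asks for the positions that do NOT match.
keep_idx <- grep("^1", y, invert = TRUE)

# substr() also coerces y to character, so the result is a character vector
first_char <- substr(y, 1, 1)

# All three approaches select the same values
y[!starts_with_1]
y[keep_idx]
y[first_char != "1"]
```

Because substr returns characters, comparing against "1" keeps the types aligned; comparing against the number 1 should also work, since R coerces it to "1" for the comparison.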

Solution 2

With a combination of dplyr and stringr (to stay within the tidyverse), you could do:

df %>% filter(!str_detect(y, "^1"))

This works because str_detect returns a logical vector.
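
As a rough end-to-end check, here is a self-contained sketch that rebuilds the question's data frame and applies the str_detect filter. It assumes dplyr and stringr are installed. The as.character() call is an addition not present in the answer above: y is numeric, so converting it explicitly avoids relying on implicit coercion.

```r
library(dplyr)
library(stringr)

# Data frame from the question
df <- data.frame(
  x = c("aa", "aa", "aa", "bb", "cc", "cc", "cc"),
  y = c(101, 102, 113, 201, 202, 344, 407)
)

# str_detect() yields one TRUE/FALSE per row, which is what filter() needs
starts_with_1 <- str_detect(as.character(df$y), "^1")

# Keep only the rows whose y does not start with "1"
result <- df %>% filter(!str_detect(as.character(y), "^1"))
result
```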

Author by

emehex

Updated on March 01, 2020

Comments

emehex about 4 years
I have a simple data frame that looks like this:
```
x <- c("aa", "aa", "aa", "bb", "cc", "cc", "cc")
y <- c(101, 102, 113, 201, 202, 344, 407)
df = data.frame(x, y)    

    x   y
1   aa  101
2   aa  102
3   aa  113
4   bb  201
5   cc  202
6   cc  344
7   cc  407
```
I would like to use dplyr::filter() and a RegEx to filter out all the y observations that start with the number 1.

I'm imagining that the code will look something like this:
```
df %>%
  filter(y != grep("^1")) 
```
But I am getting this error: Error in grep("^1") : argument "x" is missing, with no default