select columns based on multiple strings with dplyr contains()
Solution 1
You can use matches
mtcars %>%
select(matches('m|ar')) %>%
head(2)
# mpg am gear carb
#Mazda RX4 21 1 4 4
#Mazda RX4 Wag 21 1 4 4
According to the ?select documentation:
‘matches(x, ignore.case = TRUE)’: selects all variables whose name matches the regular expression ‘x’
contains, by contrast, treats its argument as a literal string rather than a regular expression, so it works with a plain string such as:
mtcars %>%
select(contains('m'))
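To make the difference concrete, here is a small sketch comparing the two helpers. It assumes dplyr is installed; `df_dots` is a hypothetical data frame (not part of the original question) whose column names include a literal dot.

```r
library(dplyr)

# matches() interprets its argument as a regular expression: "m" OR "ar"
regex_cols <- mtcars %>% select(matches("m|ar")) %>% names()

# contains() interprets its argument literally, so it searches for the
# four characters "m|ar" and selects nothing
literal_cols <- mtcars %>% select(contains("m|ar")) %>% names()

# Hypothetical data frame with a dot in one of the column names
df_dots <- data.frame(a.b = 1, axb = 2, ab = 3)

# contains(".") only matches a literal dot ...
dot_literal <- df_dots %>% select(contains(".")) %>% names()

# ... whereas an unescaped "." in a regex matches any single character
dot_regex <- df_dots %>% select(matches("a.b")) %>% names()
```

This is also why contains() remains useful: a literal match needs no regex escaping.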
Solution 2
You can also use contains from the dplyr package if you give it a character vector of options, like this:
mtcars %>%
select(contains(c("m", "ar")))
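As a quick sanity check, the vector form should pick the same columns as the regex form from Solution 1. This sketch assumes a version of dplyr/tidyselect in which contains() accepts a character vector of length greater than one; older versions may behave differently.

```r
library(dplyr)

# Union of literal matches for "m" and for "ar"
via_contains <- mtcars %>% select(contains(c("m", "ar"))) %>% names()

# Regex alternative from Solution 1
via_matches <- mtcars %>% select(matches("m|ar")) %>% names()

# Compare as sets, since column order is not the point here
same_columns <- setequal(via_contains, via_matches)
```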
Solution 3
You could still use grepl() from base R.
df <- mtcars[ , grepl('m|ar', names(mtcars))]
...which returns a subset data frame, df, containing the columns with m or ar in their names.
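One thing to watch with the base R form: when the pattern happens to match exactly one column, `[ , ]` indexing simplifies the result to a plain vector. A minimal sketch of a variant that always returns a data frame, using `drop = FALSE` (the `^mpg$` pattern is just an illustrative single-column match):

```r
# Logical index of the columns whose names contain "m" or "ar"
keep <- grepl("m|ar", names(mtcars))
df <- mtcars[, keep, drop = FALSE]

# With a single matching column, drop = FALSE keeps the data frame structure
one_col <- mtcars[, grepl("^mpg$", names(mtcars)), drop = FALSE]
```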
agenis
Updated on July 09, 2022
Comments
-
agenis almost 2 years
I want to select multiple columns based on their names with a regular expression. I am trying to do it with the piping syntax of the dplyr package. I checked the other topics, but only found answers about a single string.
With base R:
library(dplyr)
mtcars[grepl('m|ar', names(mtcars))]
### mpg am gear carb
### Mazda RX4 21.0 1 4 4
### Mazda RX4 Wag 21.0 1 4 4
However, it doesn't work with the select/contains way:
mtcars %>% select(contains('m|ar'))
### data frame with 0 columns and 32 rows
What's wrong?
-
agenis about 9 years
Thank you @akrun, I feel stupid now :-). But one question, still: given that, why should we even use contains(), if matches() does the same and even better?
-
akrun about 9 years
@agenis There are several options in ?select for flexibility of use, I guess. contains takes a single string, but when you do this kind of regex matching, it is better to use matches.
-
hadley about 9 years
@agenis Because you might want to match "." and not have to think about how to escape it in a regular expression.
-
Michael Bellhouse about 7 years
Is there a way to not have to pipe the matches? Suppose I have a character vector of 30 different matches I am looking for. How can I read that in?
-
akrun about 7 years
@MichaelBellhouse In that case you use paste, i.e. paste(yourvec, collapse="|"), and use that in matches.
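The paste approach from this comment can be sketched as follows. `yourvec` is the commenter's placeholder name; its contents here are illustrative and simply reuse the two strings from the question. The sketch assumes dplyr is installed.

```r
library(dplyr)

# Illustrative vector of substrings to look for in the column names
yourvec <- c("m", "ar")

# Collapse the vector into a single regex alternation: "m|ar"
pattern <- paste(yourvec, collapse = "|")

# Use the assembled pattern with matches()
selected <- mtcars %>% select(matches(pattern))
```

Note that the collapsed string is interpreted as a regular expression, so entries containing characters such as "." or "(" would need escaping first.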
-
Michael Bellhouse about 7 years
akrun, thank you so much. I've been doing a lot of digging and experimenting for this. All the best.
-
Michael Bellhouse about 7 years
equivalent_for_filter <- df %>% filter(!grepl(paste(exclude_filter, collapse = "|"), variable))
-
Ömer An over 5 years
Use matches('m.*ar') for an "AND" operator (an m followed later by ar).
-
akrun over 3 years
@titeuf It is a regex that checks for either 'm' or 'ar'. If you want both, use the code as stated by Ömer An.