select columns based on multiple strings with dplyr contains()

Solution 1

You can use matches(), which takes a regular expression:

 mtcars %>%
        select(matches('m|ar')) %>%
        head(2)
 #              mpg am gear carb
 #Mazda RX4      21  1    4    4
 #Mazda RX4 Wag  21  1    4    4

According to the ?select documentation

‘matches(x, ignore.case = TRUE)’: selects all variables whose name matches the regular expression ‘x’

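contains(), by contrast, treats its argument as a literal string rather than a regular expression, so it works as expected with a plain string: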

mtcars %>% 
       select(contains('m'))
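
To see the literal-versus-regex difference concretely, here is a small sketch. It assumes dplyr is installed and uses the built-in mtcars and iris data sets; the object names are only illustrative.

```r
library(dplyr)

# contains() looks for the literal text "m|ar"; no column name has it
literal_cols <- mtcars %>% select(contains("m|ar"))
ncol(literal_cols)   # 0

# matches() reads "m|ar" as a regular expression: "m" OR "ar"
regex_cols <- mtcars %>% select(matches("m|ar"))
names(regex_cols)    # "mpg" "am" "gear" "carb"

# The reverse case: a literal dot. iris has names such as Sepal.Length.
dot_literal <- iris %>% select(contains("."))     # names with a real "."
dot_regex   <- iris %>% select(matches("."))      # "." = any character
dot_escaped <- iris %>% select(matches("\\."))    # escaped, literal "."
```

Here dot_regex keeps all five columns, because an unescaped "." matches any character, while dot_literal and dot_escaped keep only the four measurement columns.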

Solution 2

In recent versions of dplyr, contains() also accepts a character vector of literal strings and selects every column whose name contains any of them:

mtcars %>% 
       select(contains(c("m", "ar")))
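
If the strings are already stored in a character vector, either helper can take them. The sketch below assumes a version of dplyr/tidyselect in which contains() accepts a vector; `patterns` is a hypothetical name for your own vector.

```r
library(dplyr)

# Hypothetical vector of substrings to look for in column names
patterns <- c("m", "ar")

# Option A: pass the vector directly; each element is matched literally
by_contains <- mtcars %>% select(contains(patterns))

# Option B: collapse the vector into one regular expression ("m|ar")
by_matches <- mtcars %>% select(matches(paste(patterns, collapse = "|")))

names(by_contains)
names(by_matches)
```

With option B each element is interpreted as a regular expression, so any metacharacters in the vector (such as "." or "(") would need escaping; option A avoids that.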

Solution 3

You could still use grepl() from base R.

df <- mtcars[ , grepl('m|ar', names(mtcars))]

...which returns a subset data frame, df, containing the columns whose names contain m or ar.
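
One thing to watch with the base R form: when only a single column matches, matrix-style indexing drops the result to a plain vector. A minimal sketch of the difference, using only base R (the object names are illustrative):

```r
# Only "mpg" matches, so [ , j] simplifies to a numeric vector
one_col <- mtcars[ , grepl("mpg", names(mtcars))]
is.data.frame(one_col)   # FALSE

# drop = FALSE keeps the data frame structure
kept <- mtcars[ , grepl("mpg", names(mtcars)), drop = FALSE]
is.data.frame(kept)      # TRUE

# List-style indexing (no comma) also always returns a data frame
list_style <- mtcars[grepl("mpg", names(mtcars))]
is.data.frame(list_style)  # TRUE
```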

Author by

agenis

Updated on July 09, 2022

Comments

  • agenis
    agenis almost 2 years

    I want to select multiple columns based on their names using a regular expression. I am trying to do it with the piping syntax of the dplyr package. I checked the other topics, but only found answers about a single string.

    With base R:

    library(dplyr)    
    mtcars[grepl('m|ar', names(mtcars))]
    ###                      mpg am gear carb
    ### Mazda RX4           21.0  1    4    4
    ### Mazda RX4 Wag       21.0  1    4    4
    

    However, it doesn't work with select() and contains():

    mtcars %>% select(contains('m|ar'))
    ### data frame with 0 columns and 32 rows
    

    What's wrong?

  • agenis
    agenis about 9 years
    Thank you @akrun, I feel stupid now :-). But one question, still: given that, why should we even use contains(), if matches() does the same thing and more?
  • akrun
    akrun about 9 years
    @agenis There are several options in ?select for flexibility of use, I guess. contains takes a literal string, but when you do this kind of regex matching, it is better to use matches...
  • hadley
    hadley about 9 years
    @agenis Because you might want to match "." and not have to think about how to escape it in a regular expression
  • Michael Bellhouse
    Michael Bellhouse about 7 years
    Is there a way to avoid typing out the matches separated by pipes? Suppose I have a character vector of 30 different matches I am looking for; how can I pass that in?
  • akrun
    akrun about 7 years
    @MichaelBellhouse In that case, use paste, i.e. paste(yourvec, collapse="|"), and pass the result to matches
  • Michael Bellhouse
    Michael Bellhouse about 7 years
    akrun, thank you so much. I've been doing a lot of digging and experimenting for this. All the best.
  • Michael Bellhouse
    Michael Bellhouse about 7 years
    equivalent_for_filter <- df %>% filter(!grepl(paste(exclude_filter, collapse="|"),variable))
  • Ömer An
    Ömer An over 5 years
    use matches('m.*ar') for "AND" operator (this matches names that contain m followed, anywhere later, by ar)
  • akrun
    akrun over 3 years
    @titeuf it is a regex code to check either 'm' or 'ar'. If you want both use the code as stated by OmerAn
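
For the "AND" case raised in the comments, a single regular expression is order-sensitive, whereas combining two helpers with & is not. The sketch below assumes a version of dplyr/tidyselect that supports & between selection helpers; the data frame `toy` and its column names are made up for illustration.

```r
library(dplyr)

# Hypothetical data frame; only the column names matter here
toy <- data.frame(marble = 1, gear = 2, mpg = 3, alarm = 4)

# Regex form: "m" followed later by "ar", so order matters
in_order <- toy %>% select(matches("m.*ar"))
names(in_order)      # "marble"

# Intersection form: names containing both "m" and "ar", in any order
either_order <- toy %>% select(matches("m") & matches("ar"))
names(either_order)  # "marble" "alarm"
```

"alarm" contains both pieces but has "ar" before "m", so only the intersection form picks it up.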