dplyr filter on a vector rather than a dataframe in R

Solution 1

Sorry for posting on a 5-month-old question to archive a simpler solution.

With dplyr loaded, you can filter character vectors in the following ways:

> c("A", "B", "C", "D") %>% .[matches("[^AB]", vars=.)]
[1] "C" "D"
> c("A", "B", "C", "D") %>% .[.!="A"]
[1] "B" "C" "D"

The first approach lets you filter with a regular expression, and the second is shorter to type. Both work because dplyr re-exports the %>% pipe from magrittr, and the pipe's placeholder . comes with it, even though other magrittr functions such as extract are not available unless magrittr itself is loaded.

Details of the placeholder . can be found in the help page of the forward-pipe operator %>%. The placeholder has three main uses:

  • Using the dot for secondary purposes
  • Using lambda expressions with %>%
  • Using the dot-place holder as lhs

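Here the dot appears inside the right-hand-side call: .[. != "A"] is the call `[`(., . != "A"), and the pipe substitutes the piped vector for each dot. (The third usage is different: a chain that starts with . builds a function rather than a value.)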
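A minimal sketch of how the pipe treats the dot in these examples, assuming magrittr is installed. It compares the piped expression with the plain call it expands to, and shows that a chain starting with the dot (the "dot as lhs" case) builds a reusable function. The names x, piped, plain and drop_a are illustrative and not from the original answer.

```r
library(magrittr)  # provides %>% and the . placeholder

x <- c("A", "B", "C", "D")

# Piped form from Solution 1, and the plain call it expands to.
piped <- x %>% .[. != "A"]
plain <- `[`(x, x != "A")
identical(piped, plain)

# "Dot as lhs": a chain that starts with . yields a function, not a value.
drop_a <- . %>% .[. != "A"]
drop_a(x)
```

Both forms should drop only the "A" element. The function form may be handy when the same filter is applied to several vectors.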

Solution 2

You may like to try magrittr::extract, e.g.:

> library(magrittr)

> c("A", "B", "C", "D") %>% extract(.!="A")
[1] "B" "C" "D"

For more extract-like functions, load the magrittr package and type ?extract, which opens the help page listing magrittr's aliases.
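Applied to the vector from the question, Solution 2 might look like the sketch below. It assumes only that magrittr is installed; state.name ships with base R. The regular-expression variant and the name no_new are additions for illustration, not part of the original answer.

```r
library(magrittr)

# Build the hyphenated lower-case state names, then drop one by equality.
all_states <- gsub(" ", "-", tolower(state.name)) %>%
  extract(. != "alaska")

# The same idea with a regular expression: drop every name starting with "new-".
no_new <- gsub(" ", "-", tolower(state.name)) %>%
  extract(!grepl("^new-", .))

length(all_states)
"alaska" %in% all_states
```

If tidyr is also attached, its extract() means something different, so writing magrittr::extract explicitly avoids the clash.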

Solution 3

Pretty sure dplyr only really operates on data.frames. Here's a two-line example that coerces the vector to a data.frame and back.

myDf = data.frame(states = gsub(" ", "-", tolower(state.name))) %>% filter(states != "alaska")
all_states = myDf$states

Or as a gross one-liner:

all_states = (data.frame(states = gsub(" ", "-", tolower(state.name))) %>% filter(states != "alaska"))$states
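A slightly tidier variant of the same round trip, as a sketch. It assumes a dplyr version that provides pull() (0.7 or later). The stringsAsFactors = FALSE argument is an addition here, so that the result stays a character vector on R versions before 4.0 as well.

```r
library(dplyr)

all_states <- data.frame(
  states = gsub(" ", "-", tolower(state.name)),
  stringsAsFactors = FALSE  # keep a character column on R < 4.0 as well
) %>%
  filter(states != "alaska") %>%
  pull(states)  # back to a plain character vector
```

This keeps the whole operation in one pipeline without the trailing $states.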

Author: Canovice

Second year Stanford graduate student, studying Statistics and Data Science. Plan to make it big in sports analytics.

Updated on August 22, 2022

Comments

  • Canovice
    Canovice over 1 year

    This seems like a simple question, but I have not come across a clean solution for it yet. I have a vector in R and I want to remove certain elements from it; however, I want to avoid the vector[vector != "thiselement"] notation for a variety of reasons. In particular, here is what I am trying to do:

    # this doesn't work
    all_states = gsub(" ", "-", tolower(state.name)) %>% filter("alaska")
    
    # this doesn't work either
    all_states = gsub(" ", "-", tolower(state.name)) %>% filter(!= "alaska")
    
    # this does work, but I want to avoid this approach to filtering
    all_states = gsub(" ", "-", tolower(state.name))
    all_states = all_states[all_states != "alaska"]
    

    Can this be done in a simple manner? Thanks in advance for the help!

    EDIT - the reason I'm struggling with this is because I'm only finding things online regarding filtering based on a column of a dataframe, for example:

    my_df %>% filter(col != "alaska")
    

    However, I'm working with a vector here, not a dataframe.

    • Gregor Thomas
      Gregor Thomas almost 7 years
      The d in dplyr is for data.frame. "using dplyr to write cleaner code" should mean using dplyr for what it's made for (data frames) and not trying to use it when inappropriate (not data frames).
  • Canovice
    Canovice almost 7 years
    got it. yeah maybe im making my life harder than it needs to be. okay thanks
  • David Pedack
    David Pedack almost 7 years
    yeah, it'd be nice to have 1 tool to use. dplyr ends up looking a lot cleaner than the base R code in my opinion. unfortunately it always ends up a mess with vectors.
  • Hielke Walinga
    Hielke Walinga about 4 years
    No documentation for ‘alises’ in specified packages and libraries:
  • Łukasz Deryło
    Łukasz Deryło about 4 years
    It must've been removed in the current version; ?extract works now.
  • JelenaČuklina
    JelenaČuklina over 3 years
    very unfortunate that tidyr extract means a totally different thing. Love pipes, and this vector function is awesome!
  • Jens
    Jens over 3 years
    Is there a way to negate the first approach with "matches" ?
  • Quar
    Quar over 3 years
    @Jens, one could negate via either indexing, such as c("A", "B", "C", "D") %>% .[-matches("[^AB]", vars=.)] , or the regular expression itself, such as c("A", "B", "C", "D") %>% .[matches("[AB]", vars=.)] -- the caveat here is to prepend - instead of ! to the indices selected by matches, because matches returns an integer vector c(3, 4) rather than a boolean mask c(F, F, T, T).
  • Jens
    Jens over 3 years
    Many thanks! I tried ! and it did not work. Now I know why. -matches worked
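The negation discussed in the last comments can be sketched as follows, assuming dplyr is installed (it re-exports matches() from tidyselect). The grepl() variant is a base R alternative added here for comparison, not something proposed in the thread. The names x, idx and mask are illustrative.

```r
library(dplyr)

x <- c("A", "B", "C", "D")

# matches() returns integer positions, so negate with a minus sign.
idx <- matches("[^AB]", vars = x)
x[-idx]

# grepl() returns a logical mask, so negate with !.
mask <- grepl("[^AB]", x)
x[!mask]
```

One difference worth keeping in mind: when nothing matches, negative indexing with an empty index returns an empty vector, whereas the logical mask returns the whole vector.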