dplyr filter on a vector rather than a dataframe in R
Solution 1
Sorry for posting on a 5-month-old question to archive a simpler solution.
Package dplyr can filter character vectors in the following ways:
> c("A", "B", "C", "D") %>% .[matches("[^AB]", vars=.)]
[1] "C" "D"
> c("A", "B", "C", "D") %>% .[.!="A"]
[1] "B" "C" "D"
The first approach lets you filter with a regular expression, and the second is shorter to type. Both work because dplyr imports magrittr: it does not make magrittr's helper functions such as extract available, but the pipe and its dot placeholder (.) are.
Details of the dot placeholder can be found in the help for the forward-pipe operator %>%, which lists three main usages:
- Using the dot for secondary purposes
- Using lambda expressions with %>%
- Using the dot-place holder as lhs
Here the dot is both the object being subset and an operand of the nested comparison, so we are relying on the first of these usages.
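To make the placeholder mechanics concrete, here is a minimal sketch (assuming magrittr is installed; the names x, piped and direct are made up for illustration) that compares the piped form with the plain call it appears to stand for:

```r
library(magrittr)  # provides %>% and the dot placeholder

x <- c("A", "B", "C", "D")

# Piped form from the answer: every dot stands for the left-hand side
piped <- x %>% .[. != "A"]

# What the pipe effectively evaluates: `[` called with x and a logical mask
direct <- `[`(x, x != "A")

print(piped)
print(identical(piped, direct))
```

If the two results are identical, the pipe is doing nothing more than ordinary subsetting, just written left to right.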
Solution 2
You may like to try magrittr::extract, e.g.:
> library(magrittr)
> c("A", "B", "C", "D") %>% extract(.!="A")
[1] "B" "C" "D"
For more extract-like functions, load the magrittr package and type ?aliases (or ?extract, which works in recent versions).
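As a hedged variation on the same idea, the sketch below (assuming magrittr is installed; kept_by_value and kept_by_regex are illustrative names) calls the helper with its namespace prefix, which should avoid picking up a different extract() from another attached package, and also combines it with base grepl() for regular-expression filtering:

```r
library(magrittr)

x <- c("A", "B", "C", "D")

# Namespace-qualified call, so another package's extract() on the
# search path (for example tidyr's) is not used by accident
kept_by_value <- x %>% magrittr::extract(. != "A")

# Same helper with a logical mask built by base grepl()
kept_by_regex <- x %>% magrittr::extract(grepl("[^AB]", .))

print(kept_by_value)
print(kept_by_regex)
```

Because the dot only appears inside nested calls here, the pipe still passes x as the first argument of extract().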
Solution 3
Pretty sure dplyr only really operates on data.frames. Here's a two line example coercing the vector to a data.frame and back.
myDf = data.frame(states = gsub(" ", "-", tolower(state.name))) %>% filter(states != "alaska")
all_states = myDf$states
or a gross one liner:
all_states = (data.frame(states = gsub(" ", "-", tolower(state.name))) %>% filter(states != "alaska"))$states
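A possible tidier version of the same round trip, sketched under the assumption that dplyr is installed and that a plain character vector is wanted at the end: pull() takes the column back out, and stringsAsFactors = FALSE guards against the column becoming a factor on R versions before 4.0.

```r
library(dplyr)

all_states <- data.frame(
  states = gsub(" ", "-", tolower(state.name)),
  stringsAsFactors = FALSE  # keep a character column on R < 4.0 as well
) %>%
  filter(states != "alaska") %>%  # ordinary data-frame filter
  pull(states)                    # back to a plain vector

print(length(all_states))
print("alaska" %in% all_states)
```

This stays a data-frame detour, so it is mainly useful when the rest of the pipeline is already written with dplyr verbs.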
Canovice
Updated on August 22, 2022
Comments
-
Canovice over 1 year
This seems like a simple question, but I have not come across a clean solution for it yet. I have a vector in R and I want to remove certain elements from the vector, however I want to avoid the vector[vector != "thiselement"] notation for a variety of reasons. In particular, here is what I am trying to do:
# this doesnt work
all_states = gsub(" ", "-", tolower(state.name)) %>% filter("alaska")
# this doesnt work either
all_states = gsub(" ", "-", tolower(state.name)) %>% filter(!= "alaska")
# this does work but i want to avoid this approach to filtering
all_states = gsub(" ", "-", tolower(state.name))
all_states = all_states[all_states != "alaska"]
can this be done in a simple manner? Thanks in advance for the help!
EDIT - the reason I'm struggling with this is because I'm only finding things online regarding filtering based on a column of a dataframe, for example:
my_df %>% filter(col != "alaska")
however I'm working with a vector not a dataframe here
-
Gregor Thomas almost 7 years
The d in dplyr is for data.frame. "Using dplyr to write cleaner code" should mean using dplyr for what it's made for (data frames) and not trying to use it when inappropriate (not data frames).
-
Canovice almost 7 years
Got it. Yeah, maybe I'm making my life harder than it needs to be. Okay, thanks.
-
David Pedack almost 7 years
Yeah, it'd be nice to have one tool to use. dplyr ends up looking a lot cleaner than the base R code, in my opinion. Unfortunately, it always ends up a mess with vectors.
-
Hielke Walinga about 4 years
No documentation for ‘alises’ in specified packages and libraries:
-
Łukasz Deryło about 4 years
It must've been removed in the current version. ?extract works now.
-
JelenaČuklina over 3 years
Very unfortunate that tidyr extract means a totally different thing. Love pipes, and this vector function is awesome!
-
Jens over 3 years
Is there a way to negate the first approach with "matches"?
-
Quar over 3 years
@Jens, one could negate via either indexing, such as
c("A", "B", "C", "D") %>% .[-matches("[^AB]", vars=.)]
or the regular expression itself, such as
c("A", "B", "C", "D") %>% .[matches("[AB]", vars=.)]
Perhaps the caveat here is to prepend - instead of ! to the indices selected by matches for the negation, because matches returns an integer vector c(3, 4), rather than a boolean mask c(F, F, T, T).
-
Jens over 3 years
Many thanks! I tried ! and it did not work. Now I know why. -matches worked
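The difference between negating integer indices and negating a logical mask, raised in the comments above, can be illustrated without any packages. The sketch below uses base grep() and grepl() as stand-ins for matches (an assumption made for illustration: grep() likewise returns integer positions, while grepl() returns a logical mask); all variable names are made up.

```r
x <- c("A", "B", "C", "D")

idx  <- grep("[^AB]", x)   # integer positions of the matches
mask <- grepl("[^AB]", x)  # logical mask of the matches

neg_by_index <- x[-idx]    # drops the matched positions
neg_by_mask  <- x[!mask]   # same elements via the inverted mask
wrong        <- x[!idx]    # ! turns non-zero integers into FALSE

print(neg_by_index)
print(neg_by_mask)
print(wrong)
```

Applying ! to non-zero integers yields FALSE for each of them, so the last subset selects nothing, which is consistent with ! not working on the matches result.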