How can I use dplyr to apply a function to all non-group_by columns?
Solution 1
If you're willing to try an experimental build of dplyr, you can use the new (and still experimental) summarise_each():
devtools::install_github("hadley/dplyr", ref = "colwise")
library(dplyr)
iris %.%
  group_by(Species) %.%
  summarise_each(funs(mean))
## Source: local data frame [3 x 5]
##
##      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1     setosa        5.006       3.428        1.462       0.246
## 2 versicolor        5.936       2.770        4.260       1.326
## 3  virginica        6.588       2.974        5.552       2.026
iris %.%
  group_by(Species) %.%
  summarise_each(funs(min, max))
## Source: local data frame [3 x 9]
##
##      Species Sepal.Length_min Sepal.Width_min Petal.Length_min
## 1     setosa              4.3             2.3              1.0
## 2 versicolor              4.9             2.0              3.0
## 3  virginica              4.9             2.2              4.5
## Variables not shown: Petal.Width_min (dbl), Sepal.Length_max (dbl),
##   Sepal.Width_max (dbl), Petal.Length_max (dbl), Petal.Width_max (dbl)
Feedback much appreciated!
This will appear in dplyr 0.2.
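For readers on a current dplyr release, here is a hedged sketch of the same two summaries. It assumes dplyr 1.0.0 or later, where the pipe is written %>% and across() is the documented replacement for summarise_each() and funs(). The variable names means and ranges are illustrative only.

```r
library(dplyr)

# Mean of every non-grouping column, one row per species.
# across() never selects the grouping column, so Species is left alone.
means <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), mean))

# Several functions at once: a named list produces output columns
# named <column>_<function name>, e.g. Sepal.Length_min.
ranges <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), list(min = min, max = max)))

print(means)
print(names(ranges))
```

If the data frame mixes numeric and non-numeric columns, across(where(is.numeric), mean) limits the summary to the numeric ones.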
Solution 2
This will get you almost all the way in dplyr.
h <- iris %.%
  group_by(Species) %.%
  do(function(d) {
    # Mean of every numeric column within the group
    sapply(Filter(is.numeric, d), mean)
  })
as.data.frame(h)
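Because a comment below warns that this use of do() is likely to change, here is a minimal base-R sketch of the same idea (keep the numeric columns, then take per-group means) that does not depend on dplyr at all. It is an illustration, not part of the original answer, and the names numeric_cols, by_species, group_means and result are hypothetical.

```r
# Keep only the numeric columns, as Filter(is.numeric, d) does above.
numeric_cols <- Filter(is.numeric, iris)

# One data frame per species.
by_species <- split(numeric_cols, iris$Species)

# Column means per group; sapply() returns variables in rows,
# so transpose to get one row per species.
group_means <- t(sapply(by_species, colMeans))

# Move the species names from row names into a regular column.
result <- data.frame(Species = rownames(group_means),
                     group_means,
                     row.names = NULL)
print(result)
```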
Comments
kmm almost 2 years
I'm trying to use the dplyr package to apply a function to all columns in a data.frame that are not being grouped, which I would do with aggregate():
aggregate(. ~ Species, data = iris, mean)
where mean is applied to all columns not used for grouping. (Yes, I know I can use aggregate, but I'm trying to understand dplyr.)
I can use summarize like this:
species <- group_by(iris, Species)
summarize(species, Sepal.Length = mean(Sepal.Length), Sepal.Width = mean(Sepal.Width))
But is there a way to have mean() applied to all columns that are not grouped, similar to the . ~ notation of aggregate()? I have a data.frame with 30 columns that I want to aggregate, so writing out the individual statements is not ideal.
BrodieG about 10 years
See this previous SO Q/A.
hadley about 10 years
I wouldn't recommend using do() in that way, as it's likely to change in 0.2.
Ramnath about 10 years
Is there an idiomatic way to do it in dplyr? In data.table I can do data.table(iris)[, lapply(.SD, mean), Species].