Aggregate a dataframe on a given column and display another column
Solution 1
First, you split the data using split():
split(z,z$Group)
Then, for each chunk, select the row with the maximum Score:
lapply(split(z,z$Group),function(chunk) chunk[which.max(chunk$Score),])
Finally, reduce the list back to a data.frame by do.call-ing rbind:
do.call(rbind,lapply(split(z,z$Group),function(chunk) chunk[which.max(chunk$Score),]))
Result:
Group Score Info
1 1 3 c
2 2 4 d
One line, no magic spells, fast, and the result has good names =)
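For a version you can paste into a fresh session, here is a self-contained sketch of the three steps above. It assumes z holds the sample data from the question (the answer does not show how z is built), and it names the intermediate results (chunks, best, result) only for readability:

```r
# Sample data from the question (assumed; the answer refers to it as z)
z <- data.frame(
  Group = c(1, 1, 1, 2, 2, 2),
  Score = c(1, 2, 3, 4, 3, 1),
  Info  = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# Step 1: one data frame per group
chunks <- split(z, z$Group)

# Step 2: keep the row with the highest Score in each chunk
# (which.max returns only the first maximum if there are ties)
best <- lapply(chunks, function(chunk) chunk[which.max(chunk$Score), ])

# Step 3: stack the one-row data frames back into a single data frame
result <- do.call(rbind, best)
print(result)
```

Because which.max() returns a single index, each group contributes exactly one row even when several rows share the top score.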
Solution 2
A late answer, but here is an approach using data.table:
library(data.table)
DT <- data.table(dat)
DT[, .SD[which.max(Score),], by = Group]
Or, if it is possible for more than one row in a group to share the highest score:
DT[, .SD[which(Score == max(Score)),], by = Group]
Note that (from ?data.table) .SD is a data.table containing the Subset of x's Data for each group, excluding the group column(s).
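If data.table is not available, the tie-keeping variant can be approximated in base R with ave(), which returns each group's maximum aligned to every row of that group. This is a swapped-in base-R technique, not part of the data.table answer, and the extra tied row ("g") is hypothetical, added only to show that ties are kept:

```r
dat <- data.frame(
  Group = c(1, 1, 1, 2, 2, 2, 2),
  Score = c(1, 2, 3, 4, 3, 1, 4),   # last row ties Group 2's maximum (hypothetical)
  Info  = c("a", "b", "c", "d", "e", "f", "g"),
  stringsAsFactors = FALSE
)

# ave() repeats each group's max(Score) on every row of that group
group_max <- ave(dat$Score, dat$Group, FUN = max)

# keep every row whose Score equals its group's maximum (ties included)
top <- dat[dat$Score == group_max, ]
print(top)
```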
Solution 3
To add to Gavin's answer: prior to the merge, it is possible to get aggregate to use proper names when not using the formula interface:
aggregate(data[, "Score", drop = FALSE], list(Group = data$Group), max)
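The answer refers to merging the aggregated result back into the original data (Gavin's answer itself is not reproduced here). A minimal sketch of that merge, assuming the question's sample data is in a data frame called data and using max as in the question:

```r
data <- data.frame(
  Group = c(1, 1, 1, 2, 2, 2),
  Score = c(1, 2, 3, 4, 3, 1),
  Info  = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# drop = FALSE keeps a one-column data frame, so the column stays named "Score";
# naming the grouping list gives "Group" instead of the default "Group.1"
agg <- aggregate(data[, "Score", drop = FALSE], list(Group = data$Group), max)

# merge() joins on the shared columns (Group and Score), pulling in the matching Info
res <- merge(agg, data)
print(res)
```

Since the join is on both Group and Score, a group with several rows tied at the maximum would contribute all of them.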
Solution 4
The plyr package can be used for this. With the ddply() function you can split a data frame on one or more columns, apply a function to each piece and get a data frame back; with the summarize() function you can then use the columns of each split data frame as variables to build the new data frame:
dat <- read.table(textConnection('Group Score Info
1 1 1 a
2 1 2 b
3 1 3 c
4 2 4 d
5 2 3 e
6 2 1 f'))
library("plyr")
ddply(dat,.(Group),summarize,
Max = max(Score),
Info = Info[which.max(Score)])
Group Max Info
1 1 3 c
2 2 4 d
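Roughly the same split-apply-combine can be written without plyr. The following is a base-R sketch of the idea (not how plyr implements it internally): split by Group, build one summary row per group, then rbind the rows. It uses read.table(text = ...) in place of textConnection() to build the same data:

```r
dat <- read.table(text = "
Group Score Info
1 1 1 a
2 1 2 b
3 1 3 c
4 2 4 d
5 2 3 e
6 2 1 f", header = TRUE, stringsAsFactors = FALSE)

# one summary row per group, mirroring summarize(Max = ..., Info = ...)
summaries <- lapply(split(dat, dat$Group), function(d) {
  data.frame(Group = d$Group[1],
             Max   = max(d$Score),
             Info  = d$Info[which.max(d$Score)],
             stringsAsFactors = FALSE)
})

# combine the per-group rows and drop the group-name row names
out <- do.call(rbind, summaries)
rownames(out) <- NULL
print(out)
```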
Solution 5
This is how I basically think of the problem, in base R:
my.df <- data.frame(group = rep(c(1,2), each = 3),
score = runif(6), info = letters[1:6])
my.agg <- with(my.df, aggregate(score, list(group), max))
my.df.split <- with(my.df, split(x = my.df, f = group))
my.agg$info <- unlist(lapply(my.df.split, FUN = function(x) {
x[which(x$score == max(x$score)), "info"]
}))
> my.agg
Group.1 x info
1 1 0.9344336 a
2 2 0.7699763 e
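Because runif() generates random scores, the numbers above will differ on every run. The sketch below repeats the same approach on fixed, hypothetical scores that include a tie in group 1, and swaps which(x$score == max(x$score)) for which.max(). With a tie, which() returns two rows, so unlist() yields more values than my.agg has rows and the assignment fails; which.max() always returns exactly one index (the first maximum):

```r
my.df <- data.frame(group = rep(c(1, 2), each = 3),
                    score = c(0.2, 0.9, 0.9, 0.5, 0.7, 0.1),  # group 1 has a tie (hypothetical)
                    info  = letters[1:6],
                    stringsAsFactors = FALSE)

my.agg <- with(my.df, aggregate(score, list(group), max))
my.df.split <- split(my.df, my.df$group)

# which.max() picks exactly one row per group (the first maximum),
# so there is always one info value per row of my.agg
my.agg$info <- vapply(my.df.split,
                      function(x) x[which.max(x$score), "info"],
                      character(1))
print(my.agg)
```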
jul635
Updated on June 19, 2020
Comments
-
jul635 almost 4 years
I have a dataframe in R of the following form:
> head(data)
  Group Score Info
1     1     1    a
2     1     2    b
3     1     3    c
4     2     4    d
5     2     3    e
6     2     1    f
I would like to aggregate it on the Score column using the max function:
> aggregate(data$Score, list(data$Group), max)
  Group.1 x
1       1 3
2       2 4
But I would also like to display the Info column associated with the maximum value of the Score column for each group. I have no idea how to do this. My desired output would be:
  Group.1 x y
1       1 3 c
2       2 4 d
Any hint?
-
Andrie almost 13 years
Nice, but I think you'll agree my solution is slightly more general since it will return all columns in the original data.frame.
-
Sacha Epskamp almost 13 years
Yeah, indeed. You could change x$Score==max to which.max if you assume that there is only one maximum per group.
-
Andrie almost 13 years
Ah, that's nice. I didn't know about which.max.
-
hhh about 7 years
How can you calculate the standard deviation of only the last six values, not of the whole column? Do you first need to aggregate the data, or is there something like SQL-style OVER (PARTITION ... BETWEEN 5 PRECEDING AND CURRENT ROW) in R? I am trying to answer this here but am stuck.
-
jangorecki almost 5 years
@hhh this looks like a rolling sd, please upvote github.com/Rdatatable/data.table/issues/2778