Aggregating Data in R with user defined function
You can order the data, get the top 5 entries (using head) and then apply the mean:
aggregate(x$a, by=list(x$b,x$c),FUN= function(x) mean(head(x[order(-x)], 5)))
# Group.1 Group.2 x
#1 G 3 22.4
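For anyone who wants to reproduce this, here is a minimal sketch that builds the sample data from the question and runs the same call. The anonymous function's argument is renamed to `v` (a name chosen here only for illustration) so it is not confused with the data frame `x`.

```r
# Sample data from the question; `x` is the data frame name used there.
x <- data.frame(
  a = c(10, 20, 22, 10, 15, 25, 30),
  b = "G",
  c = 3
)

# Mean of the 5 largest values of `a` within each (b, c) group.
# `v` is the vector of `a` values for one group (illustrative name).
res <- aggregate(x$a, by = list(x$b, x$c),
                 FUN = function(v) mean(head(v[order(-v)], 5)))
print(res)
```

The five largest values are 30, 25, 22, 20 and 15, whose mean is 22.4.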
If you want to do this with a custom function, I would do it like this:
myfunc <- function(vec, n) {
  mean(head(vec[order(-vec)], n))
}
aggregate(x$a, by=list(x$b,x$c),FUN= function(z) myfunc(z, 5))
# Group.1 Group.2 x
#1 G 3 22.4
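As a possible alternative to wrapping myfunc in an anonymous function, aggregate forwards any further named arguments to FUN through `...`. A small sketch, assuming the sample data from the question (`res_dots` is just an illustrative name):

```r
# Sample data from the question.
x <- data.frame(a = c(10, 20, 22, 10, 15, 25, 30), b = "G", c = 3)

myfunc <- function(vec, n) {
  mean(head(vec[order(-vec)], n))
}

# Named arguments after FUN are passed on to FUN, so n = 5 reaches myfunc.
res_dots <- aggregate(x$a, by = list(x$b, x$c), FUN = myfunc, n = 5)
print(res_dots)
```

Both forms should give the same result; the anonymous wrapper is arguably more explicit about what is being passed.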
I actually prefer using the formula style in aggregate, which would look like this (I also use with() to be able to refer to the column names directly without using x$ each time):
with(x, aggregate(a ~ b + c, FUN= function(z) myfunc(z, 5)))
# b c a
#1 G 3 22.4
In this function, the parameter z is passed each a-vector based on groups of b and c. Does that make more sense now? Also note that it doesn't return an integer here but a numeric (decimal, 22.4 in this case) value.
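To make the grouping visible, here is a rough sketch that uses split() to show the vectors the function is called with. The three rows for a second group ("H", 4) are hypothetical and were added only so that there is more than one group to look at.

```r
# Question's sample rows plus three hypothetical rows for a second group.
x <- data.frame(
  a = c(10, 20, 22, 10, 15, 25, 30, 1, 2, 4),
  b = c(rep("G", 7), rep("H", 3)),
  c = c(rep(3, 7), rep(4, 3))
)

myfunc <- function(vec, n) mean(head(vec[order(-vec)], n))

# Each list element is one a-vector; FUN is called once per element.
groups <- split(x$a, list(x$b, x$c), drop = TRUE)
str(groups)

res <- aggregate(a ~ b + c, data = x, FUN = function(z) myfunc(z, 5))
print(res)

# The result column is a double, not an integer.
is.double(res$a)
```

Note that head() returns all available values when a group has fewer than n of them, so the hypothetical second group is averaged over its three values.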
user3812709
Updated on June 04, 2022
Comments
-
user3812709 almost 2 years
I have grouped data in R using the aggregate method.
Avg=aggregate(x$a, by=list(x$b,x$c),FUN= mean)
This gives me the mean for all the values of 'a' grouped by 'b' and 'c' of data frame 'x'.
Now, instead of taking the average of all values of 'a', I want to take the average of the 5 largest values of 'a', grouped by 'b' and 'c'.
Sample data set
a  b c
10 G 3
20 G 3
22 G 3
10 G 3
15 G 3
25 G 3
30 G 3
With the aggregate call above, this gives me
Group.1 Group.2 x
G       3       18.85
But I want to average only the 5 largest values of 'a':
Group.1 Group.2 x
G       3       22.40
I am not able to fit the maximum function below, which I am currently using, into the aggregate call:
index <- order(vector, decreasing = TRUE)[1:5]
vector[index]
Can anyone please shed some light on how this is possible?
-
user3812709 over 9 years
Which 'x' is the 'x' used in the function and in head?
-
talat over 9 years
Oh, that wasn't a good choice for a name. Try replacing the last part of the function with ..function(z) myfunc(z, 5)). Does that work? I'm currently not at my computer.
-
user3812709 over 9 years
So the first argument in the function would be the vector, which in my case would be 'x$a', right? If so, it is showing an error that argument 1 is not a vector. Thanks.
-
user3812709 over 9 years
Also, I think the function will return integers, whereas I want numbers with up to 3 decimal places, and that's the reason I was using the index approach at the start.
-
talat over 9 years
@user3812709, see my update. The function is not restricted to integers, and it doesn't return an integer for the sample data, as you can see (it's 22.4).
-
talat over 9 years
@user3812709, do you still get an error message? It works with the sample data. If you're still getting an error, try to provide a sample of the data that's causing it.
-
user3812709 over 9 years
It's working for me now. Thanks. I just wanted to confirm: looking at the sample data above, the first argument of the function will be 'x$a', right?