Aggregating Data in R with a user-defined function


You can order the data, get the top 5 entries (using head) and then apply the mean:

aggregate(x$a, by=list(x$b,x$c),FUN= function(x) mean(head(x[order(-x)], 5)))
#  Group.1 Group.2    x
#1       G       3 22.4

If you want to do this with a custom function, I would do it like this:

myfunc <- function(vec, n){
  mean(head(vec[order(-vec)], n))
}

aggregate(x$a, by=list(x$b,x$c),FUN= function(z) myfunc(z, 5))
#  Group.1 Group.2    x
#1       G       3 22.4
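
Another option, sketched here under the assumption that the sample data from the question is rebuilt as a data frame `x`, is to let aggregate forward `n` to the function directly. Arguments that aggregate does not recognise are passed on to FUN, so the anonymous wrapper is not strictly needed:

```r
# Sample data rebuilt from the question (assumed layout: one group, b = "G", c = 3)
x <- data.frame(a = c(10, 20, 22, 10, 15, 25, 30), b = "G", c = 3)

# Mean of the n largest values of vec
myfunc <- function(vec, n) {
  mean(head(vec[order(-vec)], n))
}

# n = 5 is not an argument of aggregate, so it is forwarded to myfunc
res <- aggregate(x$a, by = list(x$b, x$c), FUN = myfunc, n = 5)
print(res)
```

If a group has fewer than n values, head simply returns all of them, so the mean is taken over whatever is available.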

I actually prefer using the formula style in aggregate which would look like this (I also use with() to be able to refer to the column names directly without using x$ each time):

with(x, aggregate(a ~ b + c, FUN= function(z) myfunc(z, 5)))
#  b c    a
#1 G 3 22.4

Here, aggregate calls the function once for each group defined by b and c, and the parameter z is that group's vector of a values. Does that make more sense now? Also note that the result is not an integer but a numeric (decimal) value, 22.4 in this case.
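
To see what the function receives, the grouping can be reproduced by hand with split. This is only an illustrative sketch that assumes the sample data from the question is stored in a data frame `x`; the names `groups` and `top5` are made up for the example:

```r
# Sample data rebuilt from the question (assumed layout: one group, b = "G", c = 3)
x <- data.frame(a = c(10, 20, 22, 10, 15, 25, 30), b = "G", c = 3)

# split() builds one vector of a per combination of b and c,
# which is roughly what aggregate hands to FUN one group at a time
groups <- split(x$a, list(x$b, x$c))
print(groups)

# Apply the same top-5 mean to each group's vector
top5 <- sapply(groups, function(z) mean(head(z[order(-z)], 5)))

print(is.numeric(top5))  # the result is a double, not an integer
print(round(top5, 3))    # round to 3 decimal places if needed
```

Because mean returns a double, no extra work is needed to keep decimals; round only matters if the output should be limited to a fixed number of places.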

Author by

user3812709

Updated on June 04, 2022

Comments

  • user3812709
    user3812709 almost 2 years

    I have grouped data in R using the aggregate method.

    Avg=aggregate(x$a, by=list(x$b,x$c),FUN= mean)
    

    This gives me the mean for all the values of 'a' grouped by 'b' and 'c' of data frame 'x'.

    Now, instead of averaging all values of 'a', I want to average only the 5 largest values of 'a' within each group of 'b' and 'c'.

    Sample data set

    a    b    c
    10   G    3 
    20   G    3 
    22   G    3
    10   G    3 
    15   G    3
    25   G    3
    30   G    3
    

    The aggregate call above gives me:

    Group.1    Group.2    x
      G          3       18.86
    

    But I want to average only the 5 largest values of 'a':

    Group.1    Group.2    x
      G          3       22.40
    

    I am not able to fit the top-5 selection below into the aggregate call:

    index <- order(vector, decreasing = TRUE)[1:5]
    vector[index]
    

    Can anyone please shed some light on how to do this?

  • user3812709
    user3812709 over 9 years
    the 'x' used in function and head is which 'x'?
  • talat
    talat over 9 years
    Oh, that wasn't a good choice for a name. Try replacing the last part of the call with ..function(z) myfunc(z, 5)). Does that work? I'm currently not at my computer.
  • user3812709
    user3812709 over 9 years
    So the first argument of the function would be the vector, which in my case is 'x$a', right? If so, it is showing an error that argument 1 is not a vector. Thanks
  • user3812709
    user3812709 over 9 years
    Also, I think the function will return integers, whereas I want numbers with up to 3 decimal places, and that's the reason I was using the index approach at the start
  • talat
    talat over 9 years
    @user3812709, see my update. The function is not restricted to integers - and it doesn't return an integer for the sample data, as you can see (it's 22.4).
  • talat
    talat over 9 years
    @user3812709, do you still get an error message? It works with the sample data. If you're still getting some error, try to provide a sample of the data that's causing it.
  • user3812709
    user3812709 over 9 years
    It's working for me now. Thanks. Just wanted to confirm: looking at the sample data above, the first argument of the function will be 'x$a', right?