aggregating multiple columns in data.table

10,835

this is actually what i was looking for and is mentioned in the FAQ:

dtb[,lapply(.SD,mean),by="id"]
Share:
10,835

Related videos on Youtube

Alex
Author by

Alex

Updated on June 07, 2022

Comments

  • Alex
    Alex almost 2 years

    I have the following sample data.table:

    dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10))
    

    I would like to aggregate all columns (a and b, though they should be kept separate) by id using colSums, for example. What is the correct way to do this? The following does not work:

     dtb[,colSums, by="id"]
    

    This is just a sample and my table has many columns so I want to avoid specifying all of them in the function name

  • Christoph_J
    Christoph_J almost 12 years
    +1 These, you are completely right, this is definitely the better way. I always think that I should have everything in long format, but quite often, as in this case, doing the computations is more efficient. You should mark yours as the correct answer.
  • Matt Dowle
    Matt Dowle over 11 years
    +1 Btw, this syntax has been optimized in the latest v1.8.2. There used to be a speed penalty of using lapply in j like that, but no longer. Revisions to data.table wiki points 1 have been made, and to point 5 are needed.