aggregating multiple columns in data.table
10,835
this is actually what i was looking for and is mentioned in the FAQ:
dtb[,lapply(.SD,mean),by="id"]
Related videos on Youtube
Author by
Alex
Updated on June 07, 2022Comments
-
Alex almost 2 years
I have the following sample
data.table
:dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10))
I would like to aggregate all columns (a and b, though they should be kept separate) by id using
colSums
, for example. What is the correct way to do this? The following does not work:dtb[,colSums, by="id"]
This is just a sample and my table has many columns so I want to avoid specifying all of them in the function name
-
Christoph_J almost 12 years+1 These, you are completely right, this is definitely the better way. I always think that I should have everything in long format, but quite often, as in this case, doing the computations is more efficient. You should mark yours as the correct answer.
-
Matt Dowle over 11 years+1 Btw, this syntax has been optimized in the latest v1.8.2. There used to be a speed penalty of using
lapply
inj
like that, but no longer. Revisions todata.table
wiki points 1 have been made, and to point 5 are needed.