R colSums By Group
Solution 1
First we set up the matrix used as input.
Lines <- "1 2 3 4 5
1950 7 20 21 15 61
1951 2 10 6 26 57
1952 12 27 43 37 34
1953 14 16 40 47 94
1954 2 17 62 113 101
1955 3 4 43 99 148
1956 2 47 31 85 79
1957 17 5 38 216 228
1958 11 20 15 76 68
1959 16 20 43 30 226
1960 9 28 28 70 201
1961 1 31 124 74 137
1962 12 25 37 41 200 "
DF <- read.table(text = Lines, check.names = FALSE)
m <- as.matrix(DF)
Now, below, we show some alternative solutions. (1) seems the most flexible in that we can easily replace sum
with other functions to get different effects but (2) is the shortest for this particular problem. Also note that there are some slight differences. (1) produces a data.frame while the other two produce a matrix.
1) aggregate
decade <- 10 * as.numeric(rownames(m)) %/% 10
m.ag <- aggregate(m, data.frame(decade), sum)
which gives this data.frame:
> m.ag
decade 1 2 3 4 5
1 1950 86 186 342 744 1096
2 1960 22 84 189 185 538
2) rowsum
This one is shorter. It produces a matrix result.
rowsum(m, decade)
3) split/sapply
. This one produces a matrix as well. if we had DF
we could replace as.data.frame(m) with DF
shortening it slightly.
t(sapply(split(as.data.frame(m), decade), colSums))
EDIT: added solutions (2) and (3) Added some clarifications.
Solution 2
You first need to define a grouping variable, then you can use your tool of choice (aggregate
, ddply
, whatever).
> aggregate(x, by=list(trunc(as.numeric(rownames(x))/10)), sum)
Group.1 X1 X2 X3 X4 X5
1 195 86 186 342 744 1096
2 196 22 84 189 185 538
Solution 3
There might be a more elegant base R solution, but this works.
# Construct a nicely named vector with which to split your data.frame
breaks <- seq(1950, 2000, by=10)
names <- c("50's", "60's", "70's", "80's", "90's")
decade <- cut(as.numeric(row.names(df)),
seq(1950, 2000, by=10), labels=names, right=FALSE)
# by() splits df apart, operating on each of its pieces.
# do.call(rbind, ...) sutures the results back together.
do.call(rbind, by(df, decade, colSums))
# X1 X2 X3 X4 X5
# 50's 86 186 342 744 1096
# 60's 22 84 189 185 538
Solution 4
by
is an option:
by(x,10*(as.numeric(rownames(x))%/%10),colSums)
INDICES: 1950
1 2 3 4 5
86 186 342 744 1096
------------------------------------------------------------
INDICES: 1960
1 2 3 4 5
22 84 189 185 538
Related videos on Youtube
jitendra
Alum of NC State University. ex-Yahoo!, worked at Mist (a Juniper Company). Currently a Googler.
Updated on June 04, 2022Comments
-
jitendra almost 2 years
In the following matrix dataset:
1 2 3 4 5 1950 7 20 21 15 61 1951 2 10 6 26 57 1952 12 27 43 37 34 1953 14 16 40 47 94 1954 2 17 62 113 101 1955 3 4 43 99 148 1956 2 47 31 85 79 1957 17 5 38 216 228 1958 11 20 15 76 68 1959 16 20 43 30 226 1960 9 28 28 70 201 1961 1 31 124 74 137 1962 12 25 37 41 200
I have been trying to calculate colSums by decade i.e., find sum the each column from 1950-1959 and then from 1960-69 and so on.
I tried tapply, ddply, etc but couldn't figure out something that would actually work.
-
Josh O'Brien about 12 yearsThis is just what I meant by "more elegant". Very nice.
-
Josh O'Brien about 12 years+1 for
rowsum()
. Thanks for expanding your answer to include those additional options.