R colSums By Group

r plyr

10,531

Solution 1

First we set up the matrix used as input.

Lines <- "1  2   3   4   5  
1950   7 20  21  15  61  
1951   2 10   6  26  57  
1952  12 27  43  37  34  
1953  14 16  40  47  94  
1954   2 17  62 113 101  
1955   3  4  43  99 148  
1956   2 47  31  85  79  
1957  17  5  38 216 228  
1958  11 20  15  76  68  
1959  16 20  43  30 226  
1960   9 28  28  70 201  
1961   1 31 124  74 137  
1962  12 25  37  41 200  "
DF <- read.table(text = Lines, check.names = FALSE)
m <- as.matrix(DF)

Below are some alternative solutions. (1) seems the most flexible, since sum can easily be replaced with other functions to get different effects, but (2) is the shortest for this particular problem. Note also a slight difference in output: (1) produces a data.frame, while the other two produce a matrix.

1) aggregate

# Floor each year to the start of its decade, e.g. 1957 -> 1950
decade <- 10 * (as.numeric(rownames(m)) %/% 10)
m.ag <- aggregate(m, data.frame(decade), sum)

which gives this data.frame:

> m.ag
  decade  1   2   3   4    5
1   1950 86 186 342 744 1096
2   1960 22  84 189 185  538
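
To illustrate the flexibility mentioned above, here is a minimal sketch of the same aggregate call with mean in place of sum. The small matrix m2 and the vector decade2 are made-up stand-ins (not the original data) so that the snippet runs on its own.

```r
# Hypothetical 4-row stand-in for m (two years per decade), for illustration only
m2 <- matrix(c(1, 3, 5, 7,
               2, 4, 6, 8),
             ncol = 2,
             dimnames = list(c("1958", "1959", "1960", "1961"), c("1", "2")))
decade2 <- 10 * (as.numeric(rownames(m2)) %/% 10)

# Same call shape as in (1), with mean in place of sum
m2.mean <- aggregate(m2, data.frame(decade = decade2), mean)
m2.mean
```

Any function that reduces a vector to a single value (max, median, and so on) should slot in the same way.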

2) rowsum. This one is shorter and produces a matrix result.

rowsum(m, decade)

3) split/sapply. This one produces a matrix as well. If the data frame DF is available, as.data.frame(m) can be replaced with DF, shortening it slightly.

t(sapply(split(as.data.frame(m), decade), colSums))
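
As a quick sanity check, the sketch below compares the rowsum and split/sapply results on a small made-up matrix. The names m2, decade2, by_rowsum and by_split are hypothetical and only serve this illustration.

```r
# Hypothetical stand-in for m, for illustration only
m2 <- matrix(as.numeric(1:8), ncol = 2,
             dimnames = list(c("1958", "1959", "1960", "1961"), c("1", "2")))
decade2 <- 10 * (as.numeric(rownames(m2)) %/% 10)

by_rowsum <- rowsum(m2, decade2)
by_split  <- t(sapply(split(as.data.frame(m2), decade2), colSums))

# Both results are matrices with one row per decade
is.matrix(by_rowsum) && is.matrix(by_split)
# Values and dimnames agree between the two approaches
all.equal(by_rowsum, by_split)
```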

EDIT: Added solutions (2) and (3) and some clarifications.

Solution 2

You first need to define a grouping variable, then you can use your tool of choice (aggregate, ddply, whatever).

> aggregate(x, by=list(trunc(as.numeric(rownames(x))/10)), sum)
  Group.1 X1  X2  X3  X4   X5
1     195 86 186 342 744 1096
2     196 22  84 189 185  538
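
Since the question mentions tapply, here is one possible sketch of using it with the same kind of grouping variable, applied column by column. The matrix x2 and the names grp and res_tapply are made up for illustration and are not part of the original answer.

```r
# Hypothetical stand-in for x, for illustration only
x2 <- matrix(as.numeric(1:8), ncol = 2,
             dimnames = list(c("1958", "1959", "1960", "1961"), c("X1", "X2")))

# Grouping variable built the same way as in the aggregate() call above
grp <- trunc(as.numeric(rownames(x2)) / 10)

# Run tapply() on each column in turn; the result has one row per group
res_tapply <- apply(x2, 2, function(col) tapply(col, grp, sum))
res_tapply
```

tapply works on one vector at a time, which is why it is wrapped in apply over the columns here.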

Solution 3

There might be a more elegant base R solution, but this works.

# Construct a nicely named factor with which to split your data.frame
breaks <- seq(1950, 2000, by=10)
decade_labels <- c("50's", "60's", "70's", "80's", "90's")
# right=FALSE makes the intervals [1950, 1960), [1960, 1970), ...
decade <- cut(as.numeric(row.names(df)), 
              breaks, labels=decade_labels, right=FALSE)
# by() splits df apart, operating on each of its pieces.
# do.call(rbind, ...) sutures the results back together.
do.call(rbind, by(df, decade, colSums))
#      X1  X2  X3  X4   X5
# 50's 86 186 342 744 1096
# 60's 22  84 189 185  538
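
The right=FALSE argument matters at the decade boundaries. The sketch below shows what cut does with a few boundary years under both settings; the vectors yrs and labs and the result names are made up for illustration.

```r
# Boundary years chosen for illustration
yrs <- c(1959, 1960, 1969, 1970)
breaks <- seq(1950, 2000, by = 10)
labs <- c("50's", "60's", "70's", "80's", "90's")

# right = FALSE: intervals are closed on the left, [1950, 1960), [1960, 1970), ...
left_closed <- as.character(cut(yrs, breaks, labels = labs, right = FALSE))
left_closed
# "50's" "60's" "60's" "70's"

# Default right = TRUE: intervals are (1950, 1960], (1960, 1970], ...
# so 1960 and 1970 land in the previous decade
right_closed <- as.character(cut(yrs, breaks, labels = labs))
right_closed
# "50's" "50's" "60's" "60's"
```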

Solution 4

by is an option:

by(x, 10 * (as.numeric(rownames(x)) %/% 10), colSums)
INDICES: 1950
   1    2    3    4    5 
  86  186  342  744 1096 
------------------------------------------------------------ 
INDICES: 1960
  1   2   3   4   5 
 22  84 189 185 538

jitendra

Alum of NC State University. ex-Yahoo!, worked at Mist (a Juniper Company). Currently a Googler.

Updated on June 04, 2022

Comments

jitendra 12 months

In the following matrix dataset:

       1  2   3   4   5  
1950   7 20  21  15  61  
1951   2 10   6  26  57  
1952  12 27  43  37  34  
1953  14 16  40  47  94  
1954   2 17  62 113 101  
1955   3  4  43  99 148  
1956   2 47  31  85  79  
1957  17  5  38 216 228  
1958  11 20  15  76  68  
1959  16 20  43  30 226  
1960   9 28  28  70 201  
1961   1 31 124  74 137  
1962  12 25  37  41 200

I have been trying to calculate colSums by decade, i.e., find the sum of each column from 1950-1959, then from 1960-1969, and so on.

I tried tapply, ddply, etc., but couldn't figure out something that would actually work.

Josh O'Brien over 11 years

This is just what I meant by "more elegant". Very nice.

Josh O'Brien over 11 years

+1 for rowsum(). Thanks for expanding your answer to include those additional options.