Summing rows by month in R


Solution 1

I create the data set with:

data <- read.table( text="   Date    Hour    Melbourne   Southern    Flagstaff
                       1   2009-05-01  0   0   5   17
                       2   2009-05-01  2   0   2   1
                       3   2009-05-01  1   0   11  0
                       4   2009-05-01  3   0   3   8
                       5   2009-05-01  4   0   1   0
                       6   2009-05-01  5   0   49  79
                       7   2009-05-01  6   0   425 610",
                    header=TRUE,stringsAsFactors=FALSE)

You can do the summation with the function aggregate:

byday <- aggregate(cbind(Melbourne,Southern,Flagstaff)~Date,
             data=data,FUN=sum)
library(lubridate)
bymonth <- aggregate(cbind(Melbourne,Southern,Flagstaff)~month(Date),
             data=data,FUN=sum)

Look at ?aggregate to understand the function better. Starting with the last argument (because that makes explaining easier) the arguments do the following:

  • FUN is the function that should be used for the aggregation. I use sum to sum up the values, but it could also be mean, max or some function you wrote yourself.
  • data is used to indicate the data frame that I want to aggregate.
  • The first argument tells the function what exactly I want to aggregate. On the left side of the ~, I indicate the variables I want to aggregate. If there is more than one, they are combined with cbind. On the right hand side is the variable by which the data should be split. Putting Date means that aggregate will sum up the variables for each distinct value of Date.
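
To illustrate the FUN argument, here is a minimal sketch that keeps the same formula interface but swaps sum for mean and for a hand-written function. The data frame demo and the helper value_range are names made up for this example; the first three rows come from the question, and the 2009-05-02 row is invented so that there are two groups.

```r
# Hypothetical demo data: three rows from the question plus one invented row
demo <- data.frame(
  Date      = c("2009-05-01", "2009-05-01", "2009-05-01", "2009-05-02"),
  Southern  = c(5, 2, 11, 4),
  Flagstaff = c(17, 1, 0, 6),
  stringsAsFactors = FALSE
)

# Same formula as for the sum, but averaging within each Date
bymean <- aggregate(cbind(Southern, Flagstaff) ~ Date, data = demo, FUN = mean)

# A hand-written function works as long as it returns one value per group
value_range <- function(x) max(x) - min(x)
byrange <- aggregate(cbind(Southern, Flagstaff) ~ Date, data = demo,
                     FUN = value_range)
```

Both results have one row per distinct Date and one column per variable named inside cbind.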

For the aggregation by month, I used the function month from the package lubridate. It does what one expects: it returns a numeric value indicating the month for a given date. If the package is not installed yet, install it first with install.packages("lubridate").

If you prefer not to use lubridate, you could do the following instead:

data <- transform(data,month=as.numeric(format(as.Date(Date),"%m")))
bymonth <- aggregate(cbind(Melbourne,Southern,Flagstaff)~month,
                     data=data,FUN=sum)

Here I added a new column to data that contains the month and then aggregated by that column.
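
One thing to watch: a month number alone puts May 2009 and May 2010 into the same group if the data ever spans more than one calendar year. A minimal sketch of one way around this, assuming a combined year-month key is acceptable, follows. The data frame demo and the column name yearmonth are made up for illustration, and the rows are invented to cover several months.

```r
# Hypothetical demo data spanning two years (values invented for illustration)
demo <- data.frame(
  Date      = c("2009-05-01", "2009-05-02", "2009-06-01", "2010-05-01"),
  Southern  = c(5, 2, 11, 3),
  Flagstaff = c(17, 1, 0, 8),
  stringsAsFactors = FALSE
)

# Build a "YYYY-MM" key so the same month in different years stays separate
demo <- transform(demo, yearmonth = format(as.Date(Date), "%Y-%m"))

# Aggregate by the new key instead of the bare month number
byyearmonth <- aggregate(cbind(Southern, Flagstaff) ~ yearmonth,
                         data = demo, FUN = sum)
```

Here May 2009 and May 2010 end up in separate rows, whereas grouping on the month number would merge them.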

Solution 2

Another way to do this is with data.table:

library(data.table)
# Edited as per Arun's comment
out = setDT(data)[, lapply(.SD, sum), by=Date] 

#>out
#         Date Hour Melbourne Southern Flagstaff
#1: 2009-05-01   21         0      496       715

or with dplyr:

library(dplyr)
# summarise_each() and funs() are deprecated in current dplyr;
# across() (dplyr >= 1.0.0) replaces them
out = data %>% group_by(Date) %>% summarise(across(everything(), sum))

#>out
#        Date Hour Melbourne Southern Flagstaff
#1 2009-05-01   21         0      496       715
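
The snippets in this solution sum by Date. For the monthly totals the question also asks about, a data.table sketch could look like the following. It assumes data.table is installed; demo, value_cols and the month key are names made up here, and the June row is invented so that there are two months.

```r
library(data.table)

# Hypothetical demo data: two rows from the question plus one invented June row
demo <- data.table(
  Date      = c("2009-05-01", "2009-05-01", "2009-06-01"),
  Hour      = c(0L, 1L, 0L),
  Southern  = c(5L, 11L, 4L),
  Flagstaff = c(17L, 0L, 6L)
)

# Columns to sum; listing them in .SDcols keeps Hour out of the totals
value_cols <- c("Southern", "Flagstaff")

# Group by a "YYYY-MM" key computed on the fly from the Date strings
bymonth_dt <- demo[, lapply(.SD, sum),
                   by = .(month = format(as.Date(Date), "%Y-%m")),
                   .SDcols = value_cols]
```

Restricting .SD with .SDcols is a design choice for this sketch: without it, Hour would be summed along with the count columns, as in the output shown above.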

Solution 3

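Another base R solution:

# dat is the data frame from the question (called data in Solution 1)
# to sum by date
rowsum(dat[-1], dat$Date)
#           Hour Melbourne Southern Flagstaff
#2009-05-01   21         0      496       715

# or by month and year (as.Date() is needed if Date was read as character)
rowsum(dat[-1], format(as.Date(dat$Date), "%b-%y") )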
#       Hour Melbourne Southern Flagstaff
#May-09   21         0      496       715
Author by

user2787386

University student

Updated on September 21, 2020

Comments

  • user2787386
    user2787386 over 3 years

    So I have a data frame that has a date column, an hour column and a series of other numerical columns. Each row in the data frame is 1 hour of 1 day for an entire year.

    The data frame looks like this:

              Date  Hour  Melbourne  Southern  Flagstaff
    1   2009-05-01     0          0         5         17
    2   2009-05-01     2          0         2          1
    3   2009-05-01     1          0        11          0
    4   2009-05-01     3          0         3          8
    5   2009-05-01     4          0         1          0
    6   2009-05-01     5          0        49         79
    7   2009-05-01     6          0       425        610
    

    The hours are out of order because this is subsetted from another data frame.

    I would like to sum the values in the numerical columns by month and possibly by day. Does anyone know how I can do this?

  • user2787386
    user2787386 almost 9 years
    I'm getting an error on the library(lubridate) line. Do I need to manually import the package?
  • Marta Cz-C
    Marta Cz-C almost 9 years
    Have you installed it first? install.packages("lubridate")
  • Stibu
    Stibu almost 9 years
    Sorry about that... Yes, you have to install the package as described by @Marta Cz-C.
  • Stibu
    Stibu almost 9 years
    I added a solution that does not rely on lubridate.
  • Arun
    Arun almost 9 years
    The equivalent of your dplyr solution in data.table is just: setDT(data)[, lapply(.SD, sum), by=Date]
  • Veerendra Gadekar
    Veerendra Gadekar almost 9 years
    Yes indeed, thanks Arun. I will make the changes now.
  • user2787386
    user2787386 almost 9 years
    Worked perfectly. Thank you kindly.