Summing rows by month in R
Solution 1
I create the data set by
data <- read.table( text=" Date Hour Melbourne Southern Flagstaff
1 2009-05-01 0 0 5 17
2 2009-05-01 2 0 2 1
3 2009-05-01 1 0 11 0
4 2009-05-01 3 0 3 8
5 2009-05-01 4 0 1 0
6 2009-05-01 5 0 49 79
7 2009-05-01 6 0 425 610",
header=TRUE,stringsAsFactors=FALSE)
You can do the summation with the function aggregate:
byday <- aggregate(cbind(Melbourne,Southern,Flagstaff)~Date,
data=data,FUN=sum)
library(lubridate)
bymonth <- aggregate(cbind(Melbourne,Southern,Flagstaff)~month(Date),
data=data,FUN=sum)
Look at ?aggregate to understand the function better. Starting with the last argument (because that makes explaining easier), the arguments do the following:
- FUN is the function that should be used for the aggregation. I use sum to sum up the values, but it could also be mean, max or some function you wrote yourself.
- data is used to indicate the data frame that I want to aggregate.
- The first argument tells the function what exactly I want to aggregate. On the left side of the ~, I indicate the variables I want to aggregate. If there is more than one, they are combined with cbind. On the right-hand side is the variable by which the data should be split. Putting Date means that aggregate will sum up the variables for each distinct value of Date.
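To illustrate how the three arguments fit together, here is a small sketch that swaps other functions in for FUN. The data frame toy, its values, and the helper value_range are made up for illustration and are not part of the original data.

```r
# Toy data (hypothetical values) covering two dates
toy <- data.frame(
  Date = c("2009-05-01", "2009-05-01", "2009-05-02"),
  Melbourne = c(1, 3, 10),
  Southern = c(2, 4, 20),
  stringsAsFactors = FALSE
)

# Built-in function: mean of each column per date
daily_mean <- aggregate(cbind(Melbourne, Southern) ~ Date,
                        data = toy, FUN = mean)

# Self-written function: spread between largest and smallest value per date
value_range <- function(x) max(x) - min(x)
daily_range <- aggregate(cbind(Melbourne, Southern) ~ Date,
                         data = toy, FUN = value_range)

print(daily_mean)
print(daily_range)
```

Only FUN changes between the two calls; the formula and the data argument stay the same.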
For the aggregation by month, I used the function month from the package lubridate. It does what one expects: it returns a numeric value indicating the month for a given date. You may first need to install the package with install.packages("lubridate").
If you prefer not to use lubridate, you could do the following instead:
data <- transform(data,month=as.numeric(format(as.Date(Date),"%m")))
bymonth <- aggregate(cbind(Melbourne,Southern,Flagstaff)~month,
data=data,FUN=sum)
Here I added a new column to data that contains the month and then aggregated by that column.
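Note that a plain month number puts the same month from different years into one group. The data in the question cover a single year, so this does not matter there. For data spanning several years, a year-month key is one possible alternative. A minimal sketch, assuming the Date column is in YYYY-MM-DD form; the data frame multi, its values, and the column name yearmonth are illustrative:

```r
# Hypothetical data with the same month in two different years
multi <- data.frame(
  Date = c("2009-05-01", "2009-05-02", "2010-05-01"),
  Melbourne = c(1, 2, 4),
  Southern = c(10, 20, 40),
  stringsAsFactors = FALSE
)

# Build a "YYYY-MM" key so that May 2009 and May 2010 stay separate
multi <- transform(multi, yearmonth = format(as.Date(Date), "%Y-%m"))

by_yearmonth <- aggregate(cbind(Melbourne, Southern) ~ yearmonth,
                          data = multi, FUN = sum)
print(by_yearmonth)
```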
Solution 2
This could be another way to do this, using data.table:
library(data.table)
# Edited as per Arun's comment
out = setDT(data)[, lapply(.SD, sum), by=Date]
#>out
# Date Hour Melbourne Southern Flagstaff
#1: 2009-05-01 21 0 496 715
or by using dplyr:
library(dplyr)
out = data %>% group_by(Date) %>% summarise_each(funs(sum))
#>out
#Source: local data frame [1 x 5]
# Date Hour Melbourne Southern Flagstaff
#1 2009-05-01 21 0 496 715
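Both snippets above group by Date and also add up the Hour column. To sum by month with dplyr instead, something like the following could work. It uses across(), which is available in dplyr 1.0.0 and later, where summarise_each() and funs() are deprecated. This is a sketch rather than the answerer's code; the data frame df, its values, and the column name month are illustrative assumptions.

```r
library(dplyr)

# Hypothetical data spanning two months
df <- data.frame(
  Date = c("2009-05-01", "2009-05-01", "2009-06-01"),
  Hour = c(0, 1, 0),
  Melbourne = c(0, 0, 3),
  Southern = c(5, 11, 7),
  Flagstaff = c(17, 0, 2),
  stringsAsFactors = FALSE
)

out_month <- df %>%
  # derive a "YYYY-MM" key from the character dates
  mutate(month = format(as.Date(Date), "%Y-%m")) %>%
  group_by(month) %>%
  # sum only the count columns, leaving Hour out
  summarise(across(c(Melbourne, Southern, Flagstaff), sum))

print(out_month)
```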
Solution 3
Another base R solution
# dat is assumed to be the data frame from the question (called data in Solution 1)
# to sum by date
rowsum(dat[-1], dat$Date)
# Hour Melbourne Southern Flagstaff
#2009-05-01 21 0 496 715
# or by month and year
# as.Date() is needed if Date is stored as character
rowsum(dat[-1], format(as.Date(dat$Date), "%b-%y"))
# Hour Melbourne Southern Flagstaff
#May-09 21 0 496 715
Comments
-
user2787386 over 3 years
So I have a data frame that has a date column, an hour column and a series of other numerical columns. Each row in the data frame is 1 hour of 1 day for an entire year.
The data frame looks like this:
        Date Hour Melbourne Southern Flagstaff
1 2009-05-01    0         0        5        17
2 2009-05-01    2         0        2         1
3 2009-05-01    1         0       11         0
4 2009-05-01    3         0        3         8
5 2009-05-01    4         0        1         0
6 2009-05-01    5         0       49        79
7 2009-05-01    6         0      425       610
The hours are out of order because this is subsetted from another data frame.
I would like to sum the values in the numerical columns by month and possibly by day. Does anyone know how I can do this?
-
user2787386 almost 9 years
I'm getting an error on the library(lubridate) line. Do I need to manually import the package?
-
Marta Cz-C almost 9 years
Have you installed it first?
install.packages("lubridate")
-
Stibu almost 9 years
Sorry about that... Yes, you have to install the package as described by @Marta Cz-C.
-
Stibu almost 9 years
I added a solution that does not rely on lubridate.
-
Arun almost 9 years
The equivalent of your dplyr solution in data.table is just:
setDT(data)[, lapply(.SD, sum), by=Date]
-
Veerendra Gadekar almost 9 years
Yes indeed! Thanks, Arun. I will make the changes now.
-
user2787386 almost 9 years
Worked perfectly. Thank you kindly.