R group by date, and summarize the values


Solution 1

Convert the timestamps to dates with as.Date(), then sum the values per date with aggregate().

energy$Date <- as.Date(energy$Datetime)
aggregate(energy$value, by=list(energy$Date), sum)

EDIT

Emma made a good point about column names. You can preserve column names in aggregate by using the following instead:

aggregate(energy["value"], by=energy["Date"], sum)
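
For a self-contained check, here is a minimal sketch that runs Solution 1 on the sample rows from the question. It assumes Datetime is stored as character (the question does not say how it was read in) and uses the formula interface of aggregate(), which is a different call from the two above but also keeps the column names.

```r
# Sample rows copied from the question; Datetime as character is an assumption.
energy <- data.frame(
  Datetime = c("2015-04-27 12:29:48", "2015-04-27 12:31:48",
               "2015-04-27 12:34:50", "2015-04-27 12:50:43",
               "2015-04-27 12:53:55", "2015-04-28 00:00:00",
               "2015-04-28 00:00:10"),
  value = c(0.0, 0.0, 1.1, 45.0, 0.0, 1.0, 2.0),
  stringsAsFactors = FALSE
)

# Parse only the leading date part; the time of day is ignored.
energy$Date <- as.Date(energy$Datetime, format = "%Y-%m-%d")

# Formula interface: sum value within each Date; output columns are Date and value.
daily <- aggregate(value ~ Date, data = energy, FUN = sum)
daily
```

The result should have one row per date, matching the totals asked for in the question.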

Solution 2

Using the tidyverse, specifically lubridate and dplyr:

library(lubridate)
library(tidyverse)

set.seed(10)
df <- tibble(Datetime = sample(seq(as.POSIXct("2015-04-27"), as.POSIXct("2015-04-29"), by = "min"), 10),
            value = sample(1:100, 10)) %>%
  arrange(Datetime)

df
#> # A tibble: 10 x 2
#>    Datetime            value
#>    <dttm>              <int>
#>  1 2015-04-27 04:04:00    35
#>  2 2015-04-27 10:48:00    41
#>  3 2015-04-27 13:02:00    25
#>  4 2015-04-27 13:09:00     5
#>  5 2015-04-27 14:43:00    57
#>  6 2015-04-27 20:29:00    12
#>  7 2015-04-27 20:34:00    77
#>  8 2015-04-28 00:22:00    66
#>  9 2015-04-28 05:29:00    37
#> 10 2015-04-28 09:14:00    58

df %>%
  mutate(date_col = date(Datetime)) %>%
  group_by(date_col) %>%
  summarize(value = sum(value))
#> # A tibble: 2 x 2
#>   date_col   value
#>   <date>     <int>
#> 1 2015-04-27   252
#> 2 2015-04-28   161

Created on 2018-08-01 by the reprex package (v0.2.0).
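
Applied to the rows from the question, the same idea might look like the sketch below. Two assumptions: the timestamps are parsed as UTC (the question gives no time zone), and only dplyr is loaded, with base as.Date() standing in for lubridate's date(). Unlike the attempt in the question, columns are referred to by bare name inside the dplyr verbs, not as energy$...

```r
library(dplyr)

# Sample rows from the question; the UTC time zone is an assumption.
energy <- data.frame(
  Datetime = as.POSIXct(c("2015-04-27 12:29:48", "2015-04-27 12:31:48",
                          "2015-04-27 12:34:50", "2015-04-27 12:50:43",
                          "2015-04-27 12:53:55", "2015-04-28 00:00:00",
                          "2015-04-28 00:00:10"), tz = "UTC"),
  value = c(0.0, 0.0, 1.1, 45.0, 0.0, 1.0, 2.0)
)

daily <- energy %>%
  # Create the grouping column on the fly; tz is named explicitly so that
  # rows near midnight are assigned to the intended day.
  group_by(Date = as.Date(Datetime, tz = "UTC")) %>%
  summarise(value = sum(value))

daily
```

as.Date() on a POSIXct value takes a tz argument, so naming the time zone keeps the 00:00:00 rows on 2015-04-28 regardless of the machine's local settings.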

Solution 3

Using data.table:

library(data.table)

# Keep only the date part of each timestamp
Test$Datetime <- as.Date(Test$Datetime)
DT <- data.table(Test)
# Sum value within each date
DT[, sum(value), by = Datetime]

     Datetime   V1
1: 2015-04-27 46.1
2: 2015-04-28  3.0
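
The summed column above comes out named V1. A small variation, sketched here on the question's rows, names the result and does the date conversion inside by so the original column is left untouched. The table name DT and the character Datetime column are assumptions for illustration.

```r
library(data.table)

# Sample rows from the question; Datetime as character is an assumption.
DT <- data.table(
  Datetime = c("2015-04-27 12:29:48", "2015-04-27 12:31:48",
               "2015-04-27 12:34:50", "2015-04-27 12:50:43",
               "2015-04-27 12:53:55", "2015-04-28 00:00:00",
               "2015-04-28 00:00:10"),
  value = c(0.0, 0.0, 1.1, 45.0, 0.0, 1.0, 2.0)
)

# .(value = sum(value)) names the summed column;
# by = .(Date = ...) computes and names the grouping column in one step.
daily <- DT[, .(value = sum(value)),
            by = .(Date = as.Date(Datetime, format = "%Y-%m-%d"))]
daily
```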

Author by

Nakx

Trying to improve my understanding of statistics. Usually doing glmms, survival or time-series analyses, but struggling with the concepts behind them. I am a devoted R user and I am quite good at programming and handling extremely large datasets with R. I am here to improve the way I am doing statistics and I hope to report robust analyses in my papers.

Updated on July 09, 2022

Comments

  • Nakx
    Nakx over 1 year

    I am new to R and am working with a (private) data set.

    I have the following problem: I have a lot of timestamps:

    2015-04-27  12:29:48
    2015-04-27  12:31:48
    2015-04-27  12:34:50
    2015-04-27  12:50:43
    2015-04-27  12:53:55
    2015-04-28  00:00:00
    2015-04-28  00:00:10
    

    Each timestamp has a value:

    Datetime                   value
    2015-04-27  12:29:48       0.0 
    2015-04-27  12:31:48       0.0
    2015-04-27  12:34:50       1.1
    2015-04-27  12:50:43      45.0 
    2015-04-27  12:53:55       0.0
    2015-04-28  00:00:00       1.0
    2015-04-28  00:00:10       2.0
    

    I want to drop the hours, minutes and seconds, and sum the values for each date, like this:

    Datetime      value
    2015-04-27    46.1
    2015-04-28     3.0
    

    The first thing I did was convert the datetime column:

    energy$datetime <- as.POSIXlt(energy$datetime)  
    

    I tried the summarize function:

    df %>% group_by(energy$datetime) %>% summarize (energy$newname(energy$value))
    

    But that isn't working.

    I also read related material on the internet (e.g. http://r.789695.n4.nabble.com/How-to-sum-and-group-data-by-DATE-in-data-frame-td903708.html), but it doesn't make sense to me.

    How can I fix this issue?
