In R how can I split a dataframe by date

Solution 1

Say you have this data frame:

    df <- data.frame(date=rep(seq.POSIXt(as.POSIXct("2010-01-01 15:26"), by="day", length.out=3), each=3), var=rnorm(9))
> df
                 date         var
1 2010-01-01 15:26:00 -0.02814237
2 2010-01-01 15:26:00 -0.26924825
3 2010-01-01 15:26:00 -0.57968310
4 2010-01-02 15:26:00  0.88089757
5 2010-01-02 15:26:00 -0.79954092
6 2010-01-02 15:26:00  1.87145778
7 2010-01-03 15:26:00  0.93234835
8 2010-01-03 15:26:00  1.29130038
9 2010-01-03 15:26:00 -1.09841234

To split by day, you just need:

 > split(df, as.Date(df$date))
$`2010-01-01`
                 date         var
1 2010-01-01 15:26:00 -0.02814237
2 2010-01-01 15:26:00 -0.26924825
3 2010-01-01 15:26:00 -0.57968310

$`2010-01-02`
                 date        var
4 2010-01-02 15:26:00  0.8808976
5 2010-01-02 15:26:00 -0.7995409
6 2010-01-02 15:26:00  1.8714578

$`2010-01-03`
                 date        var
7 2010-01-03 15:26:00  0.9323484
8 2010-01-03 15:26:00  1.2913004
9 2010-01-03 15:26:00 -1.0984123
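Since `split()` only uses the date as the grouping key, the time-of-day stays in every piece. A minimal sketch of that, reusing the column names above (`date`, `var`); the explicit UTC timezone and the name `by_day` are my additions for reproducibility, not part of the original answer:

```r
# Example data: three timestamps per day over three days (UTC is an assumption)
df <- data.frame(
  date = rep(seq(as.POSIXct("2010-01-01 15:26", tz = "UTC"),
                 by = "day", length.out = 3), each = 3),
  var  = rnorm(9)
)

# Group rows by calendar day; the date-time column itself is left untouched
by_day <- split(df, as.Date(df$date))

names(by_day)            # one list element per day
sapply(by_day, nrow)     # number of rows in each piece
class(by_day[[1]]$date)  # still a date-time class, so the times are retained
```

Each element of `by_day` is an ordinary data frame, so it can be passed to `lapply()` for per-day processing.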

EDIT:

The same approach also works with chron date-time objects:

library(chron)  # chron() comes from the chron package
x <- chron(dates = "02/27/92", times = "22:29:56")
> x
[1] (02/27/92 22:29:56)
> as.Date(x)
[1] "1992-02-27"

EDIT 2

Making sure that as.Date doesn't change your data is crucial. See here:

# I'm using "DSTday" to make a sequence of _apparent_ (local clock) days
x <- seq.POSIXt(as.POSIXct("2010-03-27 00:31"), by="DSTday", length.out=3)
> x
[1] "2010-03-27 00:31:00 GMT" "2010-03-28 00:31:00 GMT" "2010-03-29 00:31:00 BST"
> as.Date(x)
[1] "2010-03-27" "2010-03-28" "2010-03-28"

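The third item is in British Summer Time, one hour ahead of UTC. as.Date converts a POSIXct value using UTC by default, so 00:31 BST becomes 23:31 on the previous day and the date comes out a day early. To avoid this: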

> as.Date(cut(x, "DSTday"))
[1] "2010-03-27" "2010-03-28" "2010-03-29"
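Another way to avoid the shift, offered as a sketch and not as part of the original answer, is to tell `as.Date` which timezone to use when it extracts the day. This assumes the timestamps are UK local time (`Europe/London`, matching the GMT/BST labels above) and that the system's timezone database includes that zone:

```r
# Three consecutive local days at 00:31, spanning the UK clock change
x <- seq(as.POSIXct("2010-03-27 00:31", tz = "Europe/London"),
         by = "DSTday", length.out = 3)

# Extract the calendar day in the timezone the data were recorded in
local_days <- as.Date(x, tz = "Europe/London")

# Same idea: format in local time first, then parse the day string
local_days2 <- as.Date(format(x, "%Y-%m-%d", tz = "Europe/London"))

print(local_days)
```

Both variants keep the third timestamp on 29 March, because the day is read off the local clock instead of UTC.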

Solution 2

The trick is to create a vector that tells R how to split the data. So in your example we have a data frame:

dd = data.frame(x = runif(100),data= paste0(1:4, "/05/13"))
##This step will depend on your data structure
dd$date = strptime(dd$data, "%d/%m/%y")

Note that I've made the date column have class `POSIXlt` (which also inherits from `POSIXt`). This allows easy manipulation of dates.

Next I'll create the variable I'm going to split on - split_date. Basically, I subtract the minimum date from all other dates and divide by the number of seconds in a day:

split_date = (dd$date -min(dd$date))/86400

Since this will result in fractions, I'll round down to the nearest day:

split_date = floor(split_date)

Now I use the split function in the standard way:

split_by_day = split(dd, split_date)
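
The same steps can be written with the units made explicit. This is a sketch, not the answerer's code: converting to `POSIXct` and fixing the timezone to UTC are my assumptions (so that every day is exactly 24 hours long), and `difftime` is asked for days directly instead of dividing by 86400:

```r
# Same example data as above: 100 rows cycling through four dates
dd <- data.frame(x = runif(100), data = paste0(1:4, "/05/13"))

# Parse day/month/two-digit-year; UTC is assumed so no clock changes occur
dd$date <- as.POSIXct(strptime(dd$data, "%d/%m/%y", tz = "UTC"))

# Whole days elapsed since the earliest timestamp, with explicit units
split_date <- floor(as.numeric(difftime(dd$date, min(dd$date), units = "days")))

# One data frame per day offset ("0", "1", "2", "3")
split_by_day <- split(dd, split_date)
sapply(split_by_day, nrow)
```

If the column can cross a daylight-saving change, grouping on `as.Date()` with an explicit `tz` is probably the safer route, since a local day is then not always 86400 seconds long.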
Author by

Mark

Updated on June 05, 2022

Comments

  • Mark
    Mark almost 2 years

    I have a dataframe where one column is a date time (chron). I would like to split this dataframe into a list of dataframes split by the date part only, so each dataframe will have all the data for that day. I looked at the split function but I'm not sure how to use part of a column value.

  • Mark
    Mark almost 11 years
    Thanks for that, was hoping you could pass a function into split that got the date part as it split, but I guess not.
  • Mark
    Mark almost 11 years
    I have a date time though and I need to retain the time information.
  • Michele
    Michele almost 11 years
    @Mark just use as.Date or maybe you could post a sample to actually run the code against and so you'll see that my method works...
  • Michele
    Michele almost 11 years
    I think strptime(dd$data, "%d/%m/%Y") should be strptime(dd$data, "%d/%m/%y")
  • Michele
    Michele almost 11 years
    @csgillespie of course...there are seconds in my example. why you don't like this answer? it's the best practice.
  • Mark
    Mark almost 11 years
    @Michele Yes that looks better, all in one line also which usually is a good thing in R for speed.
  • csgillespie
    csgillespie almost 11 years
    @Michele Not sure where I said I didn't like this answer. Anyway +1
  • Michele
    Michele almost 11 years
    @Mark I'm glad you like it! obviously it's compatible with chron too, see my edit.
  • Michele
    Michele almost 11 years
    @csgillespie I didn't mean you nor anyone in particular. the answer was downvoted for 10 minutes, it was a 'plural you' :-)
  • Mark
    Mark almost 11 years
    @Michele I downvoted it just because at first it was only valid for dates, however this seems the best answer now. Nothing personal, strictly technical. :p
  • Michele
    Michele almost 11 years
    @Mark of course it wasn't personal, I know! btw, I forgot something about this method, please see my second edit coming in a short while