How to convert in both directions between year,month,day and dates in R?


Solution 1

Dates can arrive from files, databases and so on in many forms, and, as you mention, they may be written in different orders or with different separators. Accepting the input as a character string is therefore convenient and useful. However, R doesn't store the actual dates as strings, and you don't need to process them as strings to work with them.
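As a small illustration of that point (a sketch, with example strings and variable names that are mine rather than the asker's), the same day written in three different orders parses to the same value as long as each string is paired with an explicit format:

```r
# Three spellings of the same day, each parsed with an explicit format string,
# so the day/month order is never guessed.
d1 <- as.Date("2004-12-21", format = "%Y-%m-%d")
d2 <- as.Date("21/12/2004", format = "%d/%m/%Y")
d3 <- as.Date("12.21.2004", format = "%m.%d.%Y")

# All three are the same Date value.
same <- (d1 == d2) && (d2 == d3)
print(same)

# Internally a Date is just a number: days since 1970-01-01.
days_since_epoch <- unclass(d1)
print(days_since_epoch)
```

Once parsed, the original spelling no longer matters; only the stored number does.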

Internally R uses the operating system's facilities to do these things in a standard way. You don't need to manipulate strings at all, only perhaps convert a few things from character to their numeric equivalent. For example, it is quite easy to wrap both operations (forwards and backwards) in simple functions you can reuse.

toDate <- function(year, month, day) {
    ISOdate(year, month, day)
}

toNumerics <- function(Date) {
    stopifnot(inherits(Date, c("Date", "POSIXt")))
    day <- as.numeric(strftime(Date, format = "%d"))
    month <- as.numeric(strftime(Date, format = "%m"))
    year <- as.numeric(strftime(Date, format = "%Y"))
    list(year = year, month = month, day = day)
}

I forgo a single call to strftime() followed by splitting on a separator character because you don't like that kind of manipulation.

> toDate(2004, 12, 21)
[1] "2004-12-21 12:00:00 GMT"
> toNumerics(toDate(2004, 12, 21))
$year
[1] 2004

$month
[1] 12

$day
[1] 21

R's datetime code works well; it is well tested and robust, if a bit complex in places because of timezone issues etc. I find the idiom used in toNumerics() more intuitive than holding a date-time as a list and remembering which elements are 0-based. Building on the functionality provided would seem easier than trying to avoid string conversions.

Solution 2

I'm a bit late to the party, but one other way to convert from integers to date is the lubridate::make_date function. See the example below from R for Data Science:

library(lubridate)
library(nycflights13)
library(tidyverse)

a <- flights %>%
  mutate(date = make_date(year, month, day))
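For the opposite direction, lubridate also has accessor functions. A minimal sketch, assuming lubridate is installed (the variable names d and parts are only for illustration):

```r
library(lubridate)

# Forwards: integers to a Date object.
d <- make_date(2004, 12, 21)

# Backwards: the accessors return plain numbers, with no 0-based or
# 1900-based offsets to remember.
parts <- list(year = year(d), month = month(d), day = day(d))
str(parts)
```

The same accessors work on a whole column, so they can be used inside mutate() just like make_date().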

Solution 3

Found one solution for going from date to year,month,day.

Let's say we have a date object, that we'll create here using ISOdate:

somedate <- ISOdate(2004,12,21)

Then, we can get the numerical components of this as follows:

unclass(as.POSIXlt(somedate))

Gives (only the first six components are shown; the list also contains wday, yday and isdst):

$sec
[1] 0

$min
[1] 0

$hour
[1] 12

$mday
[1] 21

$mon
[1] 11

$year
[1] 104

Then one can get what one wants for example:

unclass(as.POSIXlt(somedate))$mon

Note that $year is the actual year minus 1900, $mon is 0-based, and $mday is 1-based (as per the POSIX standard).
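To avoid remembering those offsets at every call site, they can be wrapped once. This is only a sketch; dateParts is a hypothetical helper name, not a base R function:

```r
# Hypothetical helper: applies the POSIXlt offsets in one place.
dateParts <- function(x) {
    lt <- as.POSIXlt(x)
    list(year  = lt$year + 1900,  # $year counts from 1900
         month = lt$mon + 1,      # $mon is 0-based
         day   = lt$mday,         # $mday is already 1-based
         hour  = lt$hour)
}

somedate <- ISOdate(2004, 12, 21)  # defaults to 12:00 GMT
parts <- dateParts(somedate)
str(parts)
```

No unclass() is needed here, because the components of a POSIXlt object can be read directly with $.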

Author: Hugh Perkins

Machine learning engineer. Wrote DeepCL, https://github.com/hughperkins/DeepCL, and a bunch of other GPU/OpenCL things.

Updated on July 09, 2022

Comments

  • Hugh Perkins
    Hugh Perkins almost 2 years

    How to convert between year,month,day and dates in R?

    I know one can do this via strings, but I would prefer to avoid converting to strings, partly because there may be a performance hit, and partly because I worry about regionalization issues, where some of the world uses "year-month-day" and some uses "year-day-month".

    It looks like ISOdate provides the direction year,month,day -> DateTime, although it first converts the numbers to a string, so if there is a way that doesn't go via a string then I would prefer that.

    I couldn't find anything that goes the other way, from datetimes to numerical values? I would prefer not needing to use strsplit or things like that.

    Edit: just to be clear, what I have is, a data frame which looks like:

    year month day hour somevalue
    2004 1     1   1   1515353
    2004 1     1   2   3513535
    ....
    

    I want to be able to freely convert to this format:

    time(hour units) somevalue
    1             1515353
    2             3513535
    ....
    

    ... and also be able to go back again.

    Edit: to clear up some confusion about what 'time' (hour units) means, this is ultimately what I did, using information from "How to find the difference between two dates in hours in R?":

    forwards direction:

    lh$time <- as.numeric( difftime(ISOdate(lh$year,lh$month,lh$day,lh$hour), ISOdate(2004,1,1,0), units="hours"))
    lh$year <- NULL; lh$month <- NULL; lh$day <- NULL; lh$hour <- NULL
    

    backwards direction:

    ... well, I didn't do backwards yet, but I imagine something like:

    • create difftime object out of lh$time (somehow...)
    • add ISOdate(2004,1,1,0) to difftime object
    • use one of the solutions below to get the year, month, day, hour back
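The three steps above can be sketched in base R roughly as follows. This is an illustration, not the asker's code: the data frame holds just the two example rows, and origin, hours_since and back are names chosen here.

```r
# Reference instant, the same one used in the forwards direction.
origin <- ISOdate(2004, 1, 1, 0)

# Hypothetical data: the two rows from the example table.
lh <- data.frame(year = 2004, month = 1, day = 1, hour = c(1, 2),
                 somevalue = c(1515353, 3513535))

# Forwards: hours elapsed since the origin.
hours_since <- as.numeric(difftime(
    ISOdate(lh$year, lh$month, lh$day, lh$hour), origin, units = "hours"))

# Backwards: build a difftime, add it to the origin, then read the
# components from the POSIXlt representation (undoing its offsets).
stamp <- origin + as.difftime(hours_since, units = "hours")
lt <- as.POSIXlt(stamp)
back <- data.frame(year = lt$year + 1900, month = lt$mon + 1,
                   day = lt$mday, hour = lt$hour)
print(back)
```

Both ends use GMT (the ISOdate default), so no daylight-saving shifts enter the arithmetic; the sketch assumes whole-hour offsets.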

    I suppose in the future, I could ask the exact problem I'm trying to solve, but I was trying to factorize my specific problem into generic reusable questions, but maybe that was a mistake?

  • Joshua Ulrich
    Joshua Ulrich over 11 years
    Your note doesn't deserve a ":-O". Those are the POSIX standards. If you don't like it, use format instead: format(ISOdate(2004,12,21),"%m"). ISOdate does not return a string, as ?ISOdate says, it is a wrapper to strptime and returns a POSIXct class object.
  • Hugh Perkins
    Hugh Perkins over 11 years
    We shouldn't have to convert to strings at all. Have a look at Java's Joda-Time for a really easy-to-use class. Modding me down for asking about one of R's Achilles' heels doesn't make me feel any better about the fact that I've spent 90 minutes trying to figure this out so far....
  • Gavin Simpson
    Gavin Simpson over 11 years
    Please don't confuse being downvoted with "modding". You'll know when a Mod does something to your Answers/Questions, as they are identified by a blue diamond. Voting is part and parcel of Stack Overflow and related sites. Get used to it and don't take it too seriously.
  • Hugh Perkins
    Hugh Perkins over 11 years
    That doesn't help the fact that no-one is actually answering my questions on R dates and times, but just saying it's "obvious". If it was obvious, I wouldn't have asked the questions...
  • Joshua Ulrich
    Joshua Ulrich over 11 years
    You don't have to convert to a string. Just convert the output of ISOdate to POSIXlt and use the $mon element: as.POSIXlt(ISOdate(2004,12,21))$mon+1. I have a hard time believing you spent 90 minutes trying to figure this out and didn't get to the part where ISOdate returns a time-based class, not a string. Read ?DateTimeClasses.
  • Gavin Simpson
    Gavin Simpson over 11 years
    This is patently not what you wanted as you explicitly stated that ISOdate() converts to a string internally which wasn't what you wanted. Also, why are you converting this to get the month that you already have!
  • Ben Bolker
    Ben Bolker over 11 years
    I think that might just have been an example. as.POSIXlt(Sys.time())$mon+1 might have been a better example (i.e. not cluttering things up with ISOdate ...)
  • Ben Bolker
    Ben Bolker over 11 years
    PS could we relax the tone here just a little bit on both sides?
  • Hugh Perkins
    Hugh Perkins over 11 years
    @Ben, thanks for pointing out why my example looks strange. I've edited it now to point out that the ISOdate is just a way to get the incoming datetime, as you say.
  • Hugh Perkins
    Hugh Perkins over 11 years
    Can one of the downvoters propose a better way? Do you think that as.numeric(format(... is better?
  • Hugh Perkins
    Hugh Perkins over 11 years
    @Joshua, are you saying 'unclass' converts to a string? I just found that on this page stat.ethz.ch/R-manual/R-patched/library/base/html/… , but I've no idea what it actually does. The good point is that it tells me the names of the available fields, which I didn't manage to find in any other documentation.
  • Gavin Simpson
    Gavin Simpson over 11 years
    @HughPerkins No, unclass() strips the class attribute which affects dispatch on the print() function you are implicitly calling when typing the name of the object at the command line. With the class, the print method for the "POSIXlt" class is called and the internal representation of the time rendered as a string and printed. When unclassed, the default print() method is called which prints the object in its native format which is a list with stated components.
  • Gavin Simpson
    Gavin Simpson over 11 years
    In answer to your comment about as.numeric(), then yes I would find that easier and more intuitive than remembering the POSIX standard etc and which components were 0-based. See my Answer which may not be what you want but it is an alternative. I have also removed my -1 as I see now that I had misunderstood what you were doing and wanted.
  • Joshua Ulrich
    Joshua Ulrich over 11 years
    @HughPerkins: No, I was referring to, "apparently it [ISOdate] first converts the number to a string!" in your question. Ben Bolker and I have both proposed a better way. There's no need to unclass the POSIXlt object to access the mon vector. Again, this is all described in ?DateTimeClasses, which ?ISOdate points you to.
  • Hugh Perkins
    Hugh Perkins over 11 years
    @Joshua: ?DateTimeClasses is quite useful. Thanks!
  • Hugh Perkins
    Hugh Perkins over 11 years
    At all. Ok, I see I got downvoted for two reasons, one of which is that my writing was not really clear. One of which, perhaps the greater, was the tone of my writing. I've edited out most of the tone (and someone helped me edit out what was left). It's midnight, it's the first time I've used R, I started at 11am this morning, and I should probably take a break and get some sleep! R is very cool by the way.
  • Hugh Perkins
    Hugh Perkins over 11 years
    Fair enough. I guess I'm just a little nervous that there could be localization issues sometime? I've been burned so many times where the regional settings affect the month-day order. (Edit: Well, and I guess, strings sounds like not the highest-performance way to handle numbers?)
  • Gavin Simpson
    Gavin Simpson over 11 years
    Note that nowhere do I create a string in the form of "XX-XX-YYYY" where there may be ambiguity in which of the "XX" refers to day or month. I use well tested functions to extract the specific parts of the internal representation of the date/time and render it appropriate. When I ask for the day part I always get the day part etc.
  • Gavin Simpson
    Gavin Simpson over 11 years
    The localisation only comes in in two places. i) when taking a character representation of the date/time, but that can always be handled by passing both the string and the format for that string. ii) the POSIXt representations have timezones and they can be a bit confusing. But this is an issue for how R internally handles DateTimes, and not something related to character representations.
  • Joshua Ulrich
    Joshua Ulrich over 11 years
    @HughPerkins: regional settings don't affect day-month order, even if they affected how the objects were printed, they wouldn't affect how they're actually stored. You do have to be careful about timezones though, but that's true everywhere.
  • Gregor Thomas
    Gregor Thomas over 11 years
    @HughPerkins I had similar issues the first time I used dates in R. (See my question.) Rather than unclassing, you can use the attributes function to see the names of the available fields.
  • hadley
    hadley over 11 years
    Or use lubridate, which provides year, month, hour etc. functions for you.
  • Gavin Simpson
    Gavin Simpson over 11 years
    @hadley +1 yes, I was going to look into / suggest that now that I am home. Been putting babies to bed...
  • Hugh Perkins
    Hugh Perkins over 11 years
    @shujaa. attributes is very useful. Thanks!
  • Hugh Perkins
    Hugh Perkins over 11 years
    I'm selecting this answer, because it does answer the question, and it does avoid having to deal with posix conventions for year,day,month. I'd quite like to see a solution using lubridate. As for my own usage, in the end, I went with ISOdate for forwards conversion, and then simply didn't throw away the information, copied it to a new table, so I could use a table lookup to do the backwards conversion.