What are the "standard unambiguous date" formats for string-to-date conversion in R?

Solution 1

This is documented behavior. From ?as.Date:

format: A character string. If not specified, it will try '"%Y-%m-%d"' then '"%Y/%m/%d"' on the first non-'NA' element, and give an error if neither works.

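as.Date("01 Jan 2000") yields an error because the string matches neither of the two formats listed above. as.Date("01/01/2000") yields an incorrect answer because, although the date isn't in either format, the string still happens to parse under "%Y/%m/%d": "01" is read as the year, "01" as the month and "20" as the day, and the trailing characters are ignored.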
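The misreading can be reproduced directly with strptime(), which as.Date() calls internally (see the function body quoted in Solution 3). This is a small sketch for illustration; the variable name lt is arbitrary.

```r
# Parse the ambiguous string with the second default format.
# %Y accepts fewer than four digits, and trailing characters after a
# successful match are ignored, so this does not return NA.
lt <- strptime("01/01/2000", "%Y/%m/%d", tz = "GMT")

lt$year + 1900  # year:  1
lt$mon + 1      # month: 1
lt$mday         # day:   20

# The first example matches neither default format, hence the error.
is.na(strptime("01 Jan 2000", "%Y-%m-%d", tz = "GMT"))  # TRUE
is.na(strptime("01 Jan 2000", "%Y/%m/%d", tz = "GMT"))  # TRUE
```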

I take "standard unambiguous" to mean "ISO-8601" (even though as.Date isn't that strict, as "%Y/%m/%d" isn't ISO-8601).

If you receive this error, the solution is to specify the format your dates (or datetimes) are in, using the conversion specifications described in the Details section of ?strptime.

Make sure that the order of the conversion specifications, as well as any separators, corresponds exactly to the format of your input string. Also, take particular care if your data contain day or month names or abbreviations, as the conversion will depend on your locale (see the examples in ?strptime and read ?LC_TIME; see also the related question "strptime, as.POSIXct and as.Date return unexpected NA").
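As a minimal sketch of the advice above, the two strings from the question can be converted by spelling out the format. Switching LC_TIME to the "C" locale is an assumption made here so that "%b" matches English month abbreviations on any machine; the variable names are illustrative only.

```r
# Use the C locale so that "%b" matches English month abbreviations
# regardless of the machine's settings (an assumption for this example).
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")

d1 <- as.Date("01 Jan 2000", format = "%d %b %Y")
d2 <- as.Date("01/01/2000", format = "%d/%m/%Y")   # day first
d3 <- as.Date("01/02/2000", format = "%m/%d/%Y")   # month first: 2 January

format(d1)  # "2000-01-01"
format(d2)  # "2000-01-01"
format(d3)  # "2000-01-02"

Sys.setlocale("LC_TIME", old_locale)  # restore the previous setting
```

The third call shows why the format has to be stated: the same digits mean a different day depending on whether the day or the month comes first.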

Solution 2

In other words, is there a better solution than needing to specify the format?

Yes, there is now (i.e. as of late 2016), thanks to anydate() from the anytime package.

See the following for some examples from above:

R> anydate(c("01 Jan 2000", "01/01/2000", "2015/10/10"))
[1] "2000-01-01" "2000-01-01" "2015-10-10"

As you said, these are in fact unambiguous and should just work. And via anydate() they do. Without a format.

Solution 3

As a complement to @JoshuaUlrich's answer, here is the definition of the function as.Date.character:

as.Date.character
function (x, format = "", ...) 
{
    charToDate <- function(x) {
        xx <- x[1L]
        if (is.na(xx)) {
            j <- 1L
            while (is.na(xx) && (j <- j + 1L) <= length(x)) xx <- x[j]
            if (is.na(xx)) 
                f <- "%Y-%m-%d"
        }
        if (is.na(xx) || !is.na(strptime(xx, f <- "%Y-%m-%d", 
            tz = "GMT")) || !is.na(strptime(xx, f <- "%Y/%m/%d", 
            tz = "GMT"))) 
            return(strptime(x, f))
        stop("character string is not in a standard unambiguous format")
    }
    res <- if (missing(format)) 
        charToDate(x)
    else strptime(x, format, tz = "GMT")
    as.Date(res)
}
<bytecode: 0x265b0ec>
<environment: namespace:base>

So basically, if both strptime(xx, "%Y-%m-%d") and strptime(xx, "%Y/%m/%d") return NA for the first non-NA element xx, the string is considered ambiguous and the error is raised; otherwise it is considered unambiguous.
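The check can be restated as a small helper. parses_by_default is a hypothetical name, and this is a simplified sketch of the logic in the listing above, not the base R code itself. The last two lines assume R 3.5.0 or later, where, to my knowledge, as.Date() for character input gained a tryFormats argument that lets the candidate formats be changed.

```r
# Hypothetical helper: TRUE if as.Date(x) would accept x without a format,
# i.e. if the first non-NA element parses as "%Y-%m-%d" or "%Y/%m/%d".
parses_by_default <- function(x) {
  xx <- x[!is.na(x)][1L]          # first non-NA element (NA if none)
  if (is.na(xx)) return(TRUE)     # all-NA input is accepted
  !is.na(strptime(xx, "%Y-%m-%d", tz = "GMT")) ||
    !is.na(strptime(xx, "%Y/%m/%d", tz = "GMT"))
}

parses_by_default("2015-10-10")    # TRUE
parses_by_default("2015/10/10")    # TRUE
parses_by_default("01 Jan 2000")   # FALSE
parses_by_default("01/01/2000")    # TRUE, although it is then misread

# Assumes R >= 3.5.0, where as.Date() accepts a tryFormats argument.
d <- as.Date("10.10.2015", tryFormats = c("%Y-%m-%d", "%d.%m.%Y"))
format(d)                          # "2015-10-10"
```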

Solution 4

Converting a string to a date without specifying its format can easily trigger this error.

Here is an example:

sdate <- "2015.10.10"

Convert without specifying the format:

date <- as.Date(sdate) # ==> Generates the error "Error in charToDate(x): character string is not in a standard unambiguous format".

Convert with the format specified:

date <- as.Date(sdate, format = "%Y.%m.%d") # ==> Error-free date conversion.
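If the input format is not known in advance, one option is to catch the error and retry with an explicit format. The following is only a sketch: as_date_with_fallback is a hypothetical helper, not part of base R, and the fallback format is an assumption about the data.

```r
sdate <- "2015.10.10"

# Hypothetical helper: try the default formats first and fall back to a
# caller-supplied format only if as.Date() signals an error.
as_date_with_fallback <- function(x, fallback_format) {
  tryCatch(
    as.Date(x),
    error = function(e) as.Date(x, format = fallback_format)
  )
}

d <- as_date_with_fallback(sdate, "%Y.%m.%d")
format(d)  # "2015-10-10"
```

Note that this only helps when the default attempt fails outright. A string such as "01/01/2000" is accepted by the default formats and silently misread, so stating the format up front remains the safer choice when it is known.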

Solution 5

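This works perfectly for me, no matter how the date was coded previously. Note that mdy_hm() expects month-day-year input followed by hours and minutes; lubridate has sibling parsers such as ymd() and dmy_hms() for other orders.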

library(lubridate)
data$created_date1 <- mdy_hm(data$created_at)
data$created_date1 <- as.Date(data$created_date1)
Matt Dowle
Author by

Matt Dowle

Updated on August 05, 2021

Comments

  • Matt Dowle
    Matt Dowle almost 3 years

    Please consider the following

    $ R --vanilla
    
    > as.Date("01 Jan 2000")
    Error in charToDate(x) :
        character string is not in a standard unambiguous format
    

    But that date clearly is in a standard unambiguous format. Why the error message?

    Worse, an ambiguous date is apparently accepted without warning or error and then read incorrectly!

    > as.Date("01/01/2000")
    [1] "0001-01-20"
    

    I've searched and found 28 other questions in the [R] tag containing this error message. All with solutions and workarounds involving specifying the format, iiuc. This question is different in that I'm asking where are the standard unambiguous formats defined anyway, and can they be changed? Does everyone get these messages or is it just me? Perhaps it is locale related?

    In other words, is there a better solution than needing to specify the format?

    29 questions containing "[R] standard unambiguous format"

    > sessionInfo()
    R version 2.15.2 (2012-10-26)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    
    locale:
    [1] LC_COLLATE=English_United Kingdom.1252
    [2] LC_CTYPE=English_United Kingdom.1252
    [3] LC_MONETARY=English_United Kingdom.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_United Kingdom.1252
    
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base
    
    • plannapus
      plannapus over 11 years
      judging by the function definition of as.Date.character the input is only tested for these two formats: "%Y-%m-%d" and "%Y/%m/%d". If it can match one of them it seems to be deemed "unambiguous".
    • Matt Dowle
      Matt Dowle over 11 years
      @CarlWitthoft "Did I even read" seems to imply the answer is blindingly obvious in ?as.Date. Where does it help with this?
    • Matt Dowle
      Matt Dowle over 11 years
      @plannapus Thanks, that seems to be the answer. Would you mind adding it then I can accept.
    • IRTFM
      IRTFM over 11 years
      Arguably "Jan 24 1949" and "24 Jan 1949" would be unambiguous, but they are certainly Anglo-centric. Yet there are also values for 'month.abb' that are Anglo-centric as well, so a case could be made for those values to be matched in cases where strptime(xx, f <- "%d %B %Y", tz = "GMT") or strptime(xx, f <- "%B %d %Y", tz = "GMT") returned values. (I'm not implying that month.abb is used for the matching to %B since the docs say the matching is locale specific.)
    • Matt Dowle
      Matt Dowle over 11 years
      @CarlWitthoft Some of us trip up every now and again. Thanks for the kick while I'm down. In this question I got quite a few things right: I included sessionInfo(), I searched, told you what I searched and included a link, I kept it as concise as possible. I missed one line in ?as.Date and you give me the TFM treatment. We can't all be as perfect as you all the time.
    • Carl Witthoft
      Carl Witthoft over 11 years
      @MatthewDowle sorry if I came down hard. I think the flamosity started when you appeared to confuse "unambiguous to a reasonably well-educated human" with "unambiguous to a poor helpless piece of code" . :-(
  • Matt Dowle
    Matt Dowle over 11 years
    @BenBolker How about "character string is not either %Y-%m-%d or %Y/%m/%d"?
  • jthetzel
    jthetzel over 11 years
    The behavior is certainly documented in ?as.Date (+1). However, the error message "standard unambiguous format" is ironically ambiguous, to which the 23 previous questions attest. A more direct error message like, "format not recognized, see documentation" might improve user experience. Also, I don't believe "01/01/2000" is ISO-8601 ("2000-01-01" is ISO-8601), which adds to the ambiguity.
  • Joshua Ulrich
    Joshua Ulrich almost 9 years
    @jthetzel: you are right, "01/01/2000" is not ISO-8601. I meant that I personally think of ISO-8601 to be the standard, unambiguous format. And I agree that as.Date not complaining about "01/01/2000" is inconsistent with the error message.
  • Dirk Eddelbuettel
    Dirk Eddelbuettel over 7 years
    Only came here because we had another question about something trying to parse dates with an incomplete format. For complete ones, we now have something. I am quite pleased with this -- it was a nagging question. And needless to say, anytime() is equally useful for POSIXct.
  • lawyeR
    lawyeR over 6 years
    Just used the anytime package and it worked wonderfully, except quite a few NAs. After I ran trimws() on the date vector, everything was perfect.
  • Dirk Eddelbuettel
    Dirk Eddelbuettel over 6 years
    I use it a metric ton too!
  • owlstone
    owlstone almost 4 years
    Looks so simple! I used anydate() on a column with string values of mm-dd (no yy). All <chr> values in the column were successfully converted to <date>. Unfortunately, it set the year to '1400' instead of '2020'. ¯_(ツ)_/¯
  • Dirk Eddelbuettel
    Dirk Eddelbuettel almost 4 years
    Well, not quite. As I answered in a few other questions on this site, mm-dd is not a date (neither is mm-yy or mm-yyyy). You cannot parse what is not there.
  • Ben Bolker
    Ben Bolker almost 3 years
    You might even want to specify NA_character_ (the default NA is of logical type; in practice this hardly matters)