What are the "standard unambiguous date" formats for string-to-date conversion in R?
Solution 1
This is documented behavior. From ?as.Date:
format: A character string. If not specified, it will try '"%Y-%m-%d"' then '"%Y/%m/%d"' on the first non-'NA' element, and give an error if neither works.
as.Date("01 Jan 2000") yields an error because the format isn't one of the two listed above. as.Date("01/01/2000") yields an incorrect answer because the string isn't really in either format, yet it happens to match "%Y/%m/%d" and is read as year 1, month 1, day 20.
I take "standard unambiguous" to mean "ISO-8601" (even though as.Date isn't that strict, as "%Y/%m/%d" isn't ISO-8601).
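To make the behavior described above concrete, here is a minimal sketch using only base R. The variable names are mine, and the error is caught with tryCatch just so the script keeps running:

```r
# Inputs in either default layout parse without a format argument.
iso_dash  <- as.Date("2000-01-01")
iso_slash <- as.Date("2000/01/01")

# A first element that fits neither layout raises the error from the question.
msg <- tryCatch(
  {
    as.Date("01 Jan 2000")
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# "01/01/2000" fits "%Y/%m/%d" (trailing characters are ignored),
# so it is silently misread instead of rejected.
misread <- as.Date("01/01/2000")
print(as.POSIXlt(misread)$mday)
```

The last line prints the day of the month that was actually parsed, which is a quick way to see that the string was not read as 1 January.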
If you receive this error, the solution is to specify the format your dates (or datetimes) are in, using the conversion specifications described in the Details section of ?strptime.
Make sure that the order of the conversion specifications, as well as any separators, corresponds exactly to the format of your input string. Also, take particular care if your data contain day/month names and/or abbreviations, as the conversion will depend on your locale (see the examples in ?strptime and read ?LC_TIME; see also the question "strptime, as.POSIXct and as.Date return unexpected NA").
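As a rough illustration of the advice above (base R only; the sample strings and variable names are made up for the example), an explicit format has to match the input exactly, and month names depend on LC_TIME:

```r
# Numeric-only layouts do not depend on the locale.
x <- c("01/02/2000", NA, "15/03/2001")
dmy <- as.Date(x, format = "%d/%m/%Y")

# A separator that does not match the format gives NA rather than an error.
bad_sep <- as.Date("01-02-2000", format = "%d/%m/%Y")

# Month names are matched using LC_TIME. Temporarily switching to the "C"
# locale makes English abbreviations such as "Jan" match.
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")
with_name <- as.Date("01 Jan 2000", format = "%d %b %Y")
Sys.setlocale("LC_TIME", old_locale)

print(dmy)
print(bad_sep)
print(with_name)
```

Restoring the previous locale afterwards keeps the rest of the session unaffected.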
Solution 2
In other words, is there a better solution than needing to specify the format?
Yes, there is now (i.e., in late 2016), thanks to anytime::anydate from the anytime package.
See the following for some examples from above:
R> anydate(c("01 Jan 2000", "01/01/2000", "2015/10/10"))
[1] "2000-01-01" "2000-01-01" "2015-10-10"
R>
As you said, these are in fact unambiguous and should just work. And via anydate() they do. Without a format.
Solution 3
As a complement to @JoshuaUlrich's answer, here is the definition of the function as.Date.character:
as.Date.character
function (x, format = "", ...)
{
    charToDate <- function(x) {
        xx <- x[1L]
        if (is.na(xx)) {
            j <- 1L
            while (is.na(xx) && (j <- j + 1L) <= length(x)) xx <- x[j]
            if (is.na(xx))
                f <- "%Y-%m-%d"
        }
        if (is.na(xx) || !is.na(strptime(xx, f <- "%Y-%m-%d",
            tz = "GMT")) || !is.na(strptime(xx, f <- "%Y/%m/%d",
            tz = "GMT")))
            return(strptime(x, f))
        stop("character string is not in a standard unambiguous format")
    }
    res <- if (missing(format))
        charToDate(x)
    else strptime(x, format, tz = "GMT")
    as.Date(res)
}
<bytecode: 0x265b0ec>
<environment: namespace:base>
So basically, if both strptime(x, format="%Y-%m-%d") and strptime(x, format="%Y/%m/%d") return NA for the first non-NA element, the string is considered ambiguous; otherwise it is considered unambiguous.
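The check can be reproduced outside as.Date with a small helper. This is only a sketch: matches_default_format is a hypothetical name of mine, not a base R function, and it mirrors the logic of the definition quoted above rather than calling it:

```r
# Hypothetical helper (not part of base R): TRUE when the first non-NA
# element parses with one of the two default formats.
matches_default_format <- function(x) {
  xx <- x[!is.na(x)][1L]        # first non-NA element, or NA if there is none
  if (is.na(xx)) return(TRUE)   # all-NA input is accepted as is
  !is.na(strptime(xx, "%Y-%m-%d", tz = "GMT")) ||
    !is.na(strptime(xx, "%Y/%m/%d", tz = "GMT"))
}

matches_default_format("2000-01-01")   # TRUE
matches_default_format("01 Jan 2000")  # FALSE
matches_default_format("01/01/2000")   # TRUE, although it is then misread

# Only the first non-NA element is tested, so later elements that do not
# fit the chosen format become NA instead of raising the error.
mixed <- as.Date(c("2000-01-01", "01 Jan 2000"))
print(mixed)
```

In other words, "unambiguous" here is a property of the first usable element only, not of the whole vector.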
Solution 4
Converting a date without specifying its format can easily trigger this error.
Here is an example:
sdate <- "2015.10.10"
Convert without specifying the format:
date <- as.Date(sdate) # ==> This generates the error "Error in charToDate(x): character string is not in a standard unambiguous format".
Convert with the format specified:
date <- as.Date(sdate, format = "%Y.%m.%d") # ==> Error-free date conversion.
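When the input may arrive in one of several known layouts, R 3.5.0 and later also let you pass the candidates through the tryFormats argument of as.Date. A small sketch, assuming such an R version (the candidate list and variable names are my own choice):

```r
sdate <- "2015.10.10"

# Candidate layouts, tried in order on the first non-NA element.
candidate_formats <- c("%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d")
parsed <- as.Date(sdate, tryFormats = candidate_formats)
print(parsed)

# With optional = TRUE, a string matching none of the candidates gives NA
# instead of the "standard unambiguous format" error.
no_match <- as.Date("10th of October", tryFormats = candidate_formats,
                    optional = TRUE)
print(no_match)
```

Note that the format is still chosen from the first non-NA element only, so a vector mixing layouts will produce NA for the elements that do not fit.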
Solution 5
This works perfectly for me, no matter how the date was coded previously.
library(lubridate)
data$created_date1 <- mdy_hm(data$created_at)
data$created_date1 <- as.Date(data$created_date1)
Matt Dowle
Updated on August 05, 2021
Comments
-
Matt Dowle almost 3 years
Please consider the following
$ R --vanilla
> as.Date("01 Jan 2000")
Error in charToDate(x) : character string is not in a standard unambiguous format
But that date clearly is in a standard unambiguous format. Why the error message?
Worse, an ambiguous date is apparently accepted without warning or error and then read incorrectly!
> as.Date("01/01/2000")
[1] "0001-01-20"
I've searched and found 28 other questions in the [R] tag containing this error message. All with solutions and workarounds involving specifying the format, iiuc. This question is different in that I'm asking where are the standard unambiguous formats defined anyway, and can they be changed? Does everyone get these messages or is it just me? Perhaps it is locale related?
In other words, is there a better solution than needing to specify the format?
29 questions containing "[R] standard unambiguous format"
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base
-
plannapus over 11 years
Judging by the function definition of as.Date.character, the input is only tested for these two formats: "%Y-%m-%d" and "%Y/%m/%d". If it can match one of them it seems to be deemed "unambiguous".
-
Matt Dowle over 11 years
@CarlWitthoft "Did I even read" seems to imply the answer is blindingly obvious in ?as.Date. Where does it help with this?
-
Matt Dowle over 11 years
@plannapus Thanks, that seems to be the answer. Would you mind adding it so that I can accept?
-
IRTFM over 11 years
Arguably "Jan 24 1949" and "24 Jan 1949" would be unambiguous, but they are certainly Anglo-centric. Yet there are also values for 'month.abb' that are Anglo-centric as well, so a case could be made for those values to be matched in cases where strptime(xx, f <- "%d %B %Y", tz = "GMT") or strptime(xx, f <- "%B %d %Y", tz = "GMT") returned values. (I'm not implying that month.abb is used for the matching to %B since the docs say the matching is locale specific.)
-
Matt Dowle over 11 years
@CarlWitthoft Some of us trip up every now and again. Thanks for the kick while I'm down. In this question I got quite a few things right: I included sessionInfo(), I searched, told you what I searched and included a link, I kept it as concise as possible. I missed one line in ?as.Date and you give me the TFM treatment. We can't all be as perfect as you all the time.
-
Carl Witthoft over 11 years
@MatthewDowle sorry if I came down hard. I think the flamosity started when you appeared to confuse "unambiguous to a reasonably well-educated human" with "unambiguous to a poor helpless piece of code". :-(
-
Matt Dowle over 11 years
@BenBolker How about "character string is not either %Y-%m-%d or %Y/%m/%d"?
-
jthetzel over 11 years
The behavior is certainly documented in ?as.Date (+1). However, the error message "standard unambiguous format" is ironically ambiguous, to which the 23 previous questions attest. A more direct error message like "format not recognized, see documentation" might improve user experience. Also, I don't believe "01/01/2000" is ISO-8601 ("2000-01-01" is ISO-8601), which adds to the ambiguity.
-
Joshua Ulrich almost 9 years
@jthetzel: you are right, "01/01/2000" is not ISO-8601. I meant that I personally think of ISO-8601 as the standard, unambiguous format. And I agree that as.Date not complaining about "01/01/2000" is inconsistent with the error message.
-
Dirk Eddelbuettel over 7 years
Only came here because we had another question about something trying to parse dates with an incomplete format. For complete ones, we now have something. I am quite pleased with this -- it was a nagging question. And needless to say, anytime() is equally useful for POSIXct.
-
lawyeR over 6 years
Just used the anytime package and it worked wonderfully, except for quite a few NAs. After I ran trimws() on the date vector, everything was perfect.
-
Dirk Eddelbuettel over 6 years
I use it a metric ton too!
-
owlstone almost 4 years
Looks so simple! I used anydate() on a column with string values of mm-dd (no yy). All <chr> values in the column were successfully converted to <date>. Unfortunately, it set the year to '1400' instead of '2020'. ¯_(ツ)_/¯
-
Dirk Eddelbuettel almost 4 years
Well, not quite. As I answered in a few other questions on this site, mm-dd is not a date (neither is mm-yy or mm-yyyy). You cannot parse what is not there.
-
Ben Bolker almost 3 years
You might even want to specify NA_character_ (the default NA is of logical type; in practice this hardly matters).