Find dates that fail to parse in R Lubridate
Solution 1
Credit to LawyeR and Stibu from the comments above:
- I first sorted the raw CSV column and used head() and tail() to find the 3 dates that were causing trouble.
- Alternatively, this simple one-liner also finds the answer:
which(is.na(dates$datetime))
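The which(is.na(...)) one-liner flags every NA after parsing, including values that were already missing in the CSV, and overwriting the column in place loses the original strings. A minimal sketch that parses into a separate vector and reports only the values that were present but failed to parse. The vector raw is hypothetical stand-in data, not the OP's CSV.

```r
library(lubridate)

# Hypothetical stand-in for the CSV column (real data would come from read.csv()).
raw <- c("2015-06-17 17:10:16 +0000",
         NA,                          # genuinely missing in the source
         "purpleElephant",            # present but unparseable
         "2015-06-18 09:00:00 +0000")

# Parse into a new vector so the original strings stay available.
# suppressWarnings() hides "failed to parse" because we locate failures ourselves.
parsed <- suppressWarnings(ymd_hms(raw))

# A value failed to parse if it was present before parsing but NA afterwards.
failed <- which(!is.na(raw) & is.na(parsed))

failed       # indices of the bad values
raw[failed]  # the offending strings themselves
```

Because missing inputs are excluded, this also covers the case where the column already contains NAs before parsing.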
Solution 2
Lubridate emits that warning (it returns NA rather than raising an error) when asked to parse times that do not exist because of daylight saving time.
For example:
library(lubridate)
# 02:30 on 2020-03-08 never occurred in Denver: clocks jumped from 02:00 to 03:00
mydate <- '2020-03-08 02:30:00'
ymd_hms(mydate, tz = "America/Denver")
[1] NA
Warning message:
1 failed to parse.
My data comes from an unintelligent sensor that does not know about DST, so impossible (but correctly formatted) timestamps appear in my time series.
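One way to list the timestamps that fall into a DST gap is to parse the same strings twice: once in UTC, which has no DST, and once in the local zone. Anything that parses in UTC but not locally is a well-formed but nonexistent local time. This is a sketch with a hypothetical stamps vector; the "Etc/GMT+7" step assumes the sensor always records Mountain standard time, which is a guess about the hardware rather than something stated above.

```r
library(lubridate)

# Hypothetical sensor readings around the 2020-03-08 spring-forward change.
stamps <- c("2020-03-08 01:30:00", "2020-03-08 02:30:00", "2020-03-08 03:30:00")

# UTC has no DST, so every well-formed stamp parses there.
in_utc <- suppressWarnings(ymd_hms(stamps, tz = "UTC"))
# In America/Denver the hour from 02:00 to 03:00 was skipped on that date.
in_denver <- suppressWarnings(ymd_hms(stamps, tz = "America/Denver"))

# Well-formed strings that name a nonexistent local time.
gap <- stamps[!is.na(in_utc) & is.na(in_denver)]
gap

# Assumption: the sensor ignores DST and always reports UTC-7 (MST).
# "Etc/GMT+7" is a fixed UTC-7 zone; the sign in Etc/ zone names is inverted.
fixed <- ymd_hms(stamps, tz = "Etc/GMT+7")
with_tz(fixed, "America/Denver")  # the same instants shown as Denver wall-clock time
```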
Solution 3
If you need to know where lubridate fails, you can loop over the dates, print each index, and call stopifnot() so the loop halts at the first value that fails to parse; the last index printed is the culprit.
First, make some dates and insert an unparseable value at a random position.
library(lubridate)
set.seed(1)
my_dates <- as.character(sample(seq(as.Date('1900/01/01'),
                                    as.Date('2000/01/01'), by = "day"), 1000))
my_dates[sample(seq_along(my_dates), 1)] <- "purpleElephant"
Now loop over the dates, printing each index and stopping with stopifnot() at the first value that fails to parse.
for (i in seq_along(my_dates)) {
  print(i)  # the last index printed before the error is the bad one
  stopifnot(!is.na(ymd(my_dates[i])))
}
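The loop above halts at the first failure, so with several bad values you learn about only one per run. A sketch of a variant that keeps going, records every index whose parse raises a warning, and stores the warning text. It relies on base R's withCallingHandlers() and invokeRestart("muffleWarning"); the setup is repeated so the block runs on its own, and the name problems is illustrative.

```r
library(lubridate)

# Same setup as above, repeated so this block is self-contained.
set.seed(1)
my_dates <- as.character(sample(seq(as.Date("1900/01/01"),
                                    as.Date("2000/01/01"), by = "day"), 1000))
my_dates[sample(seq_along(my_dates), 1)] <- "purpleElephant"

# Map of index (as a name) -> warning message, filled in by the handler.
problems <- list()
for (i in seq_along(my_dates)) {
  withCallingHandlers(
    ymd(my_dates[i]),
    warning = function(w) {
      problems[[as.character(i)]] <<- conditionMessage(w)
      invokeRestart("muffleWarning")  # suppress the warning and keep looping
    }
  )
}

bad <- as.integer(names(problems))
bad            # every index that failed
my_dates[bad]  # the offending values
```

Calling ymd() once per element is slow on something like 150k rows; this form is mainly useful when you want to see the individual warning messages.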
Korben Dallas
Updated on June 11, 2022
Comments
-
Korben Dallas (almost 2 years ago):
As an R novice, I'm pulling my hair out trying to debug cryptic R errors. I have a CSV containing 150k lines that I load into a data frame, and I copy its 'datetime' column into a data frame named 'dates'. I then use lubridate to convert this character column to datetimes in the hope of finding the min/max date.
dates <- csv[c('datetime')]
dates$datetime <- ymd_hms(dates$datetime)
Running this code, I receive the following warning:
Warning message:
3 failed to parse.
I accept this, since the CSV could contain some janky dates, and next run:
min(dates$datetime)
max(dates$datetime)
Both of these return NA, which I assume is caused by the few broken dates still stored in the data frame. I've searched around for a quick fix, and have even tried to build a foreach loop to identify the problem dates, but no luck. What would be a simple way to identify the 3 broken dates?
Example date format: 2015-06-17 17:10:16 +0000
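The NA from min() and max() is standard R behavior: both return NA when any element is NA, unless na.rm = TRUE is passed. A short sketch with a hypothetical three-element column in which one value failed to parse:

```r
library(lubridate)

# Hypothetical parsed column: two good timestamps and one failure (NA).
datetime <- suppressWarnings(ymd_hms(c("2015-06-17 17:10:16 +0000",
                                       "not a date",
                                       "2015-06-18 08:00:00 +0000")))

min(datetime)                  # NA, because one element is NA
min(datetime, na.rm = TRUE)    # earliest non-missing timestamp
range(datetime, na.rm = TRUE)  # min and max in a single call
```

This gives the min/max over the values that did parse; it does not say which rows failed.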
-
Monica Heddneck (over 7 years ago): This is great, but it doesn't really answer the general question. What if the problem is that the string 'purpleElephant' is in your data? It's not NA, yet it still can't be parsed. We still need some way to view the warnings that lubridate gives.
-
Jon (about 7 years ago): The question was about identifying the three broken dates, and this accomplishes that perfectly.
-
dez93_2000 (over 4 years ago): But it only accomplishes that because the 3 broken dates happened to be the only NAs. I have a vector of 93 dates/datetimes that contains ~17 NAs and am getting "2 failed to parse". So this solution doesn't solve the generic problem, just the problem in the OP's case.