Find dates that fail to parse in R Lubridate

Solution 1

Credit to LawyeR and Stibu from above comments:

  1. I first sorted the raw CSV column and ran head() and tail() on it to find which 3 dates were causing trouble.
  2. Alternatively, which(is.na(dates$datetime)) is a simple one-liner that also finds the answer.
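As the comments point out, is.na() alone cannot tell a value that was already missing from one that failed to parse. A minimal sketch of one way to separate the two, assuming the raw character column is still available; the sample vector `raw` and the names `parsed` and `failed` are made up for illustration:

```r
library(lubridate)

# Made-up sample data: one unparseable string and one value that is already NA.
raw <- c("2015-06-17 17:10:16", "purpleElephant", NA, "2015-06-18 09:00:00")

# Parse once; the "failed to parse" warning is silenced because the
# result is inspected directly below.
parsed <- suppressWarnings(ymd_hms(raw))

# A parse failure is an NA in the result where the raw value was not NA.
failed <- which(is.na(parsed) & !is.na(raw))

print(failed)       # positions of the broken values
print(raw[failed])  # the offending raw strings

# min() and max() return NA when any element is NA unless na.rm = TRUE.
print(min(parsed, na.rm = TRUE))
print(max(parsed, na.rm = TRUE))
```

Indexing the raw vector with `failed` shows the original text of each broken value, which is usually more helpful than the position alone.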

Solution 2

Lubridate will also give that warning when asked to parse datetimes that do not exist in the target time zone because of daylight saving time (the hour that is skipped when the clocks move forward).

For example:

library(lubridate)
mydate <- strptime('2020-03-08 02:30:00', format = "%Y-%m-%d %H:%M:%S")
ymd_hms(mydate, tz = "America/Denver")

[1] NA
Warning message:
 1 failed to parse. 

My data comes from an unintelligent sensor which does not know about DST, so impossible (but correctly formatted) dates appear in my timeseries.
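One possible way to flag such timestamps is to parse the same strings twice: once as UTC, which has no DST transitions, and once in the target zone. Anything that parses in UTC but not locally is well formed but falls in a DST gap. This is only a sketch; the sample vector `stamps` and the names `as_utc`, `as_local` and `dst_gap` are illustrative, and it assumes the installed time zone database knows America/Denver.

```r
library(lubridate)

# Made-up sample data around the 2020 spring-forward transition in Denver.
stamps <- c("2020-03-08 01:30:00", "2020-03-08 02:30:00", "2020-03-08 03:30:00")

# UTC has no DST, so every well-formed timestamp parses here.
as_utc <- ymd_hms(stamps, tz = "UTC")

# In the local zone, times inside the skipped hour come back as NA.
as_local <- suppressWarnings(ymd_hms(stamps, tz = "America/Denver"))

# Well formed (parses as UTC) but nonexistent in the local zone.
dst_gap <- !is.na(as_utc) & is.na(as_local)

print(stamps[dst_gap])
```

Whether to drop these rows, shift them by an hour, or keep the whole series in UTC depends on how the sensor actually keeps time.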

Solution 3

If it is useful to know the indices where lubridate fails, you can use a for loop with stopifnot() and print the index of each element as it is parsed.

Make some dates and insert an unparseable value at a random position.

library(lubridate)
set.seed(1)
my_dates <- as.character(sample(seq(as.Date('1900/01/01'),
                                    as.Date('2000/01/01'), by = "day"), 1000))
my_dates[sample(seq_along(my_dates), 1)] <- "purpleElephant"

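Now loop over the dates, printing each index before parsing that element inside stopifnot(). The loop stops at the first failure, so the last index printed is the one that failed to parse.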

for (i in seq_along(my_dates)) {
   print(i)                               # index about to be parsed
   stopifnot(!is.na(ymd(my_dates[i])))    # halts at the first unparseable value
}
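The loop halts at the first bad value, so it has to be rerun after every fix. A vectorised alternative, sketched here with made-up data, collects every failing index in one pass; the vector `my_dates_demo` and the planted positions 3 and 7 are hypothetical.

```r
library(lubridate)

# Made-up data: ten consecutive dates with two planted bad values.
my_dates_demo <- as.character(seq(as.Date("1999-12-23"), by = "day", length.out = 10))
my_dates_demo[c(3, 7)] <- c("purpleElephant", "1999-13-45")

# ymd() is vectorised: parse everything once and silence the summary warning.
parsed_demo <- suppressWarnings(ymd(my_dates_demo))

# Every position that was not NA going in but is NA coming out.
failed_demo <- which(is.na(parsed_demo) & !is.na(my_dates_demo))

# Show index and offending value side by side.
print(data.frame(index = failed_demo, value = my_dates_demo[failed_demo]))
```

On a 150k-row column this also avoids printing one line per element.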

Author by

Korben Dallas

Updated on June 11, 2022

Comments

  • Korben Dallas
    Korben Dallas almost 2 years

    As an R novice I'm pulling my hair out trying to debug cryptic R errors. I have a CSV containing 150k lines, from which I load one column into a data frame named 'dates'. I then use lubridate to convert this character column to datetimes in hopes of finding the min/max date.

      dates <- csv[c('datetime')]
      dates$datetime <- ymd_hms(dates$datetime)
    

    Running this code I receive the following warning message:

    Warning message:
    3 failed to parse. 
    

    I accept this as the CSV could have some janky dates in there and next run:

    min(dates$datetime) 
    max(dates$datetime)
    

    Both of these return NA, which I assume is from the few broken dates still stored in the data frame. I've searched around for a quick fix, and have even tried to build a foreach loop to identify the problem dates, but no luck. What would be a simple way to identify the 3 broken dates?

    example date format: 2015-06-17 17:10:16 +0000
    
  • Monica Heddneck
    Monica Heddneck over 7 years
    This is great, but doesn't really answer the general question. What if the problem is that the character 'purpleElephant' is in your data? It's not an NA yet is still unparseable. We still need some way to view the warnings that are given by Lubridate.
  • Jon
    Jon about 7 years
    The question was about identifying the three broken dates and this accomplishes that perfectly.
  • dez93_2000
    dez93_2000 over 4 years
    But it only accomplishes that because the 3 dates happened to be NAs. I have a vector of 93 dates/datetimes which contains ~17 NAs and am getting "2 failed to parse". So this solution doesn't solve the generic problem, just the problem in OP's case.