Specify custom Date format for colClasses argument in read.table/read.csv

56,047

Solution 1

You can write your own function that accepts a string and converts it to a Date using the format you want, then use the setAs to set it as an as method. Then you can use your function as part of the colClasses.

Try:

setAs("character","myDate", function(from) as.Date(from, format="%d/%m/%Y") )

tmp <- c("1, 15/08/2008", "2, 23/05/2010")
con <- textConnection(tmp)

tmp2 <- read.csv(con, colClasses=c('numeric','myDate'), header=FALSE)
str(tmp2)

Then modify if needed to work for your data.

Edit ---

You might want to run setClass('myDate') first to avoid the warning (you can ignore the warning, but it can get annoying if you do this a lot and this is a simple call that gets rid of it).

Solution 2

If there is only 1 date format you want to change, you could use the Defaults package to change the default format within as.Date.character

library(Defaults)
setDefaults('as.Date.character', format = '%d/%M/%Y')
dataImport <- read.csv("data.csv", colClasses = c("factor","factor","Date"))
str(dataImport)
## 'data.frame':    10 obs. of  3 variables:
##  $ func_loc     : Factor w/ 5 levels "3076WAG0003",..: 1 2 3 3 3 3 3 4 4 5
##  $ order_type   : Factor w/ 3 levels "PM01","PM02",..: 3 3 1 1 1 1 2 2 3 1
##  $ actual_finish: Date, format: "2008-10-15" "2009-10-19" NA "2008-10-11" ...

I think @Greg Snow's answer is far better, as it does not change the default behaviour of an often used function.

Solution 3

In case you need time also:

setClass('yyyymmdd-hhmmss')
setAs("character","yyyymmdd-hhmmss", function(from) as.POSIXct(from, format="%Y%m%d-%H%M%S"))
d <- read.table(colClasses="yyyymmdd-hhmmss", text="20150711-130153")
str(d)
## 'data.frame':    1 obs. of  1 variable:
## $ V1: POSIXct, format: "2015-07-11 13:01:53"

Solution 4

A long time ago, in the meantime the problem has been solved by Hadley Wickham. So nowadays the solution is reduced to a oneliner:

library(readr)
data <- read_csv("data.csv", 
                  col_types = cols(actual_finish = col_datetime(format = "%d/%m/%Y")))

Maybe we want even to get rid of unnecessary stuff:

data <- as.data.frame(data)
Share:
56,047

Related videos on Youtube

Tommy O'Dell
Author by

Tommy O'Dell

Updated on May 28, 2020

Comments

  • Tommy O'Dell
    Tommy O'Dell almost 4 years

    Question:

    Is there a way to specify the Date format when using the colClasses argument in read.table/read.csv?

    (I realise I can convert after importing, but with many date columns like this, it would be easier to do it in the import step)


    Example:

    I have a .csv with date columns in the format %d/%m/%Y.

    dataImport <- read.csv("data.csv", colClasses = c("factor","factor","Date"))
    

    This gets the conversion wrong. For example, 15/07/2008 becomes 0015-07-20.


    Reproducible code:

    data <- 
    structure(list(func_loc = structure(c(1L, 2L, 3L, 3L, 3L, 3L, 
    3L, 4L, 4L, 5L), .Label = c("3076WAG0003", "3076WAG0004", "3076WAG0007", 
    "3076WAG0009", "3076WAG0010"), class = "factor"), order_type = structure(c(3L, 
    3L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 1L), .Label = c("PM01", "PM02", 
    "PM03"), class = "factor"), actual_finish = structure(c(4L, 6L, 
    1L, 2L, 3L, 7L, 1L, 8L, 1L, 5L), .Label = c("", "11/03/2008", 
    "14/08/2008", "15/07/2008", "17/03/2008", "19/01/2009", "22/09/2008", 
    "6/09/2007"), class = "factor")), .Names = c("func_loc", "order_type", 
    "actual_finish"), row.names = c(NA, 10L), class = "data.frame")
    
    
    write.csv(data,"data.csv", row.names = F)                                                        
    
    dataImport <- read.csv("data.csv")
    str(dataImport)
    dataImport
    
    dataImport <- read.csv("data.csv", colClasses = c("factor","factor","Date"))
    str(dataImport)
    dataImport
    

    And here's what the output looks like:

    code output

    • joran
      joran over 11 years
      A hackish way to do this would be to create your own version of read.table and add a format argument that is passed on to as.Date. I wouldn't be surprised if there's a better way that I'm not thinking of, though.
  • Tommy O'Dell
    Tommy O'Dell over 11 years
    Thanks Greg - I dind't know about the as or setAs functions. Works great.
  • user295691
    user295691 over 11 years
    Wow -- setAs is a lifesaver! How have I never seen this function before?
  • Danny D'Amours
    Danny D'Amours over 11 years
    Note that you might get a 'no definition for class "myDate"' warning as detailed in this question.
  • Josh O'Brien
    Josh O'Brien over 10 years
    What is setMethod('myDate') supposed to do? Running it just gives me an error...
  • Greg Snow
    Greg Snow over 10 years
    @JoshO'Brien, sorry that should have been setClass (fixed now). What it does is prevent setAs from issuing a warning about 'myDate' not existing as a class. The warning is harmless and everything still works, but setting the class means that you don't even see the warning.
  • MySchizoBuddy
    MySchizoBuddy about 10 years
    Is it better to just save it as character and then do the conversion of the column to date, or do it as mentioned above?
  • Greg Snow
    Greg Snow about 10 years
    @MySchizoBuddy, If you only have one date column and you are doing this one time then it probably does not matter which way you do it. But if you have several columns in your dataset that are dates, then I think this approach would probably be simpler than changing each of the columns after reading.
  • jks612
    jks612 over 5 years
    For those like me who came this thread hoping to implement it within data.table::fread, fread doesn't implement date types in it's conversions so this will still not work at time of this writing in fread.