Reading csv with date and time

36,178

Solution 1

Although this seems to be an old post, but I want to share my experience since I went through a similar very frustrating process trying to load time series csv data into R. The problem above is that excel changes the format of the date and time to the following %m/%d/%Y %H:%M, basically it drops the seconds. If you read a file with this format and you have a second resolution data you get multiple date time combinations that are similar. so you cannot simply use the format that ignores seconds because it gives the following error message . "character string is not in a standard unambiguous format"

The solution is to go back to excel and change the format of the date time column to be %m/%d/%Y %H:%M:%S. You can do that by choosing the closest date time default formats to the desired format (in this case it is %m/%d/%Y %H:%M and then manually add :ss at the end. Save the file as a csv file and then read it using the following command:

Data<-read.zoo("file.csv", tz="", header=TRUE,format='%m/%d/%Y %H:%M:%S')

This worked for me and I read a file that has about 900K rows.

Solution 2

It looks like the error is due to R not recognising what format your date column is in (it can't work out -- date/month/year? month/date/year? etc).

You can tell R what format it is in using the format argument to read.zoo (see ?strptime for the specifiers you can use).

For example, if it was date/month/year hour(24-hour clock):minutes, you could do:

EURUSD <- as.xts(read.zoo(file_name,
                          sep=',', 
                          tz='',   
                          header=T,
                          format='%d/%m/%Y %H:%M:%S')) # see the 'format' argument?

(Note - in your question the snippet of csv data you showed isn't comma-delimited).

Share:
36,178

Related videos on Youtube

user395882
Author by

user395882

Updated on July 09, 2022

Comments

  • user395882
    user395882 almost 2 years

    I am working in R and reading csv which has date and time in its first column. I want to import this csv file in R first and then convert it to zoo obect.

    I am using the code in R

    EURUSD <- as.xts(read.zoo("myfile.csv",sep=",",tz="",header=T))
    

    My csv file contain data in the format:

    Date,Open,Low,High,Close
    2006-01-02 10:01:00,2822.9,2825.45,2822.1,2824.9
    2006-01-02 10:02:00,2825,2825.9,2824,2824.95
    2006-01-02 10:03:00,2824.55,2826.45,2824,2826.45
    2006-01-02 10:04:00,2826.45,2826.45,2824.9,2825.5
    2006-01-02 10:05:00,2825.15,2825.5,2824,2824.85
    2006-01-02 10:06:00,2824.7,2825.5,2823.7,2823.8
    2006-01-02 10:07:00,2823.95,2824.45,2823.55,2824
    2006-01-02 10:08:00,2824,2824.85,2823.5,2824.85
    2006-01-02 10:09:00,2824.25,2825.45,2824,2825.45
    2006-01-02 10:10:00,2825.2,2827,2825,2827
    

    When I run the above command to import the data in to R I get the folowwwing error :

    Error in as.POSIXlt.character(x, tz, ...) : 
      character string is not in a standard unambiguous format
    

    I tried to find all the ways to sort out the issue. I read so many blogs over net but none of the method works for me.

    I hope someone would help me.

    • Brandon Bertelsen
      Brandon Bertelsen about 12 years
      Sounds like there's something in your date column that's buggering it up. Probably a stray blank, "-" , "," , "n.a." or something like that.
    • mathematical.coffee
      mathematical.coffee about 12 years
      Are you sure your sep is comma? your example csv file format looks like it's tab instead?
    • user395882
      user395882 about 12 years
      It is comma seperated. I open my csv file in notepad and it clearly shows comma seperated
    • mathematical.coffee
      mathematical.coffee about 12 years
      That's fine then - it's just in your question there are no commas in the example data you provided, and I wanted to eliminate that as a problem.
    • G. Grothendieck
      G. Grothendieck about 12 years
      Note that your code does work on the revised sample data in the question so there is something that you are not showing that is causing the problem.
    • user395882
      user395882 about 12 years
      This is the dataset direcly from csv file opened in notepad. If I open csv file it looks like excel sheet. Earlier I copy pasted the data from file which looks like excel sheet. Even I dont know, what's going on. If I open in notepad: I get Date,Open,Low,High,Close 2006-01-02 10:01:00,2822.9,2825.45,2822.1,2824.9. If i open csv file directly it looks like excel sheet and the data there looks like: Date Open Low High Close 1/2/2006 10:01 2822.9 2825.45 2822.1 2824.9 1/2/2006 10:02 2825 2825.9 2824 2824.95
  • user395882
    user395882 about 12 years
    I tried using read.zoo("C:/Users/ParamJeet/Desktop/test/eurusd.csv",sep=",‌​",tz="",format="%Y-%‌​m-%d %H:%M" ,header=T) but it gives me error : Error in read.zoo("C:/Users/ParamJeet/Desktop/test/eurusd.csv", sep = ",", : index has 13428 bad entries at data rows: 323559 323560 323561 323562 323563
  • mathematical.coffee
    mathematical.coffee about 12 years
    Then have a look at those rows in you csv - my guess is they have a different format.
  • user395882
    user395882 about 12 years
    if i open my notepad it looks like Date,Open,Low,High,Close 2006-01-02 10:01:00,2822.9,2825.45,2822.1,2824.9 if I open in excel data looks like Date Open Low High Close 1/2/2006 10:01 2822.9 2825.45 2822.1 2824.9 1/2/2006 10:02 2825 2825.9 2824 2824.95 This think this difference causing the problem.
  • mathematical.coffee
    mathematical.coffee about 12 years
    I see that in your updated question the Date columns has seconds too - I've updated my answer to include that in the format. You will need to look at the bad rows that R tells you, ie rows 323559 323560 323561 323562 323563 and see if they look different to the rest. zoo objects cannot have duplicate indices, so you can't have duplicate times. Check whether the ones R says are "bad" are duplicates or in the wrong format with respect to the format you feed into read.zoo.
  • user395882
    user395882 about 12 years
    I tested another small csv file it worked, provided the file shows same date time format in notepad and excel. Now the problem lies in my file which shows dates in excel like Date 1/2/2006 10:01 1/2/2006 10:02 and in notepad like Date, 2006-01-02 10:01:00, 2006-01-02 10:02:00,. This difference in format causing the error. Can you tell why is this difference in date-time format appear in notepad and excel
  • mathematical.coffee
    mathematical.coffee about 12 years
    Excel applies all sorts of dumb formatting to files because it tries to automagically guess what your columns are. Always use something like notepad when you want to look at data that is to be read into R -- R sees what notepad sees.
  • user395882
    user395882 about 12 years
    I got where is the problem exactly. Actually it has NA at 323559 onwards. I have around 4 years of data from 1/1/2006 10.01 to 12/21/2010 15:30. Everyday data starts from 10:01 everyday untill 12/31/2009 but from 1/1/2010, data starts from 9:01 instead of 10:01. So 1/1/2010 onwards everyday data has extra entried between time 9:01 to 10:00. raw <- read.delim2("C:/Users/ParamJeet/Desktop/test/NIFTY1.csv",hea‌​der=TRUE,sep=",") #convert date x <- paste(raw$X.DATE.,raw$X.TIME.) stripday<- strptime(x,format="%Y%m%d %H%M") strptime unable to convert the data between 9:04 to 10:00. ANy suggestion