How to get pandas.read_csv() to infer datetime and timedelta types from CSV file columns?

14,650

One thing you can do is define your date parser using strptime, this will handle your date format, this isn't automatic though:

In [59]:

import pandas as pd
import datetime as dt

def parse_dates(x):
    return dt.datetime.strptime(x, '%Y%m%d %H:%M:%S')
# dict for word lookup, conversion
word_to_int={'zero':0,
     'one':1,
     'two':2,
     'three':3,
     'four':4,
     'five':5,
     'six':6,
     'seven':7,
     'eight':8,
     'nine':9}


def str_to_time_delta(x):
    num = 0
    if 'hour' in x.lower():
        num = x[0:x.find(' ')].lower()
    return dt.timedelta( hours = word_to_int[num])
df = pd.read_csv(r'c:\temp1.txt', parse_dates=[0],date_parser=parse_dates)
df.dtypes
Out[59]:
datetime     datetime64[ns]
timedelta            object
integer               int64
number              float64
boolean                bool
string               object
dtype: object
In [60]:

Then to convert to timedeltas use the dict and function to parse and convert to timedeltas

df['timedelta'] = df['timedelta'].map(str_to_time_delta)

In [61]:

df.dtypes
Out[61]:
datetime      datetime64[ns]
timedelta    timedelta64[ns]
integer                int64
number               float64
boolean                 bool
string                object
dtype: object
In [62]:

df
Out[62]:
             datetime  timedelta  integer  number boolean  string
0 2011-12-30 00:00:00   01:00:00       10     1.6    True  Foobar

[1 rows x 6 columns]

To answer your principal question I don't know of a way to automatically do this.

EDIT

Instead of my convoluted mapping function you can do just this:

df['timedelta'] = pd.to_timedelta(df['timedelta'])

Further edit

As noted by @Jeff you can do this instead of using strptime when reading the csv (in version 0.13.1 and above though):

df = pd.read_csv(r'c:\temp1.txt', parse_dates=[0], infer_datetime_format=True)
Share:
14,650
Sean Hammond
Author by

Sean Hammond

Updated on July 26, 2022

Comments

  • Sean Hammond
    Sean Hammond almost 2 years

    pandas.read_csv() infers the types of columns, but I can't get it to infer any datetime or timedelta type (e.g. datetime64, timedelta64) for columns whose values seem like obvious datetimes and time deltas.

    Here's an example CSV file:

    datetime,timedelta,integer,number,boolean,string
    20111230 00:00:00,one hour,10,1.6,True,Foobar
    

    And some code to read it with pandas:

    dataframe = pandas.read_csv(path)
    

    The types of the columns on that dataframe come out as object, object, int, float, bool, object. They're all as I would expect except the first two columns, which I want to be datetime and timedelta.

    Is it possible to get pandas to automatically detect datetime and timedelta columns?

    (I don't want to have to tell pandas which columns are datetimes and timedeltas or tell it the formats, I want it to try and detect them automatically like it does for into, float and bool columns.)

  • Jeff
    Jeff about 10 years
    to_timedelta replaces the need to actually parse the timedeltas (avaiable in 0.13.0)
  • Jeff
    Jeff about 10 years
    don't need the date_parser function here parse_dates=[0] will work, also could turn on infer_datetime_format to auto pick up the date, see here: pandas.pydata.org/pandas-docs/stable/whatsnew.html#enhanceme‌​nts (still have to specify the column though)
  • EdChum
    EdChum about 10 years
    @Jeff that didn't work with 0.13.1 so it seems I have to use my custom parser
  • Jeff
    Jeff about 10 years
    you have to specify parse_dates[0] ALWAYS
  • EdChum
    EdChum about 10 years
    @Jeff yes that works but this contracdicts the online docs: pandas.pydata.org/pandas-docs/stable/whatsnew.html#enhanceme‌​nts which states parse_dates should be set to True, thanks for the clarification but how to correct the docs?
  • Jeff
    Jeff about 10 years
    sorry, that just means try to parse_dates on every columns (actually that partially answers the question...forgot about that). That is just an example.
  • Hubbitus
    Hubbitus about 8 years
    That magic will parse automatically dates, but NOT timedelta's! Spent many time understand. Bug still open: github.com/pydata/pandas/issues/8185 (see also explanation in github.com/pydata/pandas/issues/4378#issuecomment-24334259)