How to get pandas.read_csv() to infer datetime and timedelta types from CSV file columns?
One thing you can do is define your date parser using strptime
, this will handle your date format, this isn't automatic though:
In [59]:
import pandas as pd
import datetime as dt
def parse_dates(x):
return dt.datetime.strptime(x, '%Y%m%d %H:%M:%S')
# dict for word lookup, conversion
word_to_int={'zero':0,
'one':1,
'two':2,
'three':3,
'four':4,
'five':5,
'six':6,
'seven':7,
'eight':8,
'nine':9}
def str_to_time_delta(x):
num = 0
if 'hour' in x.lower():
num = x[0:x.find(' ')].lower()
return dt.timedelta( hours = word_to_int[num])
df = pd.read_csv(r'c:\temp1.txt', parse_dates=[0],date_parser=parse_dates)
df.dtypes
Out[59]:
datetime datetime64[ns]
timedelta object
integer int64
number float64
boolean bool
string object
dtype: object
In [60]:
Then to convert to timedeltas use the dict and function to parse and convert to timedeltas
df['timedelta'] = df['timedelta'].map(str_to_time_delta)
In [61]:
df.dtypes
Out[61]:
datetime datetime64[ns]
timedelta timedelta64[ns]
integer int64
number float64
boolean bool
string object
dtype: object
In [62]:
df
Out[62]:
datetime timedelta integer number boolean string
0 2011-12-30 00:00:00 01:00:00 10 1.6 True Foobar
[1 rows x 6 columns]
To answer your principal question I don't know of a way to automatically do this.
EDIT
Instead of my convoluted mapping function you can do just this:
df['timedelta'] = pd.to_timedelta(df['timedelta'])
Further edit
As noted by @Jeff you can do this instead of using strptime
when reading the csv (in version 0.13.1 and above though):
df = pd.read_csv(r'c:\temp1.txt', parse_dates=[0], infer_datetime_format=True)
Sean Hammond
Updated on July 26, 2022Comments
-
Sean Hammond almost 2 years
pandas.read_csv() infers the types of columns, but I can't get it to infer any datetime or timedelta type (e.g.
datetime64
,timedelta64
) for columns whose values seem like obvious datetimes and time deltas.Here's an example CSV file:
datetime,timedelta,integer,number,boolean,string 20111230 00:00:00,one hour,10,1.6,True,Foobar
And some code to read it with pandas:
dataframe = pandas.read_csv(path)
The types of the columns on that dataframe come out as object, object, int, float, bool, object. They're all as I would expect except the first two columns, which I want to be datetime and timedelta.
Is it possible to get pandas to automatically detect datetime and timedelta columns?
(I don't want to have to tell pandas which columns are datetimes and timedeltas or tell it the formats, I want it to try and detect them automatically like it does for into, float and bool columns.)
-
Jeff about 10 years
to_timedelta
replaces the need to actually parse the timedeltas (avaiable in 0.13.0) -
Jeff about 10 yearsdon't need the
date_parser
function hereparse_dates=[0]
will work, also could turn oninfer_datetime_format
to auto pick up the date, see here: pandas.pydata.org/pandas-docs/stable/whatsnew.html#enhancements (still have to specify the column though) -
EdChum about 10 years@Jeff that didn't work with 0.13.1 so it seems I have to use my custom parser
-
Jeff about 10 yearsyou have to specify
parse_dates[0]
ALWAYS -
EdChum about 10 years@Jeff yes that works but this contracdicts the online docs: pandas.pydata.org/pandas-docs/stable/whatsnew.html#enhancements which states
parse_dates
should be set toTrue
, thanks for the clarification but how to correct the docs? -
Jeff about 10 yearssorry, that just means try to parse_dates on every columns (actually that partially answers the question...forgot about that). That is just an example.
-
Hubbitus about 8 yearsThat magic will parse automatically dates, but NOT timedelta's! Spent many time understand. Bug still open: github.com/pydata/pandas/issues/8185 (see also explanation in github.com/pydata/pandas/issues/4378#issuecomment-24334259)