Why use infer_datetime_format when importing a CSV file?
The docs for pandas.read_csv suggest why:
infer_datetime_format : boolean, default False
If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.
Essentially, pandas deduces the format of your datetime strings from the first element(s) and then assumes all other elements in the series use the same format. This means pandas does not need to check multiple formats when converting each string to datetime.
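The idea can be sketched in plain Python. This is only an illustration of the principle, not pandas' actual implementation: the candidate format list and the helper names guess_format and parse_all are hypothetical.

```python
from datetime import datetime

# Hypothetical list of candidate formats; pandas' real inference is more general.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%Y-%m-%d %H:%M:%S"]


def guess_format(sample):
    """Return the first candidate format that parses `sample`, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(sample, fmt)
            return fmt
        except ValueError:
            continue
    return None


def parse_all(values):
    """Infer the format once from the first value, then reuse it for every value."""
    fmt = guess_format(values[0])
    if fmt is None:
        raise ValueError("could not infer a format from %r" % values[0])
    return [datetime.strptime(v, fmt) for v in values]


dates = parse_all(["2018-01-05", "2018-12-20", "2018-03-30"])
print(dates[1])  # 2018-12-20 00:00:00
```

The loop over candidate formats runs once, on the first value only; every later value goes straight to a single strptime call.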
Remember, CSV files can only hold textual data, so a conversion to datetime (essentially a numeric type) will always be required.
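A small sketch using only the standard library shows this: every field read from a CSV arrives as a string and has to be converted explicitly. The in-memory file and its column names are made up for the example.

```python
import csv
import io
from datetime import datetime

# In-memory stand-in for a CSV file; the column names are made up for this example.
raw = io.StringIO("date,value\n2018-01-05,1\n2018-12-20,2\n")

rows = list(csv.DictReader(raw))
print(type(rows[0]["date"]))  # <class 'str'>

# Explicit conversion from text to datetime objects.
parsed = [datetime.strptime(row["date"], "%Y-%m-%d") for row in rows]
print(parsed[0].year)  # 2018
```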
Here's a demonstration:
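from dateutil import parser
from datetime import datetime
# 20,000 date strings, all in the same format
L = ['2018-01-05', '2018-12-20', '2018-03-30', '2018-04-15'] * 5000
# Run in IPython: the format is guessed separately for every string
%timeit [parser.parse(i) for i in L]  # 1.57 s
# Run in IPython: the format is known in advance
%timeit [datetime.strptime(i, '%Y-%m-%d') for i in L]  # 338 ms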
rul30
Updated on July 14, 2022
Comments
-
rul30 almost 2 years
What is the difference in processing between:
df = pd.read_csv(filename, parse_dates=[0], infer_datetime_format=True)
and
df = pd.read_csv(filename, parse_dates=[0])
Why should the first import be faster, given that parse_dates already specifies where to look for dates?
-
rul30 almost 6 years
Does that mean that, without this option, pandas would check for every row whether it holds a date and, if so, which datetime format that row uses?
-
jpp almost 6 years
Not sure what you mean. But without inferring, it will try multiple formats for each row until one works. It's not efficient.
-
rul30 almost 6 years
Thanks for helping out here, I obviously need your help ;-) So does it mean that if a different date format could show up, this option should not be used? Maybe my question should have been: what is the downside of inferring?
-
jpp almost 6 years
Correct. If you have more than one format, don't infer.
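The downside mentioned in these comments can be illustrated with the standard library alone. This is a sketch of the principle, assuming the format is taken from the first element; how pandas itself reacts to such a mismatch depends on the version and is not shown here.

```python
from datetime import datetime

mixed = ["2018-01-05", "2018-12-20", "30/03/2018"]  # last entry uses another format

fmt = "%Y-%m-%d"  # the format one would infer from the first element

parsed, failed = [], []
for text in mixed:
    try:
        parsed.append(datetime.strptime(text, fmt))
    except ValueError:
        # The single assumed format does not match this string.
        failed.append(text)

print(len(parsed), failed)  # 2 ['30/03/2018']
```

With a single assumed format, any string in a different format cannot be parsed on the fast path, which is why inferring only makes sense for columns with one consistent format.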