to_datetime Value Error: at least that [year, month, day] must be specified Pandas
Solution 1
You can stack
/ pd.to_datetime
/ unstack
pd.to_datetime(dte.stack()).unstack()
explanation
pd.to_datetime
works on a string, list, or pd.Series
. dte
is a pd.DataFrame
and is why you are having issues. dte.stack()
produces a a pd.Series
where all rows are stacked on top of each other. However, in this stacked form, because it is a pd.Series
, I can get a vectorized pd.to_datetime
to work on it. the subsequent unstack
simply reverses the initial stack
to get the original form of dte
Solution 2
For me works apply
function to_datetime
:
print (dtd)
1 2 3 4 5 6
0
0 2004-01-02 2004-01-02 2004-01-09 2004-01-16 2004-01-23 2004-01-30
1 2004-01-05 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
2 2004-01-06 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
3 2004-01-07 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
4 2004-01-08 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
dtd = dtd.apply(pd.to_datetime)
print (dtd)
1 2 3 4 5 6
0
0 2004-01-02 2004-01-02 2004-01-09 2004-01-16 2004-01-23 2004-01-30
1 2004-01-05 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
2 2004-01-06 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
3 2004-01-07 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
4 2004-01-08 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
Solution 3
It works for me:
dtd.apply(lambda x: pd.to_datetime(x,errors = 'coerce', format = '%Y-%m-%d'))
This way you can use function attributes like above (errors and format). See more https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html
Related videos on Youtube
Jed
Updated on April 03, 2020Comments
-
Jed about 4 years
I am reading from two different CSVs each having date values in their columns. After read_csv I want to convert the data to datetime with the to_datetime method. The formats of the dates in each CSV are slightly different, and although the differences are noted and specified in the to_datetime format argument, the one converts fine, while the other returns the following value error.
ValueError: to assemble mappings requires at least that [year, month, day] be sp ecified: [day,month,year] is missing
first dte.head()
0 10/14/2016 10/17/2016 10/19/2016 8/9/2016 10/17/2016 7/20/2016 1 7/15/2016 7/18/2016 7/20/2016 6/7/2016 7/18/2016 4/19/2016 2 4/15/2016 4/14/2016 4/18/2016 3/15/2016 4/18/2016 1/14/2016 3 1/15/2016 1/19/2016 1/19/2016 10/19/2015 1/19/2016 10/13/2015 4 10/15/2015 10/14/2015 10/19/2015 7/23/2015 10/14/2015 7/15/2015
this dataframe converts fine using the following code:
dte = pd.to_datetime(dte, infer_datetime_format=True)
or
dte = pd.to_datetime(dte[x], format='%m/%d/%Y')
the second dtd.head()
0 2004-01-02 2004-01-02 2004-01-09 2004-01-16 2004-01-23 2004-01-30 1 2004-01-05 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06 2 2004-01-06 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06 3 2004-01-07 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06 4 2004-01-08 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
this csv doesn't convert using either:
dtd = pd.to_datetime(dtd, infer_datetime_format=True)
or
dtd = pd.to_datetime(dtd, format='%Y-%m-%d')
It returns the value error above. Interestingly, however, using the parse_dates and infer_datetime_format as arguments of the read_csv method work fine. What is going on here?
-
Jed over 7 yearsAwesome. Very simple. Thanks!
-
jezrael over 7 yearsGlad can help you!
-
Jed over 7 yearsSorry, I'm new here and didn't realize I needed to accept it. However, I'm still curious what the reason was for the error. Any ideas?
-
Jed over 7 yearshow does this work? I don't understand the logic of the operation
-
jezrael over 7 yearsI think problem is
to_datetime
needSeries
in old versions of pandas, in newer I get errorAttributeError: 'numpy.int64' object has no attribute 'lower'
, because it need minimal 3 column withyear
,month
andday
- see first example into_datetime
. -
piRSquared over 7 years@Dorian821 also notice that jezrael's answer uses
apply
which takes each column of thedtd
and usespd.to_datetime
. This works because each column is apd.Series
and is perfect for usingpd.to_datetime
. -
jezrael over 7 years@piRSquared - Thank you for better explanation.
-
Jed over 7 yearsahh, ok. Thanks a lot for the explanation.
-
jezrael over 7 years@piRSquared - I think you can add your comment to answer ;)
-
LonelySoul almost 3 yearsGenerally, Errors = 'coerce' should be last choice since the error may be due to "NULL" values