Converting different date time formats to MM/DD/YYYY format in pandas dataframe
Solution 1
Given your sample data, with the addition of a NaT
, this works:
Code:
df.date.apply(lambda x: pd.to_datetime(x).strftime('%m/%d/%Y')[0])
Test Code:
import pandas as pd
df = pd.DataFrame([
[['']],
[['May 23rd, 2011']],
[['January 1st, 2010']],
[['Apr. 15, 2008']],
[['07-11-2013']],
[['9/01/1995']],
[['04/15/2000']],
[['11/22/68']],
[['12/1997']],
[['08/2002']],
[['2014']],
[['2016']],
], columns=['date'])
df['clean_date'] = df.date.apply(
lambda x: pd.to_datetime(x).strftime('%m/%d/%Y')[0])
print(df)
Results:
date clean_date
0 [] NaT
1 [May 23rd, 2011] 05/23/2011
2 [January 1st, 2010] 01/01/2010
3 [Apr. 15, 2008] 04/15/2008
4 [07-11-2013] 07/11/2013
5 [9/01/1995] 09/01/1995
6 [04/15/2000] 04/15/2000
7 [11/22/68] 11/22/1968
8 [12/1997] 12/01/1997
9 [08/2002] 08/01/2002
10 [2014] 01/01/2014
11 [2016] 01/01/2016
Solution 2
It would be better if you use this it'll give you the date format in MM-DD-YYYY the you can apply strftime:
df['Date_ColumnName'] = pd.to_datetime(df['Date_ColumnName'], dayfirst = False, yearfirst = False)
Chris T.
Updated on June 22, 2022Comments
-
Chris T. almost 2 years
I have a date column in a
pandas.DataFrame
in various date time formats and stored as list object, like the following:date 1 [May 23rd, 2011] 2 [January 1st, 2010] ... 99 [Apr. 15, 2008] 100 [07-11-2013] ... 256 [9/01/1995] 257 [04/15/2000] 258 [11/22/68] ... 360 [12/1997] 361 [08/2002] ... 463 [2014] 464 [2016]
For the sake of convenience, I want to convert them all to
MM/DD/YYYY
format. It doesn't seem possible to use regex replace() function to do this, since one cannot execute this operation over list objects. Also, to use strptime() for each cell will be too time-consuming.What will be the easier way to convert them all to the desired
MM/DD/YYYY
format? I found it very hard to do this on list objects within a dataframe.Note: for cell values of the form
[YYYY]
(e.g.,[2014]
and[2016]
), I will assume they are the first day of that year (i.e., January 1, 1968) and for cell values such as[08/2002]
(or[8/2002]
), I will assume they the first day of the month of that year (i.e., August 1, 2002).-
Jon Rose over 6 yearswhy would strptime() for each cell will be too time-consuming? how large is your list? Doesn't it only need to run once?
-
John Zwinck over 6 yearsHow close do you get with just
pd.to_datetime(df.date)
? -
Chris T. over 6 yearsMy dataframe is only about 500 rows. It's manageable, but there are certainly better ways to convert all these formats in one go.
-
Chris T. over 6 yearsIt says "<class 'list'> is not convertible to datetime" when I applied pd.to_datetime(df.date) because each row in the dataframe is a list object.
-
Zero over 6 yearsDoes
df.date.apply(lambda x: pd.to_datetime(x).strftime('%m/%d/%Y'))
work? -
Chris T. over 6 yearsNo, it does not work. It says "NaTType does not support strftime."
-
-
Chris T. over 6 yearsThanks for helping on this. I suppose this solves my problem, although I could not apply this really straightforward code to my own data (the data is a pandas series contains hundred of list objects like the one shown in the example data). I thought I could have converted that to dataframe and apply you code, but it says 'Series' object has no attribute 'to_DataFrame'
-
Stephen Rauch over 6 yearsSo this code works on a panda.Series of lists, each list only having one item. The expression
df.date.apply
does an apply to the Series which is in the columndate
. If you have only a Series and not a DataFrame you can should be able to domy_series.apply(....)
. -
Chris T. over 6 yearsYes, this code does work on pandas dataframe with list objects. I am trying to think of a way to convert my series object to dataframe, so that I can apply your code. When I used my_series.apply(), I got this following error message "index 0 is out of bounds for axis 0 with size 0"
-
Stephen Rauch over 6 yearsYou do not need to convert. In fact this code converts the column of a DataFrame to a Series. That is what
df.date
does. If takes one column nameddate
and makes it a series. So as I tried to indicate in my previous comment, take this code, and changedf.date
to be the name of your Series. Should just work. -
Chris T. over 6 yearsYes, I did, but I received the same error message "index 0 is out of bounds for axis 0 with size 0."
-
Stephen Rauch over 6 yearsWell it appears that there is something else non-uniform about the data in your Series.
-
Chris T. over 6 yearsI couldn't solve it. The dataset just doesn't make much sense to me. But the original data series does look every bit like the example data shown in this thread.
-
Stephen Rauch over 5 years@KhalilAlHooti, I just tested with that string and it works fine for me. I got
08/24/2014
-
Khalil Al Hooti over 5 years@StephenRauch Thank you it worked finally with me. not sure what was the problem!!
-
vikrant rana almost 5 years@Stephen Rauch. This one works great. how we can apply same function to date columns of same dataframe. for example; process_date & order_date. Do we need to specify or call this lambda function twice?