Keep only date part when using pandas.to_datetime
Solution 1
Since version 0.15.0
this can now be easily done using .dt
to access just the date component:
df['just_date'] = df['dates'].dt.date
The above returns a datetime.date
dtype, if you want to have a datetime64
then you can just normalize
the time component to midnight so it sets all the values to 00:00:00
:
df['normalised_date'] = df['dates'].dt.normalize()
This keeps the dtype
as datetime64
, but the display shows just the date
value.
Solution 2
Simple Solution:
df['date_only'] = df['date_time_column'].dt.date
Solution 3
While I upvoted EdChum's answer, which is the most direct answer to the question the OP posed, it does not really solve the performance problem (it still relies on python datetime
objects, and hence any operation on them will be not vectorized - that is, it will be slow).
A better performing alternative is to use df['dates'].dt.floor('d')
. Strictly speaking, it does not "keep only date part", since it just sets the time to 00:00:00
. But it does work as desired by the OP when, for instance:
- printing to screen
- saving to csv
- using the column to
groupby
... and it is much more efficient, since the operation is vectorized.
EDIT: in fact, the answer the OP's would have preferred is probably "recent versions of pandas
do not write the time to csv if it is 00:00:00
for all observations".
Solution 4
Pandas v0.13+: Use to_csv
with date_format
parameter
Avoid, where possible, converting your datetime64[ns]
series to an object
dtype series of datetime.date
objects. The latter, often constructed using pd.Series.dt.date
, is stored as an array of pointers and is inefficient relative to a pure NumPy-based series.
Since your concern is format when writing to CSV, just use the date_format
parameter of to_csv
. For example:
df.to_csv(filename, date_format='%Y-%m-%d')
See Python's strftime
directives for formatting conventions.
Solution 5
Pandas DatetimeIndex
and Series
have a method called normalize
that does exactly what you want.
You can read more about it in this answer.
It can be used as ser.dt.normalize()
Admin
Updated on August 06, 2021Comments
-
Admin almost 3 years
I use
pandas.to_datetime
to parse the dates in my data. Pandas by default represents the dates withdatetime64[ns]
even though the dates are all daily only. I wonder whether there is an elegant/clever way to convert the dates todatetime.date
ordatetime64[D]
so that, when I write the data to CSV, the dates are not appended with00:00:00
. I know I can convert the type manually element-by-element:[dt.to_datetime().date() for dt in df.dates]
But this is really slow since I have many rows and it sort of defeats the purpose of using
pandas.to_datetime
. Is there a way to convert thedtype
of the entire column at once? Or alternatively, doespandas.to_datetime
support a precision specification so that I can get rid of the time part while working with daily data? -
Stewbaca almost 8 yearsIf you are using astype('M8[D]') it transforms missing values into the origin date, 1970-1-1. Probably better to just use pandas.to_datetime() nowadays.
-
IanS almost 7 yearsUnfortunately
to_json
still writes the full00:00:00
. -
Pietro Battiston almost 7 years@IanS do you mean when using
date_format='iso'
?! By default, it just outputs seconds since epoch. -
IanS almost 7 yearsYes, that's what I meant.
-
C8H10N4O2 almost 7 yearsThis is faster than
dt.normalize()
on series longer than a few hundred elements. -
Nic Scozzaro about 6 yearsThe OP is already using the .date() method in their question, so this solution doesn't answer their question, but I did find it useful to see a simple example of using the date() method just as a reference.
-
yeliabsalohcin almost 5 yearsNote to anyone who routinely includes the datetime module as
dt
, this answer snipet will overwrite that module! @Dale-Jung, perhaps could change the line to something like dt_index -
yeliabsalohcin almost 5 yearsI'm also finding an issue whereby the next time I try and add a new row via the
df.loc[date]
method, the index reverts back to a timestamp, meaning subsequent comparisons no longer work -
misantroop about 4 yearsJust a warning, this changes the type to object. So you'd need to astype('datetime64') to keep consistency.
-
RaphX almost 4 yearsThis for some reason fails after you apply any aggregate function with the column.
-
fantabolous over 3 yearsOn a df of >1 million rows, this was far faster than any of the other options I found, with the exception of
normalize
which was only slightly slower. In addition, keeping the date pandas-native means it can be saved to hdf stores (datetime.date columns cannot be as of pandas 1.1.4). -
Jean_N over 3 yearsThis is the most appropriate answer
-
rdmtinez about 3 yearsthis doesn't seem to work whne used as a lambda function for the date parser i.e. date_parser=lambda col: pd.to_datetime(col, format="%y%m%d").date()
-
Rrptm about 3 years@edChum any way to format date in MM-DD-YYYY format?
-
Tushar Tiwari almost 3 years@Katekarin if you try and see df['Date'].values its will still have the time component present in it.
-
Joe Huang almost 3 yearsCan only use .dt accessor with datetimelike values