Keep only date part when using pandas.to_datetime

python pandas csv datetime series

590,259

Solution 1

Since version 0.15.0 this can now be easily done using .dt to access just the date component:

df['just_date'] = df['dates'].dt.date

The above returns a datetime.date dtype, if you want to have a datetime64 then you can just normalize the time component to midnight so it sets all the values to 00:00:00:

df['normalised_date'] = df['dates'].dt.normalize()

This keeps the dtype as datetime64, but the display shows just the date value.

Solution 2

Simple Solution:

df['date_only'] = df['date_time_column'].dt.date

Solution 3

While I upvoted EdChum's answer, which is the most direct answer to the question the OP posed, it does not really solve the performance problem (it still relies on python datetime objects, and hence any operation on them will be not vectorized - that is, it will be slow).

A better performing alternative is to use df['dates'].dt.floor('d'). Strictly speaking, it does not "keep only date part", since it just sets the time to 00:00:00. But it does work as desired by the OP when, for instance:

printing to screen
saving to csv
using the column to groupby

... and it is much more efficient, since the operation is vectorized.

EDIT: in fact, the answer the OP's would have preferred is probably "recent versions of pandas do not write the time to csv if it is 00:00:00 for all observations".

Solution 4

Pandas v0.13+: Use `to_csv` with `date_format` parameter

Avoid, where possible, converting your datetime64[ns] series to an object dtype series of datetime.date objects. The latter, often constructed using pd.Series.dt.date, is stored as an array of pointers and is inefficient relative to a pure NumPy-based series.

Since your concern is format when writing to CSV, just use the date_format parameter of to_csv. For example:

df.to_csv(filename, date_format='%Y-%m-%d')

See Python's strftime directives for formatting conventions.

Solution 5

Pandas DatetimeIndex and Series have a method called normalize that does exactly what you want.

You can read more about it in this answer.

It can be used as ser.dt.normalize()

View more solutions

590,259

Author by

Admin

Updated on August 06, 2021

Comments

Admin almost 3 years
I use pandas.to_datetime to parse the dates in my data. Pandas by default represents the dates with datetime64[ns] even though the dates are all daily only. I wonder whether there is an elegant/clever way to convert the dates to datetime.date or datetime64[D] so that, when I write the data to CSV, the dates are not appended with 00:00:00. I know I can convert the type manually element-by-element:
```
[dt.to_datetime().date() for dt in df.dates]
```
But this is really slow since I have many rows and it sort of defeats the purpose of using pandas.to_datetime. Is there a way to convert the dtype of the entire column at once? Or alternatively, does pandas.to_datetime support a precision specification so that I can get rid of the time part while working with daily data?
Stewbaca almost 8 years

If you are using astype('M8[D]') it transforms missing values into the origin date, 1970-1-1. Probably better to just use pandas.to_datetime() nowadays.
IanS almost 7 years

Unfortunately to_json still writes the full 00:00:00.
Pietro Battiston almost 7 years

@IanS do you mean when using date_format='iso'?! By default, it just outputs seconds since epoch.
IanS almost 7 years

Yes, that's what I meant.
C8H10N4O2 almost 7 years

This is faster than dt.normalize() on series longer than a few hundred elements.
Nic Scozzaro about 6 years

The OP is already using the .date() method in their question, so this solution doesn't answer their question, but I did find it useful to see a simple example of using the date() method just as a reference.
yeliabsalohcin almost 5 years

Note to anyone who routinely includes the datetime module as dt, this answer snipet will overwrite that module! @Dale-Jung, perhaps could change the line to something like dt_index
yeliabsalohcin almost 5 years

I'm also finding an issue whereby the next time I try and add a new row via the df.loc[date] method, the index reverts back to a timestamp, meaning subsequent comparisons no longer work
misantroop about 4 years

Just a warning, this changes the type to object. So you'd need to astype('datetime64') to keep consistency.
RaphX almost 4 years

This for some reason fails after you apply any aggregate function with the column.
fantabolous over 3 years

On a df of >1 million rows, this was far faster than any of the other options I found, with the exception of normalize which was only slightly slower. In addition, keeping the date pandas-native means it can be saved to hdf stores (datetime.date columns cannot be as of pandas 1.1.4).
Jean_N over 3 years

This is the most appropriate answer
rdmtinez about 3 years

this doesn't seem to work whne used as a lambda function for the date parser i.e. date_parser=lambda col: pd.to_datetime(col, format="%y%m%d").date()
Rrptm about 3 years

@edChum any way to format date in MM-DD-YYYY format?
Tushar Tiwari almost 3 years

@Katekarin if you try and see df['Date'].values its will still have the time component present in it.
Joe Huang almost 3 years

Can only use .dt accessor with datetimelike values