Keep only date part when using pandas.to_datetime

590,259

Solution 1

Since version 0.15.0 this can now be easily done using .dt to access just the date component:

df['just_date'] = df['dates'].dt.date

The above returns a datetime.date dtype, if you want to have a datetime64 then you can just normalize the time component to midnight so it sets all the values to 00:00:00:

df['normalised_date'] = df['dates'].dt.normalize()

This keeps the dtype as datetime64, but the display shows just the date value.

Solution 2

Simple Solution:

df['date_only'] = df['date_time_column'].dt.date

Solution 3

While I upvoted EdChum's answer, which is the most direct answer to the question the OP posed, it does not really solve the performance problem (it still relies on python datetime objects, and hence any operation on them will be not vectorized - that is, it will be slow).

A better performing alternative is to use df['dates'].dt.floor('d'). Strictly speaking, it does not "keep only date part", since it just sets the time to 00:00:00. But it does work as desired by the OP when, for instance:

  • printing to screen
  • saving to csv
  • using the column to groupby

... and it is much more efficient, since the operation is vectorized.

EDIT: in fact, the answer the OP's would have preferred is probably "recent versions of pandas do not write the time to csv if it is 00:00:00 for all observations".

Solution 4

Pandas v0.13+: Use to_csv with date_format parameter

Avoid, where possible, converting your datetime64[ns] series to an object dtype series of datetime.date objects. The latter, often constructed using pd.Series.dt.date, is stored as an array of pointers and is inefficient relative to a pure NumPy-based series.

Since your concern is format when writing to CSV, just use the date_format parameter of to_csv. For example:

df.to_csv(filename, date_format='%Y-%m-%d')

See Python's strftime directives for formatting conventions.

Solution 5

Pandas DatetimeIndex and Series have a method called normalize that does exactly what you want.

You can read more about it in this answer.

It can be used as ser.dt.normalize()

Share:
590,259
Admin
Author by

Admin

Updated on August 06, 2021

Comments

  • Admin
    Admin almost 3 years

    I use pandas.to_datetime to parse the dates in my data. Pandas by default represents the dates with datetime64[ns] even though the dates are all daily only. I wonder whether there is an elegant/clever way to convert the dates to datetime.date or datetime64[D] so that, when I write the data to CSV, the dates are not appended with 00:00:00. I know I can convert the type manually element-by-element:

    [dt.to_datetime().date() for dt in df.dates]
    

    But this is really slow since I have many rows and it sort of defeats the purpose of using pandas.to_datetime. Is there a way to convert the dtype of the entire column at once? Or alternatively, does pandas.to_datetime support a precision specification so that I can get rid of the time part while working with daily data?

  • Stewbaca
    Stewbaca almost 8 years
    If you are using astype('M8[D]') it transforms missing values into the origin date, 1970-1-1. Probably better to just use pandas.to_datetime() nowadays.
  • IanS
    IanS almost 7 years
    Unfortunately to_json still writes the full 00:00:00.
  • Pietro Battiston
    Pietro Battiston almost 7 years
    @IanS do you mean when using date_format='iso'?! By default, it just outputs seconds since epoch.
  • IanS
    IanS almost 7 years
    Yes, that's what I meant.
  • C8H10N4O2
    C8H10N4O2 almost 7 years
    This is faster than dt.normalize() on series longer than a few hundred elements.
  • Nic Scozzaro
    Nic Scozzaro about 6 years
    The OP is already using the .date() method in their question, so this solution doesn't answer their question, but I did find it useful to see a simple example of using the date() method just as a reference.
  • yeliabsalohcin
    yeliabsalohcin almost 5 years
    Note to anyone who routinely includes the datetime module as dt, this answer snipet will overwrite that module! @Dale-Jung, perhaps could change the line to something like dt_index
  • yeliabsalohcin
    yeliabsalohcin almost 5 years
    I'm also finding an issue whereby the next time I try and add a new row via the df.loc[date] method, the index reverts back to a timestamp, meaning subsequent comparisons no longer work
  • misantroop
    misantroop about 4 years
    Just a warning, this changes the type to object. So you'd need to astype('datetime64') to keep consistency.
  • RaphX
    RaphX almost 4 years
    This for some reason fails after you apply any aggregate function with the column.
  • fantabolous
    fantabolous over 3 years
    On a df of >1 million rows, this was far faster than any of the other options I found, with the exception of normalize which was only slightly slower. In addition, keeping the date pandas-native means it can be saved to hdf stores (datetime.date columns cannot be as of pandas 1.1.4).
  • Jean_N
    Jean_N over 3 years
    This is the most appropriate answer
  • rdmtinez
    rdmtinez about 3 years
    this doesn't seem to work whne used as a lambda function for the date parser i.e. date_parser=lambda col: pd.to_datetime(col, format="%y%m%d").date()
  • Rrptm
    Rrptm about 3 years
    @edChum any way to format date in MM-DD-YYYY format?
  • Tushar Tiwari
    Tushar Tiwari almost 3 years
    @Katekarin if you try and see df['Date'].values its will still have the time component present in it.
  • Joe Huang
    Joe Huang almost 3 years
    Can only use .dt accessor with datetimelike values