Pandas 0.15 DataFrame: Remove or reset time portion of a datetime64

10,155

Solution 1

Pandas has a builtin function pd.datetools.normalize_date for that purpose:

df['date_col'] = df['date_col'].apply(pd.datetools.normalize_date)

It's implemented in Cython and does the following:

if PyDateTime_Check(dt):
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)
elif PyDate_Check(dt):
    return datetime(dt.year, dt.month, dt.day)
else:
    raise TypeError('Unrecognized type: %s' % type(dt))

Solution 2

Use dt methods, which is vectorized to yield faster results.

# There are better ways of converting it in to datetime column. 
# Ignore those to keep it simple
data['date_column'] = pd.to_datetime(data['date_column'])
data['date_column'].dt.date

Solution 3

pd.datetools.normalize_date has been deprecated. Use df['date_col'] = df['date_col'].dt.normalize() instead.

See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.normalize.html

Solution 4

I can think of two ways, setting or assigning to a new column just the date() attribute, or calling replace on the datetime object and passing param hour=0, minute=0:

In [106]:
# example data
t = """datetime
2014-06-30 21:50:00"""
df = pd.read_csv(io.StringIO(t), parse_dates=[0])
df
Out[106]:
             datetime
0 2014-06-30 21:50:00
In [107]:
# apply a lambda accessing just the date() attribute
df['datetime'] = df['datetime'].apply( lambda x: x.date() )
print(df)
# reset df
df = pd.read_csv(io.StringIO(t), parse_dates=[0])
# call replace with params hour=0, minute=0
df['datetime'] = df['datetime'].apply( lambda x: x.replace(hour=0, minute=0) )
df

     datetime
0  2014-06-30
Out[107]:
    datetime
0 2014-06-30

Solution 5

Since pd.datetools.normalize_date has been deprecated and you are working with the datetime64 data type, use:

df.your_date_col = df.your_date_col.apply(lambda x: x.replace(hour=0, minute=0, second=0, microsecond=0))

This way you don't need to convert to pandas datetime first. If it's already a pandas datetime, then see answer from Phil.

df.your_date_col = df.your_date_col.dt.normalize()
Share:
10,155

Related videos on Youtube

n4cer500
Author by

n4cer500

Updated on June 14, 2022

Comments

  • n4cer500
    n4cer500 almost 2 years

    I have imported a CSV file into a pandas DataFrame and have a datetime64 column with values such as:

    2014-06-30 21:50:00
    

    I simply want to either remove the time or set the time to midnight:

    2014-06-30 00:00:00 
    

    What is the easiest way of doing this?

    • EdChum
      EdChum over 9 years
      You could just set the column to just the date portion: df['datetime'] = df['datetime'].apply( lambda x: x.date() )
    • EdChum
      EdChum over 9 years
      Or you could do this: df['datetime'] = df['datetime'].apply( lambda x: x.replace(hour=0, minute=0) )
  • jpm
    jpm over 2 years
    Note @phil's answer with df['date_col'].dt.normalize() since pd.datetools.normalize_date has been deprecated and avoid the resulting error.