Changing time components of pandas datetime64 column


EDIT :

A vectorized way to do this would be to normalize the series, and then add 12 hours to it using timedelta. Example -

tempDF['date'].dt.normalize() + datetime.timedelta(hours=12)

Demo -

In [59]: tempDF
Out[59]:
                  date  id
0  2015-02-04 12:00:00   1
1  2015-06-04 12:00:00   2
2  2015-09-04 12:00:00   3
3  2015-12-04 12:00:00   4
4  2015-04-15 12:00:00   5
5  2015-04-21 12:00:00   6
6  2015-04-29 12:00:00   7
7  2015-04-05 12:00:00   8
8  2015-06-05 12:00:00   9
9  2015-10-05 12:00:00  10
10 2015-12-05 12:00:00  11
11 2015-05-19 12:00:00  12
12 2015-05-27 12:00:00  13
13 2015-01-06 12:00:00  14
14 2015-04-06 12:00:00  15
15 2015-10-06 12:00:00  16
16 2015-06-15 12:00:00  17
17 2015-06-19 12:00:00  18
18 2015-06-23 12:00:00  19
19 2015-06-27 12:00:00  20

In [60]: tempDF['date'].dt.normalize() + datetime.timedelta(hours=12)
Out[60]:
0    2015-02-04 12:00:00
1    2015-06-04 12:00:00
2    2015-09-04 12:00:00
3    2015-12-04 12:00:00
4    2015-04-15 12:00:00
5    2015-04-21 12:00:00
6    2015-04-29 12:00:00
7    2015-04-05 12:00:00
8    2015-06-05 12:00:00
9    2015-10-05 12:00:00
10   2015-12-05 12:00:00
11   2015-05-19 12:00:00
12   2015-05-27 12:00:00
13   2015-01-06 12:00:00
14   2015-04-06 12:00:00
15   2015-10-06 12:00:00
16   2015-06-15 12:00:00
17   2015-06-19 12:00:00
18   2015-06-23 12:00:00
19   2015-06-27 12:00:00
dtype: datetime64[ns]
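
For anyone who wants to run this outside the IPython session, here is a minimal self-contained sketch of the same normalize-then-add approach. It uses a three-row sample rather than the full frame, swaps in pd.Timedelta for datetime.timedelta (either should work), and assumes the strings in the question are day/month/year, which is why the format is spelled out. That format is my assumption, not something the question states.

```python
import pandas as pd

# Small sample in the same shape as the question's data.
tempDF = pd.DataFrame({
    "id": [1, 2, 3],
    "date": ["02/04/2015 02:34", "15/04/2015 07:12", "27/06/2015 15:43"],
})

# Assumption: the strings are day/month/year, so the format is given explicitly
# instead of letting pandas guess.
tempDF["date"] = pd.to_datetime(tempDF["date"], format="%d/%m/%Y %H:%M")

# normalize() resets every timestamp to midnight; adding 12 hours gives midday.
tempDF["date"] = tempDF["date"].dt.normalize() + pd.Timedelta(hours=12)

print(tempDF)
```

The whole column is handled in one operation, with no Python-level loop over rows.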

Timing information for both methods is at the bottom.


One method would be to use Series.apply along with the .replace() method OP mentions in his post. Example -

tempDF['date'] = tempDF['date'].apply(lambda x:x.replace(hour=12,minute=0))

Demo -

In [12]: tempDF
Out[12]:
                  date  id
0  2015-02-04 02:34:00   1
1  2015-06-04 12:34:00   2
2  2015-09-04 23:03:00   3
3  2015-12-04 01:00:00   4
4  2015-04-15 07:12:00   5
5  2015-04-21 12:59:00   6
6  2015-04-29 17:33:00   7
7  2015-04-05 10:44:00   8
8  2015-06-05 11:12:00   9
9  2015-10-05 08:52:00  10
10 2015-12-05 14:19:00  11
11 2015-05-19 19:22:00  12
12 2015-05-27 22:31:00  13
13 2015-01-06 11:09:00  14
14 2015-04-06 12:57:00  15
15 2015-10-06 04:00:00  16
16 2015-06-15 03:23:00  17
17 2015-06-19 05:37:00  18
18 2015-06-23 13:41:00  19
19 2015-06-27 15:43:00  20

In [13]: tempDF['date'] = tempDF['date'].apply(lambda x:x.replace(hour=12,minute=0))

In [14]: tempDF
Out[14]:
                  date  id
0  2015-02-04 12:00:00   1
1  2015-06-04 12:00:00   2
2  2015-09-04 12:00:00   3
3  2015-12-04 12:00:00   4
4  2015-04-15 12:00:00   5
5  2015-04-21 12:00:00   6
6  2015-04-29 12:00:00   7
7  2015-04-05 12:00:00   8
8  2015-06-05 12:00:00   9
9  2015-10-05 12:00:00  10
10 2015-12-05 12:00:00  11
11 2015-05-19 12:00:00  12
12 2015-05-27 12:00:00  13
13 2015-01-06 12:00:00  14
14 2015-04-06 12:00:00  15
15 2015-10-06 12:00:00  16
16 2015-06-15 12:00:00  17
17 2015-06-19 12:00:00  18
18 2015-06-23 12:00:00  19
19 2015-06-27 12:00:00  20
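
One thing to watch with .replace(): it only changes the fields you name. The demo data has no seconds, so hour and minute are enough there, but if your timestamps carry seconds or microseconds they will survive. A small sketch of the difference, using made-up sample values:

```python
import pandas as pd

# Made-up sample values; the first one has seconds and a fractional part.
s = pd.Series([
    pd.Timestamp("2015-02-04 02:34:56.789"),
    pd.Timestamp("2015-06-04 12:34:00"),
])

# Only hour and minute are replaced; seconds and microseconds carry over.
partial = s.apply(lambda ts: ts.replace(hour=12, minute=0))

# Naming the smaller fields as well gives exactly midday.
full = s.apply(lambda ts: ts.replace(hour=12, minute=0, second=0, microsecond=0))

print(partial)
print(full)
```

The normalize-based method does not have this issue, since normalize() clears the whole time part.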

Timing information

In [52]: df = pd.DataFrame([[datetime.datetime.now()] for _ in range(100000)],columns=['date'])

In [54]: %%timeit
   ....: df['date'].dt.normalize() + datetime.timedelta(hours=12)
   ....:
The slowest run took 12.53 times longer than the fastest. This could mean that an intermediate result is being cached
1 loops, best of 3: 32.3 ms per loop

In [57]: %%timeit
   ....: df['date'].apply(lambda x:x.replace(hour=12,minute=0))
   ....:
1 loops, best of 3: 1.09 s per loop
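
If you want to repeat the comparison outside IPython, something along these lines should do as a rough sketch. It uses the standard timeit module instead of %%timeit, a fixed timestamp instead of datetime.datetime.now(), and a smaller frame so it finishes quickly; the function names are just placeholders. Absolute numbers will depend on your machine and pandas version.

```python
import timeit

import pandas as pd

n = 10_000  # smaller than the 100000 rows used above, to keep the run short
df = pd.DataFrame({"date": [pd.Timestamp("2015-04-02 02:34:00")] * n})


def vectorized():
    # Whole-column operation.
    return df["date"].dt.normalize() + pd.Timedelta(hours=12)


def rowwise():
    # Calls Timestamp.replace once per row.
    return df["date"].apply(lambda ts: ts.replace(hour=12, minute=0))


# Sanity check that both approaches give the same values on this data.
same = bool((vectorized() == rowwise()).all())
print("results match:", same)

print("vectorized:", min(timeit.repeat(vectorized, number=1, repeat=3)), "s")
print("row-wise:  ", min(timeit.repeat(rowwise, number=1, repeat=3)), "s")
```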
Author by

user1718097

Updated on September 14, 2022

Comments

  • user1718097
    user1718097 over 1 year

    I have a dataframe that can be simplified as:

                    date  id
    0   02/04/2015 02:34   1
    1   06/04/2015 12:34   2
    2   09/04/2015 23:03   3
    3   12/04/2015 01:00   4
    4   15/04/2015 07:12   5
    5   21/04/2015 12:59   6
    6   29/04/2015 17:33   7
    7   04/05/2015 10:44   8
    8   06/05/2015 11:12   9
    9   10/05/2015 08:52  10
    10  12/05/2015 14:19  11
    11  19/05/2015 19:22  12
    12  27/05/2015 22:31  13
    13  01/06/2015 11:09  14
    14  04/06/2015 12:57  15
    15  10/06/2015 04:00  16
    16  15/06/2015 03:23  17
    17  19/06/2015 05:37  18
    18  23/06/2015 13:41  19
    19  27/06/2015 15:43  20
    

    It can be created using:

    tempDF = pd.DataFrame({ 'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
                            'date': ["02/04/2015 02:34","06/04/2015 12:34","09/04/2015 23:03","12/04/2015 01:00","15/04/2015 07:12","21/04/2015 12:59","29/04/2015 17:33","04/05/2015 10:44","06/05/2015 11:12","10/05/2015 08:52","12/05/2015 14:19","19/05/2015 19:22","27/05/2015 22:31","01/06/2015 11:09","04/06/2015 12:57","10/06/2015 04:00","15/06/2015 03:23","19/06/2015 05:37","23/06/2015 13:41","27/06/2015 15:43"]})
    

    The data has the following types:

    tempDF.dtypes
    date     object
    id        int64
    dtype: object
    

    I have set the 'date' variable to be Pandas datetime64 format (if that's the right way to describe it) using:

    import numpy as np
    import pandas as pd
    tempDF['date'] = pd.to_datetime(tempDF['date'])
    

    So now, the dtypes look like:

    tempDF.dtypes
    date     datetime64[ns]
    id                int64
    dtype: object
    

    I want to change the hours of the original date data. I can use .normalize() to convert to midnight via the .dt accessor:

    tempDF['date'] = tempDF['date'].dt.normalize()
    

    And, I can get access to individual datetime components (e.g. year) using:

    tempDF['date'].dt.year
    

    This produces:

    0     2015
    1     2015
    2     2015
    3     2015
    4     2015
    5     2015
    6     2015
    7     2015
    8     2015
    9     2015
    10    2015
    11    2015
    12    2015
    13    2015
    14    2015
    15    2015
    16    2015
    17    2015
    18    2015
    19    2015
    Name: date, dtype: int64
    

    The question is, how can I change specific date and time components? For example, how could I change the time to midday (12:00) for all the dates? I've found that datetime.datetime has a .replace() function. However, having converted the dates to Pandas format, it would make sense to keep them in that format. Is there a way to do that without changing the format again?

  • user1718097
    user1718097 over 8 years
    Great answer. Thank you. I've always shied away from using lambda functions because the dataframes I have usually contain in excess of a million rows and I thought lambda functions would be slow. But, perhaps, I need to revisit these functions. Is there a way to do the same using a column-based method rather than stepping through each row?
  • Anand S Kumar
    Anand S Kumar over 8 years
    I was able to find a vectorized method. Check it out; I've updated the answer.
  • Arigion
    Arigion almost 3 years
    Adding a timedelta to Timestamps with a timezone and daylight saving might get you unexpected results. (pd.Timestamp('2022-03-27 00:00', tz='CET') + pd.Timedelta(12, unit='h')).hour == 13
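
To illustrate the daylight-saving point for a whole column, here is a small sketch. The first part reproduces the effect described in the comment; the second part is one possible workaround (drop the time zone, shift on wall-clock time, then re-attach the zone). The workaround is my own suggestion, not something from the answer, and it assumes the target time (midday here) always exists and is unambiguous in the zone.

```python
import pandas as pd

# Midnight on the day CET switches to summer time, plus the following day.
s = pd.Series(pd.to_datetime(["2022-03-27 00:00", "2022-03-28 00:00"]))
s = s.dt.tz_localize("CET")

# Adding 12 elapsed hours jumps over the skipped hour on the switch day.
shifted = s.dt.normalize() + pd.Timedelta(hours=12)
print(shifted.dt.hour.tolist())

# Possible workaround: do the arithmetic on naive wall-clock times,
# then localize the result again.
naive = s.dt.tz_localize(None)
midday = (naive.dt.normalize() + pd.Timedelta(hours=12)).dt.tz_localize("CET")
print(midday.dt.hour.tolist())
```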