Python Pandas Series of Datetimes to Seconds Since the Epoch

19,744

Update:

In 0.15.0 Timedeltas became a full-fledged dtype.

So this becomes possible (as well as the methods below)

In [45]: s = Series(pd.timedelta_range('1 day',freq='1S',periods=5))                         

In [46]: s.dt.components
Out[46]: 
   days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0     1      0        0        0             0             0            0
1     1      0        0        1             0             0            0
2     1      0        0        2             0             0            0
3     1      0        0        3             0             0            0
4     1      0        0        4             0             0            0

In [47]: s.astype('timedelta64[s]')
Out[47]: 
0    86400
1    86401
2    86402
3    86403
4    86404
dtype: float64

Original Answer:

I see that you are on master (and 0.13 is coming out very shortly), so assuming you have numpy >= 1.7. Do this. See here for the docs (this is frequency conversion)

In [5]: df = DataFrame(dict(date = date_range('20130101',periods=10)))

In [6]: df
Out[6]: 
                 date
0 2013-01-01 00:00:00
1 2013-01-02 00:00:00
2 2013-01-03 00:00:00
3 2013-01-04 00:00:00
4 2013-01-05 00:00:00
5 2013-01-06 00:00:00
6 2013-01-07 00:00:00
7 2013-01-08 00:00:00
8 2013-01-09 00:00:00
9 2013-01-10 00:00:00

In [7]: df['date']+timedelta(hours=2)-datetime.datetime(1970,1,1)
Out[7]: 
0   15706 days, 02:00:00
1   15707 days, 02:00:00
2   15708 days, 02:00:00
3   15709 days, 02:00:00
4   15710 days, 02:00:00
5   15711 days, 02:00:00
6   15712 days, 02:00:00
7   15713 days, 02:00:00
8   15714 days, 02:00:00
9   15715 days, 02:00:00
Name: date, dtype: timedelta64[ns]

In [9]: (df['date']+timedelta(hours=2)-datetime.datetime(1970,1,1)) / np.timedelta64(1,'s')
Out[9]: 
0    1357005600
1    1357092000
2    1357178400
3    1357264800
4    1357351200
5    1357437600
6    1357524000
7    1357610400
8    1357696800
9    1357783200
Name: date, dtype: float64

The contained values are np.timedelta64[ns] objects, they don't have the same methods as timedelta objects, so no total_seconds().

In [10]: s = (df['date']+timedelta(hours=2)-datetime.datetime(1970,1,1))

In [11]: s[0]
Out[11]: numpy.timedelta64(1357005600000000000,'ns')

You can astype them to int, and you get back a ns unit.

In [12]: s[0].astype(int)
Out[12]: 1357005600000000000

You can do this as well (but only on an individual unit element).

In [18]: s[0].astype('timedelta64[s]')
Out[18]: numpy.timedelta64(1357005600,'s')
Share:
19,744

Related videos on Youtube

Chris
Author by

Chris

Updated on May 25, 2022

Comments

  • Chris
    Chris almost 2 years

    Following in the spirit of this answer, I attempted the following to convert a DataFrame column of datetimes to a column of seconds since the epoch.

    df['date'] = (df['date']+datetime.timedelta(hours=2)-datetime.datetime(1970,1,1))
    df['date'].map(lambda td:td.total_seconds())
    

    The second command causes the following error which I do not understand. Any thoughts on what might be going on here? I replaced map with apply and that didn't help matters.

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-99-7123e823f995> in <module>()
    ----> 1 df['date'].map(lambda td:td.total_seconds())
    
    /Users/cpd/.virtualenvs/py27-ipython+pandas/lib/python2.7/site-packages/pandas-0.12.0_937_gb55c790-py2.7-macosx-10.8-x86_64.egg/pandas/core/series.pyc in map(self, arg, na_action)
       1932             return self._constructor(new_values, index=self.index).__finalize__(self)
       1933         else:
    -> 1934             mapped = map_f(values, arg)
       1935             return self._constructor(mapped, index=self.index).__finalize__(self)
       1936 
    
    /Users/cpd/.virtualenvs/py27-ipython+pandas/lib/python2.7/site-packages/pandas-0.12.0_937_gb55c790-py2.7-macosx-10.8-x86_64.egg/pandas/lib.so in pandas.lib.map_infer (pandas/lib.c:43628)()
    
    <ipython-input-99-7123e823f995> in <lambda>(td)
    ----> 1 df['date'].map(lambda td:td.total_seconds())
    
    AttributeError: 'float' object has no attribute 'total_seconds'
    
    • Zeugma
      Zeugma over 10 years
      It seems the 'date' column may not be a datetime64 in the first place?
    • Abe
      Abe over 10 years
      Does the column contain any missing values? Missing values usually cause pandas Series to be cast to floats, causing weirdness when you try to interpret them as datetimes.
    • Chris
      Chris over 10 years
      @Abe Indeed you were right. Turns out there is some missing data in some of the records. Not what I anticipated... Ack.
  • Chris
    Chris over 10 years
    Perfect Jeff, thank you! Once I got rid of the rows with missing data, I could directly apply this to the new problem. ;-)
  • Jeff
    Jeff over 10 years
    you don't need to remove missing values (NaT); they will be returned as nan
  • TimStaley
    TimStaley about 10 years
    I don't have time right now to edit (and re-verify) the answer, but in fact version 0.13 now allows 'astype' operations to be applied to an entire Series. See pandas.pydata.org/pandas-docs/stable/…, or pandas.pydata.org/pandas-docs/stable/whatsnew.html and do a find-on-page for 'astype'.
  • soupault
    soupault almost 9 years
    I'm searching for a method to convert Series of Timedeltas to seconds for plotting. Your approach with / pd.Timedelta(seconds=1) is a nice one. Stiil, I wonder if there is a better method. What do you think?
  • Jeff
    Jeff almost 9 years
    this answer actually is pretty outdated; Timedeltas became fully fledged types in 0.15.0. I will add another method.