Elegant way of converting a numpy array containing datetime.timedelta into seconds in Python 2.7

12,717

Solution 1

import numpy as np

helper = np.vectorize(lambda x: x.total_seconds())
dt_sec = helper(dt)
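
A self-contained sketch of the same approach, assuming a small made-up `dt` array (the sample values are illustrative, not from the question). Passing `otypes=[float]` is an optional addition here: it pins the output dtype instead of letting `np.vectorize` infer it from the first element.

```python
import datetime
import numpy as np

# Hypothetical sample data: an object array of datetime.timedelta values.
dt = np.array([datetime.timedelta(0, 1, 36000),
               datetime.timedelta(days=1, seconds=5)], dtype=object)

# np.vectorize still calls the lambda once per element in Python;
# otypes=[float] fixes the result dtype to float64.
helper = np.vectorize(lambda x: x.total_seconds(), otypes=[float])
dt_sec = helper(dt)
print(dt_sec)
```

Note that `np.vectorize` is a convenience wrapper, not a performance tool, so this mainly tidies the loop rather than speeding it up.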

Solution 2

numpy has its own datetime and timedelta formats. Just use them ;).

Set-up for example:

import datetime
import numpy

times = numpy.array([datetime.timedelta(0, 1, 36000)])

Code:

times.astype("timedelta64[ms]").astype(int) / 1000.0
#>>> array([ 1.036])
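
An alternative sketch that does not depend on integer-versus-float division rules and keeps microsecond precision: dividing a timedelta64 array by `numpy.timedelta64(1, "s")` yields float seconds directly. The `[us]` unit is an assumption chosen here to match the microsecond resolution of datetime.timedelta.

```python
import datetime
import numpy

times = numpy.array([datetime.timedelta(0, 1, 36000)], dtype=object)

# Convert once to numpy's native type, at microsecond resolution.
numpy_times = times.astype("timedelta64[us]")

# timedelta64 / timedelta64 gives a plain float64 array (here, in seconds).
seconds = numpy_times / numpy.timedelta64(1, "s")
print(seconds)
```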

Since people don't seem to realise that this is the best solution, here are some timings of a timedelta64 array vs an array of datetime.timedelta objects:

SETUP="
import datetime
import numpy

times = numpy.array([datetime.timedelta(0, 1, 36000)] * 100000)
numpy_times = times.astype('timedelta64[ms]')
"

python -m timeit -s "$SETUP" "numpy_times.astype(int) / 1000"
python -m timeit -s "$SETUP" "numpy.vectorize(lambda x: x.total_seconds())(times)"
python -m timeit -s "$SETUP" "[delta.total_seconds() for delta in times]"

Results:

100 loops, best of 3: 4.54 msec per loop
10 loops, best of 3: 99.5 msec per loop
10 loops, best of 3: 67.1 msec per loop
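
The same comparison can also be run from within Python using the standard `timeit` module. This is only a sketch: the array size and repeat counts are arbitrary choices, and absolute numbers will differ by machine and numpy version.

```python
import datetime
import timeit
import numpy

times = numpy.array([datetime.timedelta(0, 1, 36000)] * 10000, dtype=object)
numpy_times = times.astype("timedelta64[ms]")

candidates = {
    "timedelta64 astype": lambda: numpy_times.astype(int) / 1000.0,
    "numpy.vectorize": lambda: numpy.vectorize(lambda x: x.total_seconds())(times),
    "list comprehension": lambda: [delta.total_seconds() for delta in times],
}

timings = {}
for name, func in candidates.items():
    # Best of 3 runs of 10 calls each, reported per call.
    timings[name] = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print("%-20s %.3f msec per call" % (name, timings[name] * 1000))
```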

The initial conversion to timedelta64 takes about twice as long as the vectorized expression, but every later operation on that timedelta64 array will be about 20 times faster.


If you're never going to use those timedeltas again, consider asking yourself why you ever made datetime.timedelta objects (as opposed to timedelta64s) in the first place, and then use the numpy.vectorize expression. It's less native, but for some reason it's faster than converting first.

Solution 3

A convenient and elegant way is to wrap the array in a pandas.Series and call the dt.total_seconds() method:

import numpy as np
import pandas as pd

# create example datetime arrays
arr1 = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
arr2 = np.array(['2007-07-15', '2006-01-18', '2010-08-22'], dtype='datetime64')

# timedelta array
td = arr2 - arr1

# get total seconds
pd.Series(td).dt.total_seconds()
# 0    172800.0
# 1    432000.0
# 2    777600.0
# dtype: float64
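
For the array in the question, which holds datetime.timedelta objects rather than timedelta64 values, a similar sketch uses `pd.to_timedelta`, which returns a `TimedeltaIndex` with its own `total_seconds()` method. The sample data below is made up for illustration.

```python
import datetime
import numpy as np
import pandas as pd

# Hypothetical sample data: an object array of datetime.timedelta values.
dt = np.array([datetime.timedelta(0, 1, 36000),
               datetime.timedelta(days=2)], dtype=object)

# TimedeltaIndex.total_seconds() returns float seconds for every element.
dt_sec = np.asarray(pd.to_timedelta(dt).total_seconds())
print(dt_sec)
```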
Author by

otmezger

Basking.io

Updated on July 23, 2022

Comments

  • otmezger
    otmezger almost 2 years

    I have a numpy array called dt. Each element is of type datetime.timedelta. For example:

    >>>dt[0]
    datetime.timedelta(0, 1, 36000)
    

    How can I convert dt into an array dt_sec that contains only the seconds, without looping? My current solution (which works, but I don't like it) is:

    dt_sec = np.zeros((len(dt), 1))
    for i in range(len(dt)):
        dt_sec[i] = dt[i].total_seconds()
    

    I tried to use dt.total_seconds(), but of course it didn't work. Any idea how to avoid this loop?

    Thanks

  • wflynny
    wflynny over 10 years
    Why not use x.seconds in the lambda? Also, if the array is a flat 1-D array, is map(lambda x: x.total_seconds(), dt) faster?
  • Veedrac
    Veedrac over 10 years
    numpy isn't doing anything behind the scenes in that. Heck, it'll probably be slower than a loop over a normal list.
  • prgao
    prgao over 10 years
    sure and true (would have to convert list to array in the end).
  • ccbunney
    ccbunney over 10 years
    I did not know about vectorize...what a useful function! Thanks!
  • CrepeGoat
    CrepeGoat about 3 years
    some links to go with this answer: Series.dt: pandas.pydata.org/docs/reference/api/pandas.Series.dt.html Series.dt.total_seconds: pandas.pydata.org/docs/reference/api/…
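
On the question of `x.seconds` raised in the comments: `.seconds` holds only the seconds component of the delta (0 to 86399) and ignores the days and microseconds fields, so it matches `total_seconds()` only for whole-second deltas shorter than a day. A quick check with a made-up value:

```python
import datetime

# Hypothetical value chosen to exercise all three fields.
delta = datetime.timedelta(days=1, seconds=5, microseconds=250000)

print(delta.seconds)          # seconds component only
print(delta.total_seconds())  # whole duration as a float
```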