Elegant way to convert a numpy array containing datetime.timedelta into seconds in Python 2.7

python arrays loops datetime numpy

12,717

Solution 1

import numpy as np

helper = np.vectorize(lambda x: x.total_seconds())
dt_sec = helper(dt)
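For reference, a self-contained version of the snippet above might look like the following. The sample array `dt` is made up for illustration, and `otypes` is an optional extra that fixes the output dtype to float instead of letting NumPy infer it from the first element. Keep in mind that np.vectorize is a convenience wrapper around a Python-level loop, so this is tidy rather than fast.

```python
import datetime
import numpy as np

# Hypothetical sample data standing in for the asker's array `dt`.
dt = np.array([
    datetime.timedelta(0, 1, 36000),
    datetime.timedelta(days=1, seconds=2),
])

# Wrap total_seconds() so it can be applied element-wise to the object array.
helper = np.vectorize(lambda x: x.total_seconds(), otypes=[float])
dt_sec = helper(dt)
print(dt_sec)
```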

Solution 2

numpy has its own datetime and timedelta formats. Just use them ;).

Set-up for example:

import datetime
import numpy

times = numpy.array([datetime.timedelta(0, 1, 36000)])

Code:

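# Divide by 1000.0 rather than 1000: on Python 2.7 an int array divided by an int is floor-divided.
times.astype("timedelta64[ms]").astype(int) / 1000.0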
#>>> array([ 1.036])
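A variation on the line above, sketched here with made-up sample values, stays inside NumPy's timedelta arithmetic: dividing a timedelta64 array by a one-second timedelta64 gives float seconds directly. Casting to microseconds instead of milliseconds is an assumption on my part; it matches the resolution of datetime.timedelta, so no sub-millisecond part is dropped.

```python
import datetime
import numpy

# Hypothetical sample data; the second value has a sub-millisecond part.
times = numpy.array([
    datetime.timedelta(0, 1, 36000),
    datetime.timedelta(days=2, microseconds=500),
])

# Convert once to NumPy's native timedelta type at microsecond resolution.
numpy_times = times.astype("timedelta64[us]")

# timedelta64 / timedelta64 is a true division and yields a float64 array.
seconds = numpy_times / numpy.timedelta64(1, "s")
print(seconds)
```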

Since people don't seem to realise that this is the best solution, here are some timings of a timedelta64 array vs an array of datetime.timedelta objects:

SETUP="
import datetime
import numpy

times = numpy.array([datetime.timedelta(0, 1, 36000)] * 100000)
numpy_times = times.astype('timedelta64[ms]')
"

python -m timeit -s "$SETUP" "numpy_times.astype(int) / 1000"
python -m timeit -s "$SETUP" "numpy.vectorize(lambda x: x.total_seconds())(times)"
python -m timeit -s "$SETUP" "[delta.total_seconds() for delta in times]"

Results:

100 loops, best of 3: 4.54 msec per loop
10 loops, best of 3: 99.5 msec per loop
10 loops, best of 3: 67.1 msec per loop

The initial conversion to timedelta64 takes about twice as long as the vectorized expression, but every later operation on that timedelta64 array will be about 20 times faster.

If you're never going to use those timedeltas again, consider asking yourself why you ever made datetime.timedelta objects (as opposed to timedelta64s) in the first place, and then use the numpy.vectorize expression. It's less native, but for a one-off conversion it's faster than converting to timedelta64 first.
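To rerun the comparison without a shell, something like the following sketch with the timeit module should do. The array is smaller than in the timings above so that it finishes quickly, the labels in `candidates` are my own, and the absolute numbers will differ between machines and NumPy versions.

```python
import datetime
import timeit
import numpy

times = numpy.array([datetime.timedelta(0, 1, 36000)] * 10000)
numpy_times = times.astype("timedelta64[ms]")

# The three expressions being compared, wrapped so timeit can call them.
candidates = {
    "timedelta64 astype": lambda: numpy_times.astype(int) / 1000.0,
    "numpy.vectorize": lambda: numpy.vectorize(lambda x: x.total_seconds())(times),
    "list comprehension": lambda: [delta.total_seconds() for delta in times],
}

# Run each expression once to check that they agree on the result.
results = {name: func() for name, func in candidates.items()}

for name, func in candidates.items():
    # Best of 3 repeats, 5 calls each, reported per call.
    best = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print("%-20s %.3f msec per loop" % (name, best * 1000))
```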

Solution 3

A convenient and elegant way is to wrap the array in a pandas.Series and call its dt.total_seconds() method:

import numpy as np
import pandas as pd

# create example datetime arrays
arr1 = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
arr2 = np.array(['2007-07-15', '2006-01-18', '2010-08-22'], dtype='datetime64')

# timedelta array
td = arr2 - arr1

# get total seconds
pd.Series(td).dt.total_seconds()

0    172800.0
1    432000.0
2    777600.0
dtype: float64
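The example above starts from datetime64 arrays. If the array holds datetime.timedelta objects, as in the question, a similar route is possible through pd.to_timedelta. This is a minimal sketch with made-up sample values, not something taken from the answer itself.

```python
import datetime
import numpy as np
import pandas as pd

# Hypothetical object array of datetime.timedelta, as described in the question.
dt = np.array([
    datetime.timedelta(0, 1, 36000),
    datetime.timedelta(days=2),
])

# to_timedelta converts the Python objects into a TimedeltaIndex ...
td_index = pd.to_timedelta(dt)

# ... whose total_seconds() method returns the durations as floats.
dt_sec = np.asarray(td_index.total_seconds())
print(dt_sec)
```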

Author by

otmezger

Basking.io

Updated on July 23, 2022

Comments

otmezger almost 2 years
I have a numpy array called dt. Each element is of type datetime.timedelta. For example:
```
>>>dt[0]
datetime.timedelta(0, 1, 36000)
```
How can I convert dt into an array dt_sec that contains only the seconds, without looping? My current solution (which works, but I don't like it) is:
```
import numpy as np

dt_sec = np.zeros((len(dt), 1))
for i in range(len(dt)):
    dt_sec[i] = dt[i].total_seconds()
```
I tried dt.total_seconds(), but of course it didn't work. Any idea how to avoid this loop?

Thanks
wflynny over 10 years

Why not use x.seconds in the lambda? Also, if the array is a flat 1-D array, is map(lambda x: x.total_seconds(), dt) faster?
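On the first question, a short illustration of why the two are not interchangeable: the seconds attribute holds only the seconds component of the timedelta (days and microseconds are stored separately), while total_seconds() covers the whole duration. The sample value is made up.

```python
import datetime

# Hypothetical value with a days part and a microseconds part.
delta = datetime.timedelta(days=1, seconds=1, microseconds=36000)

print(delta.seconds)          # seconds component only: 1
print(delta.total_seconds())  # whole duration in seconds: 86401.036
```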
Veedrac over 10 years

numpy isn't doing anything behind the scenes in that. Heck, it'll probably be slower than a loop over a normal list.
prgao over 10 years

Sure and true (would have to convert the list to an array in the end).
ccbunney over 10 years

I did not know about vectorize...what a useful function! Thanks!
CrepeGoat about 3 years

some links to go with this answer: Series.dt: pandas.pydata.org/docs/reference/api/pandas.Series.dt.html Series.dt.total_seconds: pandas.pydata.org/docs/reference/api/…