Elegant way to convert a numpy array containing datetime.timedelta into seconds in Python 2.7
Solution 1
import numpy as np
helper = np.vectorize(lambda x: x.total_seconds())
dt_sec = helper(dt)
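For a self-contained check, here is a minimal sketch of the same approach. The small array standing in for dt is made up for illustration and is not from the question. It also shows why total_seconds() is used rather than the .seconds attribute, which only holds the seconds part within a day and drops the days component.

```python
import datetime
import numpy as np

# Hypothetical sample data standing in for the question's `dt` array.
dt = np.array([
    datetime.timedelta(0, 1, 36000),        # 1 s + 36000 us
    datetime.timedelta(days=1, seconds=1),  # longer than one day
])

# np.vectorize calls the lambda once per element (a Python-level loop).
helper = np.vectorize(lambda x: x.total_seconds())
dt_sec = helper(dt)

# .seconds ignores the days field, so the second element loses a full day.
seconds_attr = [x.seconds for x in dt]
```

Here dt_sec holds the full durations as floats, while seconds_attr reports 1 for both elements.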
Solution 2
numpy has its own datetime and timedelta formats. Just use them ;).
Set-up for example:
import datetime
import numpy
times = numpy.array([datetime.timedelta(0, 1, 36000)])
Code:
times.astype("timedelta64[ms]").astype(int) / 1000.0
#>>> array([ 1.036])
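A variant that avoids the integer cast altogether is to divide by a one-second timedelta64. This is a sketch under the assumption that microsecond resolution is wanted (it matches the resolution of datetime.timedelta); the names td64 and seconds are illustrative only.

```python
import datetime
import numpy as np

times = np.array([datetime.timedelta(0, 1, 36000)])

# Convert the object array to a native timedelta64 array at microsecond
# resolution, the same resolution datetime.timedelta uses.
td64 = times.astype("timedelta64[us]")

# Dividing one timedelta64 by another yields a plain float ratio,
# so dividing by one second gives float seconds.
seconds = td64 / np.timedelta64(1, "s")
```

Because the result is a ratio of two durations, the chosen storage unit does not need to be repeated as a hard-coded divisor.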
Since people don't seem to realise that this is the best solution, here are some timings of a timedelta64 array vs a datetime.timedelta array:
SETUP="
import datetime
import numpy
times = numpy.array([datetime.timedelta(0, 1, 36000)] * 100000)
numpy_times = times.astype('timedelta64[ms]')
"
python -m timeit -s "$SETUP" "numpy_times.astype(int) / 1000"
python -m timeit -s "$SETUP" "numpy.vectorize(lambda x: x.total_seconds())(times)"
python -m timeit -s "$SETUP" "[delta.total_seconds() for delta in times]"
Results:
100 loops, best of 3: 4.54 msec per loop
10 loops, best of 3: 99.5 msec per loop
10 loops, best of 3: 67.1 msec per loop
The initial translation will take about twice as long as the vectorized expression, but every later operation on that timedelta64 array will be about 20 times faster.
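The idea of converting once and then reusing the native array can be sketched as follows. The array length and the particular operations (sum, max) are arbitrary choices for illustration, not part of the timings above.

```python
import datetime
import numpy as np

times = np.array([datetime.timedelta(0, 1, 36000)] * 5)

# One-off conversion from Python objects to a native timedelta64 array.
numpy_times = times.astype("timedelta64[ms]")

# Later operations run on the native array without a Python-level loop.
total = numpy_times.sum()
longest = numpy_times.max()

# A timedelta64 scalar can be turned into float seconds the same way.
total_seconds = total / np.timedelta64(1, "s")
```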
If you're never going to use those timedeltas again, consider asking yourself why you ever made the deltas (as opposed to timedelta64s) in the first place, and then use the numpy.vectorize expression. It's less native but for some reason it's faster.
Solution 3
A convenient and elegant way is to use a pandas.Series and its dt.total_seconds() method:
import numpy as np
import pandas as pd
# create example datetime arrays
arr1 = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
arr2 = np.array(['2007-07-15', '2006-01-18', '2010-08-22'], dtype='datetime64')
# timedelta array
td = arr2 - arr1
# get total seconds
pd.Series(td).dt.total_seconds()
0 172800.0
1 432000.0
2 777600.0
dtype: float64
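The example above starts from datetime64 arrays. For the question's case, an object array of datetime.timedelta values, a possible sketch is to go through pd.to_timedelta first. The names times and seconds are illustrative, and this assumes a reasonably recent pandas.

```python
import datetime
import numpy as np
import pandas as pd

times = np.array([datetime.timedelta(0, 1, 36000)])

# pd.to_timedelta accepts an array-like of datetime.timedelta objects
# and returns a TimedeltaIndex, which has a total_seconds() method.
seconds = np.asarray(pd.to_timedelta(times).total_seconds())
```

Wrapping the result in np.asarray returns a plain numpy float array instead of a pandas index.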
Comments
-
otmezger almost 2 years
I have a numpy array called dt. Each element is of type datetime.timedelta. For example:
>>> dt[0]
datetime.timedelta(0, 1, 36000)
How can I convert dt into the array dt_sec, which contains only seconds, without looping? My current solution (which works, but I don't like it) is:
dt_sec = zeros((len(dt),1))
for i in range(0,len(dt),1):
    dt_sec[i] = dt[i].total_seconds()
I tried to use dt.total_seconds() but of course it didn't work. Any idea on how to avoid this loop? Thanks
-
wflynny over 10 years
Why not use x.seconds in the lambda? Also, if the array is a flat 1-D array, is map(lambda x: x.total_seconds(), dt) faster?
-
Veedrac over 10 years
numpy isn't doing anything behind the scenes in that. Heck, it'll probably be slower than a loop over a normal list.
-
prgao over 10 years
sure and true (would have to convert list to array in the end).
-
ccbunney over 10 years
I did not know about vectorize... what a useful function! Thanks!
-
CrepeGoat about 3 years
some links to go with this answer:
Series.dt: pandas.pydata.org/docs/reference/api/pandas.Series.dt.html
Series.dt.total_seconds: pandas.pydata.org/docs/reference/api/…