Python pandas convert datetime to timestamp effectively through dt accessor
Solution 1
I think you need to convert the column to a NumPy array first with values and cast it to int64. The output is in ns, so you need to divide by 10 ** 9:
df['ts'] = df.datetime.values.astype(np.int64) // 10 ** 9
print (df)
              datetime          ts
0  2016-01-01 00:00:01  1451606401
1  2016-01-01 01:00:01  1451610001
2  2016-01-01 02:00:01  1451613601
3  2016-01-01 03:00:01  1451617201
4  2016-01-01 04:00:01  1451620801
5  2016-01-01 05:00:01  1451624401
6  2016-01-01 06:00:01  1451628001
7  2016-01-01 07:00:01  1451631601
8  2016-01-01 08:00:01  1451635201
9  2016-01-01 09:00:01  1451638801
10 2016-01-01 10:00:01  1451642401
11 2016-01-01 11:00:01  1451646001
12 2016-01-01 12:00:01  1451649601
13 2016-01-01 13:00:01  1451653201
14 2016-01-01 14:00:01  1451656801
15 2016-01-01 15:00:01  1451660401
16 2016-01-01 16:00:01  1451664001
17 2016-01-01 17:00:01  1451667601
18 2016-01-01 18:00:01  1451671201
19 2016-01-01 19:00:01  1451674801
20 2016-01-01 20:00:01  1451678401
21 2016-01-01 21:00:01  1451682001
22 2016-01-01 22:00:01  1451685601
23 2016-01-01 23:00:01  1451689201
24 2016-01-02 00:00:01  1451692801
to_timestamp is used for converting from a period index to a datetime index.
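As a minimal, self-contained sketch of this approach: the sample frame is rebuilt here with pd.date_range, and the explicit cast to datetime64[ns] is an added assumption so that dividing by 10 ** 9 stays valid even if the column is stored at another resolution.

```python
import numpy as np
import pandas as pd

# Sample data matching the question: 25 hourly values starting at 2016-01-01 00:00:01
df = pd.DataFrame({
    "datetime": pd.date_range("2016-01-01 00:00:01", periods=25,
                              freq=pd.Timedelta(hours=1))
})

# .values gives a NumPy datetime64 array (assumption: naive datetimes).
# Pin it to nanoseconds, reinterpret as int64 nanoseconds since the epoch,
# then floor-divide down to whole seconds.
ns = df["datetime"].values.astype("datetime64[ns]").astype(np.int64)
df["ts"] = ns // 10**9

print(df.head())
```

The whole column is converted in one vectorized step, so no Python-level function is called per row.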
Solution 2
I think you should not use apply; simply using astype would be fine:
df['ts'] = df.datetime.astype('int64') // 10**9
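To see that the vectorized cast agrees with a row-by-row conversion, here is a small check on three sample values. It is only a sketch: the naive datetimes are assumed to represent UTC (which is how the integer cast treats them), and the intermediate cast to datetime64[ns] is added so that the division by 10**9 does not depend on the stored resolution.

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2016-01-01 00:00:01",
        "2016-01-01 01:00:01",
        "2016-01-01 02:00:01",
    ])
})

# Vectorized: nanoseconds since the epoch as int64, reduced to whole seconds
fast = df["datetime"].astype("datetime64[ns]").astype("int64") // 10**9

# Row-wise reference: localize each naive value to UTC, then take the POSIX timestamp
slow = df["datetime"].apply(lambda x: int(x.tz_localize("UTC").timestamp()))

print(fast.tolist() == slow.tolist())
```

Both give the same integers; the difference is that apply calls a Python function once per row, which is what makes it slow on hundreds of millions of rows.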
Solution 3
There's also another method that uses the "hidden" attribute of DatetimeIndex called asi8, which gives the timestamps as integers (in ns, so divide by 10 ** 9 to get seconds).
pd.DatetimeIndex(df.datetime).asi8
Wes McKinney suggested it in a tangentially related Stack Overflow question.
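A short sketch of how asi8 could be used to fill a seconds column. The sample values are illustrative, and the cast to datetime64[ns] is an assumption added here so that the integers are known to be nanoseconds before dividing.

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.to_datetime(["2016-01-01 00:00:01", "2016-01-01 01:00:01"])
})

# Build a nanosecond-resolution DatetimeIndex, then read its raw int64 values
idx = pd.DatetimeIndex(df["datetime"].astype("datetime64[ns]"))
df["ts"] = idx.asi8 // 10**9  # nanoseconds -> whole seconds

print(df)
```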
Solution 4
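If you don't want to use numpy, you can use pure pandas conversions: subtract the epoch from the column and take the total seconds of the resulting timedeltas.
df['ts'] = (df['datetime'] - pd.Timestamp('1970-01-01')).dt.total_seconds().astype('int64')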
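A pure-pandas conversion can also be sketched by subtracting the Unix epoch and floor-dividing by a one-second Timedelta. This stays in integer arithmetic and does not depend on the resolution the column is stored at. The sample values below are illustrative, and the naive datetimes are assumed to represent UTC.

```python
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.to_datetime(["2016-01-01 00:00:01", "2016-01-02 00:00:01"])
})

epoch = pd.Timestamp("1970-01-01")

# datetime - epoch gives a timedelta column; // Timedelta("1s") counts whole seconds
df["ts"] = (df["datetime"] - epoch) // pd.Timedelta("1s")

print(df)
```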
Solution 5
One option would be to use a lambda expression that converts each value to a pd.Timestamp, like so:
df['datetime'] = df['datetime'].apply(lambda x: pd.Timestamp(x))
Comments
- ragesz over 2 years
I have a dataframe with several hundred million rows, and I want to convert datetime to timestamp efficiently. How can I do it?
My sample df:
import datetime as dt
import pandas as pd
df = pd.DataFrame(index=pd.date_range(start=dt.datetime(2016,1,1,0,0,1), end=dt.datetime(2016,1,2,0,0,1), freq='H'))\
    .reset_index().rename(columns={'index':'datetime'})
df.head()
             datetime
0 2016-01-01 00:00:01
1 2016-01-01 01:00:01
2 2016-01-01 02:00:01
3 2016-01-01 03:00:01
4 2016-01-01 04:00:01
Now I convert datetime to timestamp value-by-value with .apply(), but it takes a very long time (some hours) when I have hundreds of millions of rows:
df['ts'] = df[['datetime']].apply(lambda x: x[0].timestamp(), axis=1).astype(int)
df.head()
             datetime          ts
0 2016-01-01 00:00:01  1451602801
1 2016-01-01 01:00:01  1451606401
2 2016-01-01 02:00:01  1451610001
3 2016-01-01 03:00:01  1451613601
4 2016-01-01 04:00:01  1451617201
The above result is what I want.
If I try to use the .dt accessor of pandas.Series, I get an error message:
df['ts'] = df['datetime'].dt.timestamp
AttributeError: 'DatetimeProperties' object has no attribute 'timestamp'
If I create, e.g., the date parts of the datetimes with the .dt accessor, it is much faster than using .apply():
df['date'] = df['datetime'].dt.date
df.head()
             datetime          ts        date
0 2016-01-01 00:00:01  1451602801  2016-01-01
1 2016-01-01 01:00:01  1451606401  2016-01-01
2 2016-01-01 02:00:01  1451610001  2016-01-01
3 2016-01-01 03:00:01  1451613601  2016-01-01
4 2016-01-01 04:00:01  1451617201  2016-01-01
I want something similar with timestamps...
But I don't really understand the official documentation: it talks about "Converting to Timestamps", but I don't see any timestamps there; it only talks about converting to datetime with pd.to_datetime(), not to a timestamp...
The pandas.Timestamp constructor also doesn't work (it fails with the error below):
df['ts2'] = pd.Timestamp(df['datetime'])
TypeError: Cannot convert input to Timestamp
pandas.Series.to_timestamp also does something totally different from what I want:
df['ts3'] = df['datetime'].to_timestamp
df.head()
             datetime          ts                                             ts3
0 2016-01-01 00:00:01  1451602801  <bound method Series.to_timestamp of 0 2016...
1 2016-01-01 01:00:01  1451606401  <bound method Series.to_timestamp of 0 2016...
2 2016-01-01 02:00:01  1451610001  <bound method Series.to_timestamp of 0 2016...
3 2016-01-01 03:00:01  1451613601  <bound method Series.to_timestamp of 0 2016...
4 2016-01-01 04:00:01  1451617201  <bound method Series.to_timestamp of 0 2016...
Thank you!!