Pandas: Subtracting two date columns and the result being an integer
Solution 1
How about:
df_test['Difference'] = (df_test['First_Date'] - df_test['Second Date']).dt.days
This will return difference as int
if there are no missing values(NaT
) and float
if there is.
Pandas have a rich documentation on Time series / date functionality and Time deltas
Solution 2
You can divide column of dtype
timedelta
by np.timedelta64(1, 'D')
, but output is not int
, but float
, because NaN
values:
df_test['Difference'] = df_test['Difference'] / np.timedelta64(1, 'D')
print (df_test)
First_Date Second Date Difference
0 2016-02-09 2015-11-19 82.0
1 2016-01-06 2015-11-30 37.0
2 NaT 2015-12-04 NaN
3 2016-01-06 2015-12-08 29.0
4 NaT 2015-12-09 NaN
5 2016-01-07 2015-12-11 27.0
6 NaT 2015-12-12 NaN
7 NaT 2015-12-14 NaN
8 2016-01-06 2015-12-14 23.0
9 NaT 2015-12-15 NaN
Solution 3
You can use datetime module to help here. Also, as a side note, a simple date subtraction should work as below:
import datetime as dt
import numpy as np
import pandas as pd
#Assume we have df_test:
In [222]: df_test
Out[222]:
first_date second_date
0 2016-01-31 2015-11-19
1 2016-02-29 2015-11-20
2 2016-03-31 2015-11-21
3 2016-04-30 2015-11-22
4 2016-05-31 2015-11-23
5 2016-06-30 2015-11-24
6 NaT 2015-11-25
7 NaT 2015-11-26
8 2016-01-31 2015-11-27
9 NaT 2015-11-28
10 NaT 2015-11-29
11 NaT 2015-11-30
12 2016-04-30 2015-12-01
13 NaT 2015-12-02
14 NaT 2015-12-03
15 2016-04-30 2015-12-04
16 NaT 2015-12-05
17 NaT 2015-12-06
In [223]: df_test['Difference'] = df_test['first_date'] - df_test['second_date']
In [224]: df_test
Out[224]:
first_date second_date Difference
0 2016-01-31 2015-11-19 73 days
1 2016-02-29 2015-11-20 101 days
2 2016-03-31 2015-11-21 131 days
3 2016-04-30 2015-11-22 160 days
4 2016-05-31 2015-11-23 190 days
5 2016-06-30 2015-11-24 219 days
6 NaT 2015-11-25 NaT
7 NaT 2015-11-26 NaT
8 2016-01-31 2015-11-27 65 days
9 NaT 2015-11-28 NaT
10 NaT 2015-11-29 NaT
11 NaT 2015-11-30 NaT
12 2016-04-30 2015-12-01 151 days
13 NaT 2015-12-02 NaT
14 NaT 2015-12-03 NaT
15 2016-04-30 2015-12-04 148 days
16 NaT 2015-12-05 NaT
17 NaT 2015-12-06 NaT
Now, change type to datetime.timedelta, and then use the .days method on valid timedelta objects.
In [226]: df_test['Diffference'] = df_test['Difference'].astype(dt.timedelta).map(lambda x: np.nan if pd.isnull(x) else x.days)
In [227]: df_test
Out[227]:
first_date second_date Difference Diffference
0 2016-01-31 2015-11-19 73 days 73
1 2016-02-29 2015-11-20 101 days 101
2 2016-03-31 2015-11-21 131 days 131
3 2016-04-30 2015-11-22 160 days 160
4 2016-05-31 2015-11-23 190 days 190
5 2016-06-30 2015-11-24 219 days 219
6 NaT 2015-11-25 NaT NaN
7 NaT 2015-11-26 NaT NaN
8 2016-01-31 2015-11-27 65 days 65
9 NaT 2015-11-28 NaT NaN
10 NaT 2015-11-29 NaT NaN
11 NaT 2015-11-30 NaT NaN
12 2016-04-30 2015-12-01 151 days 151
13 NaT 2015-12-02 NaT NaN
14 NaT 2015-12-03 NaT NaN
15 2016-04-30 2015-12-04 148 days 148
16 NaT 2015-12-05 NaT NaN
17 NaT 2015-12-06 NaT NaN
Hope that helps.
Solution 4
I feel that the overall answer does not handle if the dates 'wrap' around a year. This would be useful in understanding proximity to a date being accurate by day of year. In order to do these row operations, I did the following. (I had this used in a business setting in renewing customer subscriptions).
def get_date_difference(row, x, y):
try:
# Calcuating the smallest date difference between the start and the close date
# There's some tricky logic in here to calculate for determining date difference
# the other way around (Dec -> Jan is 1 month rather than 11)
sub_start_date = int(row[x].strftime('%j')) # day of year (1-366)
close_date = int(row[y].strftime('%j')) # day of year (1-366)
later_date_of_year = max(sub_start_date, close_date)
earlier_date_of_year = min(sub_start_date, close_date)
days_diff = later_date_of_year - earlier_date_of_year
# Calculates the difference going across the next year (December -> Jan)
days_diff_reversed = (365 - later_date_of_year) + earlier_date_of_year
return min(days_diff, days_diff_reversed)
except ValueError:
return None
Then the function could be:
dfAC_Renew['date_difference'] = dfAC_Renew.apply(get_date_difference, x = 'customer_since_date', y = 'renewal_date', axis = 1)
Kevin
Updated on August 21, 2021Comments
-
Kevin over 2 years
I have two columns in a Pandas data frame that are dates.
I am looking to subtract one column from another and the result being the difference in numbers of days as an integer.
A peek at the data:
df_test.head(10) Out[20]: First_Date Second Date 0 2016-02-09 2015-11-19 1 2016-01-06 2015-11-30 2 NaT 2015-12-04 3 2016-01-06 2015-12-08 4 NaT 2015-12-09 5 2016-01-07 2015-12-11 6 NaT 2015-12-12 7 NaT 2015-12-14 8 2016-01-06 2015-12-14 9 NaT 2015-12-15
I have created a new column successfully with the difference:
df_test['Difference'] = df_test['First_Date'].sub(df_test['Second Date'], axis=0) df_test.head() Out[22]: First_Date Second Date Difference 0 2016-02-09 2015-11-19 82 days 1 2016-01-06 2015-11-30 37 days 2 NaT 2015-12-04 NaT 3 2016-01-06 2015-12-08 29 days 4 NaT 2015-12-09 NaT
However I am unable to get a numeric version of the result:
df_test['Difference'] = df_test[['Difference']].apply(pd.to_numeric) df_test.head() Out[25]: First_Date Second Date Difference 0 2016-02-09 2015-11-19 7.084800e+15 1 2016-01-06 2015-11-30 3.196800e+15 2 NaT 2015-12-04 NaN 3 2016-01-06 2015-12-08 2.505600e+15 4 NaT 2015-12-09 NaN
-
jezrael almost 8 yearsYes, it is one possible solution, but I think it is not recommended approach, because output of column
Diffference
isobject
and next processing (adding, substraction...) is impossible. -
clocker almost 8 years@jesrael, there are other ways of doing, eg, your solution. However, adding/subtracting is not a problem with NaNs mixed in with int types in a column. They will be automatically be casted to float operations as needed.
-
Robert Beatty almost 5 yearsAgree with @AllenWang. This is the superior answer.
-
user2256085 over 4 years@ Make that at least 3 suggesting this be the accepted answer
-
Nitro over 3 yearsThis may have changed in recent versions. It works for me using
.days
now while.dt.days
throws an error -
Prayson W. Daniel over 3 yearsIt appears that it depends on the resulting value. If they are a datetime series then
.dt
is required. Can you check the results of expression. Is it a DataFrame or a Series? I am still trying to figure out when dt is required -
GenDemo almost 3 yearsthis seems to only work for days, not for weeks or years.