Add column with number of days between dates in DataFrame pandas
Solution 1
Assuming these were datetime columns (if they're not apply to_datetime
) you can just subtract them:
df['A'] = pd.to_datetime(df['A'])
df['B'] = pd.to_datetime(df['B'])
In [11]: df.dtypes # if already datetime64 you don't need to use to_datetime
Out[11]:
A datetime64[ns]
B datetime64[ns]
dtype: object
In [12]: df['A'] - df['B']
Out[12]:
one -58 days
two -26 days
dtype: timedelta64[ns]
In [13]: df['C'] = df['A'] - df['B']
In [14]: df
Out[14]:
A B C
one 2014-01-01 2014-02-28 -58 days
two 2014-02-03 2014-03-01 -26 days
Note: ensure you're using a new of pandas (e.g. 0.13.1), this may not work in older versions.
Solution 2
To remove the 'days' text element, you can also make use of the dt() accessor for series: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.html
So,
df[['A','B']] = df[['A','B']].apply(pd.to_datetime) #if conversion required
df['C'] = (df['B'] - df['A']).dt.days
which returns:
A B C
one 2014-01-01 2014-02-28 58
two 2014-02-03 2014-03-01 26
Solution 3
A list comprehension is your best bet for the most Pythonic (and fastest) way to do this:
[int(i.days) for i in (df.B - df.A)]
- i will return the timedelta(e.g. '-58 days')
- i.days will return this value as a long integer value(e.g. -58L)
- int(i.days) will give you the -58 you seek.
If your columns aren't in datetime format. The shorter syntax would be: df.A = pd.to_datetime(df.A)
Solution 4
How about this:
times['days_since'] = max(list(df.index.values))
times['days_since'] = times['days_since'] - times['months']
times
Jase Villam
Updated on November 28, 2020Comments
-
Jase Villam over 3 years
I want to subtract dates in 'A' from dates in 'B' and add a new column with the difference.
df A B one 2014-01-01 2014-02-28 two 2014-02-03 2014-03-01
I've tried the following, but get an error when I try to include this in a for loop...
import datetime date1=df['A'][0] date2=df['B'][0] mdate1 = datetime.datetime.strptime(date1, "%Y-%m-%d").date() rdate1 = datetime.datetime.strptime(date2, "%Y-%m-%d").date() delta = (mdate1 - rdate1).days print delta
What should I do?
-
0nir over 9 yearsDo can we get rid of the "days" portion in the result incase we just need to see the numeric value ie. -58, -26 in this case.
-
dirkjot over 8 yearsto expand on @AndyHayden comment, that works but it should
pd.offsets.Day(1)
(with an 's'). I also usually negate it, so you get(df['A'] - df['B']) / pd.offsets.Day(-1)
-
dirkjot over 8 yearsHowever, if you want to do this on a whole Series you need
(df['A'] - df['B']) / np.timedelta64(-1, 'D')
for reasons that I don't fully understand. -
Andy Hayden over 8 years@dirkjot Thanks for spotting the typo! IIRC this was fix in recent pandas, are you using 0.16.2 / 0.17?
-
webelo about 7 yearsI found that this was a bit buggy when there was missing data. The problems are that 1) the missing data doesn't have an
.isnull()
attribute and 2) it has a.day
attribute but the non-missing data has a.days
attribute. So after creating the new variable I ran a loop over eachobs
ervation that checks:if hasattr(obs,'days')
then assignobs.days
and else assignnp.nan
. -
Andy Hayden about 7 years@webelo the DatetimeIndex/Series itself should have a
.dt.days
attribute which should be strongly preferred. -
Samuel Nde almost 6 yearsGreat answer. In my case,
df['C'] = (df['B'] - df['A']).dt.days
did not work and I had to usedf['C'] = (df['B'] - df['A']).days
. Any idea why mine did not give the number of days as expected? -
Ricky McMaster almost 6 yearsNde - how exactly did it not work? Error or wrong values? Did you convert both A and B columns to datetime successfully?
-
Samuel Nde almost 6 yearsBoth my columns are datetime (or
datetime64[ns]
to be precise). When I diddf['C'] = (df['B'] - df['A']).dt.days
, I got an attribute error that said AttributeError: 'Timedelta' object has no attribute 'dt', so I tried df['C'] = (df['B'] - df['A']).days which gave me the desired answer. (Of course I am using my own dataframe not the one in the example above. Or could it be because I also have time in my date and not as in2018-09-24 10:17:18.800277
) -
PV8 about 5 yearsHow does this function rounds, if my original dataframe is with Hours:Minutes:Seconds?
-
user3065757 almost 5 yearsperfect answer.
-
Rodrigo Hjort over 4 yearsGreat solution. Thanks!
-
int soumen almost 3 yearsIn case we have a constant date on one hand and a series on another hand we can do
d0 = pd.to_datetime(date(2017, 2, 28))
d1 = df.Date
delta = d1 - d0