Add column with number of days between dates in DataFrame pandas

246,226

Solution 1

Assuming these were datetime columns (if they're not apply to_datetime) you can just subtract them:

df['A'] = pd.to_datetime(df['A'])
df['B'] = pd.to_datetime(df['B'])

In [11]: df.dtypes  # if already datetime64 you don't need to use to_datetime
Out[11]:
A    datetime64[ns]
B    datetime64[ns]
dtype: object

In [12]: df['A'] - df['B']
Out[12]:
one   -58 days
two   -26 days
dtype: timedelta64[ns]

In [13]: df['C'] = df['A'] - df['B']

In [14]: df
Out[14]:
             A          B        C
one 2014-01-01 2014-02-28 -58 days
two 2014-02-03 2014-03-01 -26 days

Note: ensure you're using a new of pandas (e.g. 0.13.1), this may not work in older versions.

Solution 2

To remove the 'days' text element, you can also make use of the dt() accessor for series: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.html

So,

df[['A','B']] = df[['A','B']].apply(pd.to_datetime) #if conversion required
df['C'] = (df['B'] - df['A']).dt.days

which returns:

             A          B   C
one 2014-01-01 2014-02-28  58
two 2014-02-03 2014-03-01  26

Solution 3

A list comprehension is your best bet for the most Pythonic (and fastest) way to do this:

[int(i.days) for i in (df.B - df.A)]
  1. i will return the timedelta(e.g. '-58 days')
  2. i.days will return this value as a long integer value(e.g. -58L)
  3. int(i.days) will give you the -58 you seek.

If your columns aren't in datetime format. The shorter syntax would be: df.A = pd.to_datetime(df.A)

Solution 4

How about this:

times['days_since'] = max(list(df.index.values))  
times['days_since'] = times['days_since'] - times['months']  
times
Share:
246,226
Jase Villam
Author by

Jase Villam

Updated on November 28, 2020

Comments

  • Jase Villam
    Jase Villam over 3 years

    I want to subtract dates in 'A' from dates in 'B' and add a new column with the difference.

    df
              A        B
    one 2014-01-01  2014-02-28 
    two 2014-02-03  2014-03-01
    

    I've tried the following, but get an error when I try to include this in a for loop...

    import datetime
    date1=df['A'][0]
    date2=df['B'][0]
    mdate1 = datetime.datetime.strptime(date1, "%Y-%m-%d").date()
    rdate1 = datetime.datetime.strptime(date2, "%Y-%m-%d").date()
    delta =  (mdate1 - rdate1).days
    print delta
    

    What should I do?

  • 0nir
    0nir over 9 years
    Do can we get rid of the "days" portion in the result incase we just need to see the numeric value ie. -58, -26 in this case.
  • dirkjot
    dirkjot over 8 years
    to expand on @AndyHayden comment, that works but it should pd.offsets.Day(1) (with an 's'). I also usually negate it, so you get (df['A'] - df['B']) / pd.offsets.Day(-1)
  • dirkjot
    dirkjot over 8 years
    However, if you want to do this on a whole Series you need (df['A'] - df['B']) / np.timedelta64(-1, 'D') for reasons that I don't fully understand.
  • Andy Hayden
    Andy Hayden over 8 years
    @dirkjot Thanks for spotting the typo! IIRC this was fix in recent pandas, are you using 0.16.2 / 0.17?
  • webelo
    webelo about 7 years
    I found that this was a bit buggy when there was missing data. The problems are that 1) the missing data doesn't have an .isnull() attribute and 2) it has a .day attribute but the non-missing data has a .days attribute. So after creating the new variable I ran a loop over each observation that checks: if hasattr(obs,'days') then assign obs.days and else assign np.nan.
  • Andy Hayden
    Andy Hayden about 7 years
    @webelo the DatetimeIndex/Series itself should have a .dt.days attribute which should be strongly preferred.
  • Samuel Nde
    Samuel Nde almost 6 years
    Great answer. In my case, df['C'] = (df['B'] - df['A']).dt.days did not work and I had to use df['C'] = (df['B'] - df['A']).days. Any idea why mine did not give the number of days as expected?
  • Ricky McMaster
    Ricky McMaster almost 6 years
    Nde - how exactly did it not work? Error or wrong values? Did you convert both A and B columns to datetime successfully?
  • Samuel Nde
    Samuel Nde almost 6 years
    Both my columns are datetime (or datetime64[ns] to be precise). When I did df['C'] = (df['B'] - df['A']).dt.days, I got an attribute error that said AttributeError: 'Timedelta' object has no attribute 'dt', so I tried df['C'] = (df['B'] - df['A']).days which gave me the desired answer. (Of course I am using my own dataframe not the one in the example above. Or could it be because I also have time in my date and not as in 2018-09-24 10:17:18.800277)
  • PV8
    PV8 about 5 years
    How does this function rounds, if my original dataframe is with Hours:Minutes:Seconds?
  • user3065757
    user3065757 almost 5 years
    perfect answer.
  • Rodrigo Hjort
    Rodrigo Hjort over 4 years
    Great solution. Thanks!
  • int soumen
    int soumen almost 3 years
    In case we have a constant date on one hand and a series on another hand we can do d0 = pd.to_datetime(date(2017, 2, 28)) d1 = df.Date delta = d1 - d0