Calculate datetime difference in years, months, etc. in a new pandas DataFrame column

Solution 1

A simple function is enough here.

The function approximates the difference in years and months from the number of days in the timedelta.

import pandas as pd

def parse_date(td):
    # Approximation: a year is taken as 365 days and a month as 30 days.
    years, rest = divmod(td.days, 365)  # whole years and the leftover days
    months = rest // 30                 # whole 30-day months in the leftover days
    return str(years) + "Y" + str(months) + "m"

df = pd.DataFrame([("2000-01-10", "1970-04-29")], columns=["start", "end"])
# Parse the strings, subtract, and format each resulting timedelta.
df["delta"] = (pd.to_datetime(df["start"]) - pd.to_datetime(df["end"])).apply(parse_date)
print(df)

        start         end  delta
0  2000-01-10  1970-04-29  29Y8m

Solution 2

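You can create a new column holding the difference in fractional years. This assumes df['diff'] already holds the timedelta df['start'] - df['end'] and that numpy is imported as np. Note that the 'Y' unit is ambiguous (a year has no fixed length), and recent pandas versions (2.0 and later) may raise an error for it:

df['diff_year'] = df['diff'] / np.timedelta64(1, 'Y')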
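
Where the 'Y' unit is not accepted, one possible workaround is to divide by a fixed-length timedelta instead. This is only a sketch: the 365.25-day year is an assumed average, not a calendar year, and the sample frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({"start": ["2000-01-10"], "end": ["1970-04-29"]})
# Parse the strings so that the subtraction yields a timedelta column.
df["diff"] = pd.to_datetime(df["start"]) - pd.to_datetime(df["end"])

# Assumption: one year is treated as 365.25 days (an average length).
df["diff_year"] = df["diff"] / pd.Timedelta(days=365.25)
print(df["diff_year"].round(2))
```

The result is a plain float column, so the whole years can be taken with int() or floor division if needed.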

Solution 3

Pretty much straightforward with relativedelta:

from dateutil import relativedelta

>>          end      start
>> 0 1970-04-29 2000-01-10

# df.ix has been removed from pandas, so build the column row by row from the two date columns
df['diff'] = [relativedelta.relativedelta(s, e) for s, e in zip(df['start'], df['end'])]

>>          end      start                                           diff
>> 0 1970-04-29 2000-01-10  relativedelta(years=+29, months=+8, days=+12)
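
To get from the relativedelta object to a string like the one asked for, its years, months and days attributes can be read directly. A minimal sketch, where format_delta is a hypothetical helper name and the sample frame is rebuilt so the snippet is self-contained:

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({
    "start": pd.to_datetime(["2000-01-10"]),
    "end": pd.to_datetime(["1970-04-29"]),
})

def format_delta(start, end):
    """Hypothetical helper: calendar difference as a string such as '29y8m12d'."""
    rd = relativedelta(start, end)
    return "{}y{}m{}d".format(rd.years, rd.months, rd.days)

# Apply the helper to each (start, end) pair.
df["diff"] = [format_delta(s, e) for s, e in zip(df["start"], df["end"])]
print(df)
```

Because relativedelta works on calendar fields rather than a day count, it does not need any assumption about the length of a month or a year.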

Solution 4

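If you only need the number of months, a simpler way is to build a date_range between the two dates and take its length. Note that freq='M' generates month-end dates, so this counts the month ends that fall between the two dates; recent pandas versions spell this alias 'ME'.

startdt = pd.to_datetime('2017-01-01')
enddt = pd.to_datetime('2018-01-01')
len(pd.date_range(start=startdt, end=enddt, freq='M'))  # 12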
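
A related month count can also be computed for whole columns without date_range, using plain calendar arithmetic on the .dt accessors. This is a different technique from the one above, sketched under the assumption that start is never earlier than end:

```python
import pandas as pd

df = pd.DataFrame({
    "start": pd.to_datetime(["2000-01-10"]),
    "end": pd.to_datetime(["1970-04-29"]),
})

# Calendar months between the two dates, ignoring the day of the month.
months = (df["start"].dt.year - df["end"].dt.year) * 12 + (
    df["start"].dt.month - df["end"].dt.month
)
# Subtract one month where the last month is not yet complete.
months = months - (df["start"].dt.day < df["end"].dt.day).astype(int)

# Split the total into years and remaining months.
df["diff"] = (months // 12).astype(str) + "y" + (months % 12).astype(str) + "m"
print(df)
```

This avoids the frequency alias entirely, so it does not depend on which alias a given pandas version accepts.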

Solution 5

I think this is the most 'pandas' way to do it, without using any for loops or defining external functions:

>>> from datetime import datetime
>>> import pandas as pd
>>> df = pd.DataFrame({'Name': ['A'], 'start': [datetime(2000, 1, 10)], 'end': [datetime(1970, 4, 29)]})
>>> # Shift each timedelta onto year 1; list() is needed because map is lazy in Python 3
>>> df['diff'] = list(map(lambda td: datetime(1, 1, 1) + td, df['start'] - df['end']))
>>> df['diff'] = df['diff'].apply(lambda d: '{0}y{1}m'.format(d.year - 1, d.month - 1))
>>> df
  Name      start        end   diff
0    A 2000-01-10 1970-04-29  29y8m

I had to use map instead of apply because pandas' timedelta64 doesn't allow simple addition to a datetime object.

Author by

beta

Updated on January 29, 2020

Comments

  • beta
    beta over 4 years

    I have a pandas dataframe looking like this:

    Name    start        end
    A       2000-01-10   1970-04-29
    

    I want to add a new column giving the difference between the start and end columns in years, months and days.

    So the result should look like:

    Name    start        end          diff
    A       2000-01-10   1970-04-29   29y9m etc.
    

    The diff column may also be a datetime object or a timedelta object; the key point for me is that I can easily get the year and month out of it.

    What I have tried so far:

    df['diff'] = df['start'] - df['end']
    

    This results in the new column containing 10848 days. However, I do not know how to convert the days to 29y9m etc.