From TimeDelta to float days in Pandas

28,129

Solution 1

You can use pd.to_timedelta or np.timedelta64 to define a duration and divide by this:

# set up as per @EdChum
df['total_days_td'] = df['time_delta'] / pd.to_timedelta(1, unit='D')
df['total_days_td'] = df['time_delta'] / np.timedelta64(1, 'D')

Solution 2

You can use dt.total_seconds and divide this by the total number of seconds in a day, example:

In [25]:
df = pd.DataFrame({'dates':pd.date_range(dt.datetime(2016,1,1, 12,15,3), periods=10)})
df

Out[25]:
                dates
0 2016-01-01 12:15:03
1 2016-01-02 12:15:03
2 2016-01-03 12:15:03
3 2016-01-04 12:15:03
4 2016-01-05 12:15:03
5 2016-01-06 12:15:03
6 2016-01-07 12:15:03
7 2016-01-08 12:15:03
8 2016-01-09 12:15:03
9 2016-01-10 12:15:03

In [26]:
df['time_delta'] = df['dates'] - pd.datetime(2015,11,6,8,10)
df

Out[26]:
                dates       time_delta
0 2016-01-01 12:15:03 56 days 04:05:03
1 2016-01-02 12:15:03 57 days 04:05:03
2 2016-01-03 12:15:03 58 days 04:05:03
3 2016-01-04 12:15:03 59 days 04:05:03
4 2016-01-05 12:15:03 60 days 04:05:03
5 2016-01-06 12:15:03 61 days 04:05:03
6 2016-01-07 12:15:03 62 days 04:05:03
7 2016-01-08 12:15:03 63 days 04:05:03
8 2016-01-09 12:15:03 64 days 04:05:03
9 2016-01-10 12:15:03 65 days 04:05:03

In [27]:
df['total_days_td'] = df['time_delta'].dt.total_seconds() / (24 * 60 * 60)
df

Out[27]:
                dates       time_delta  total_days_td
0 2016-01-01 12:15:03 56 days 04:05:03      56.170174
1 2016-01-02 12:15:03 57 days 04:05:03      57.170174
2 2016-01-03 12:15:03 58 days 04:05:03      58.170174
3 2016-01-04 12:15:03 59 days 04:05:03      59.170174
4 2016-01-05 12:15:03 60 days 04:05:03      60.170174
5 2016-01-06 12:15:03 61 days 04:05:03      61.170174
6 2016-01-07 12:15:03 62 days 04:05:03      62.170174
7 2016-01-08 12:15:03 63 days 04:05:03      63.170174
8 2016-01-09 12:15:03 64 days 04:05:03      64.170174
9 2016-01-10 12:15:03 65 days 04:05:03      65.170174
Share:
28,129

Related videos on Youtube

alpagarou
Author by

alpagarou

Engineer in bioinformatics and modeling. Very pluridisciplinary field, so I know a bit of many topics but nothing deeply. I've done an internship on metagenomics data and one in swarming robotics. I now work on modeling population dynamics of tsetse flies to evaluate control strategies.

Updated on November 04, 2020

Comments

  • alpagarou
    alpagarou over 3 years

    I have a TimeDelta column with values that look like this:

    2 days 21:54:00.000000000

    I would like to have a float representing the number of days, let's say here 2+21/24 = 2.875, neglecting the minutes. Is there a simple way to do this ? I saw an answer suggesting

    res['Ecart_lacher_collecte'].apply(lambda x: float(x.item().days+x.item().hours/24.))
    

    But I get "AttributeError: 'str' object has no attribute 'item' "

    Numpy version is '1.10.4' Pandas version is u'0.17.1'

    The columns has originally been obtained with:

    lac['DateHeureLacher'] = pd.to_datetime(lac['Date lacher']+' '+lac['Heure lacher'],format='%d/%m/%Y %H:%M:%S')
    cap['DateCollecte'] = pd.to_datetime(cap['Date de collecte']+' '+cap['Heure de collecte'],format='%d/%m/%Y %H:%M:%S')
    

    in a first script. Then in a second one:

    res = pd.merge(lac, cap, how='inner', on=['Loc'])
    res['DateHeureLacher']  = pd.to_datetime(res['DateHeureLacher'],format='%Y-%m-%d %H:%M:%S')
    res['DateCollecte']  = pd.to_datetime(res['DateCollecte'],format='%Y-%m-%d %H:%M:%S')
    res['Ecart_lacher_collecte'] = res['DateCollecte'] - res['DateHeureLacher']
    

    Maybe saving it to csv change their types back to string? The transformation I'm trying to do is in a third script.

    Sexe_x  PiegeLacher latL    longL   Loc Col_x   DateHeureLacher Nb envolees PiegeCapture    latC    longC   Col_y   Sexe_y  Effectif    DateCollecte    DatePose    Ecart_lacher_collecte   Dist_m
    M   Q0-002  1629238 237877  H   Rouge   2011-02-04 17:15:00 928 Q0-002  1629238 237877  Rouge   M   1   2011-02-07 15:09:00 2011-02-07 12:14:00 2 days 21:54:00.000000000   0
    M   Q0-002  1629238 237877  H   Rouge   2011-02-04 17:15:00 928 Q0-002  1629238 237877  Rouge   M   4   2011-02-07 12:14:00 2011-02-07 09:42:00 2 days 18:59:00.000000000   0
    M   Q0-002  1629238 237877  H   Rouge   2011-02-04 17:15:00 928 Q0-003  1629244 237950  Rouge   M   1   2011-02-07 15:10:00 2011-02-07 12:16:00 2 days 21:55:00.000000000   75
    

    res.info():

    Sexe_x                   922 non-null object
    PiegeLacher              922 non-null object
    latL                     922 non-null int64
    longL                    922 non-null int64
    Loc                      922 non-null object
    Col_x                    922 non-null object
    DateHeureLacher          922 non-null object
    Nb envolees              922 non-null int64
    PiegeCapture             922 non-null object
    latC                     922 non-null int64
    longC                    922 non-null int64
    Col_y                    922 non-null object
    Sexe_y                   922 non-null object
    Effectif                 922 non-null int64
    DateCollecte             922 non-null object
    DatePose                 922 non-null object
    Ecart_lacher_collecte    922 non-null object
    Dist_m                   922 non-null int64
    
    • Anton Protopopov
      Anton Protopopov about 8 years
      I think you could use float(res['columnName'].dt.days + res['columnName'].dt.hours / 24)
    • EdChum
      EdChum about 8 years
      IIUC you can do res['columnname'].dt.total_seconds()/ (24 * 60 * 60)
    • alpagarou
      alpagarou about 8 years
      @AntonProtopopov I get "AttributeError: Can only use .dt accessor with datetimelike values"
    • alpagarou
      alpagarou about 8 years
      @EdChum I get "AttributeError: 'Series' object has no attribute 'total_seconds' "
    • Anton Protopopov
      Anton Protopopov about 8 years
      @alpagarou first you need to convert you columns to datetime object. You could do that with pd.to_datetime
    • EdChum
      EdChum about 8 years
      What's your version of pandas and can you post raw data, code and your numpy version and edit this into your question
    • alpagarou
      alpagarou about 8 years
      @AntonProtopopov pd.to_datetime raises "ValueError: Unknown string format"
    • Anton Protopopov
      Anton Protopopov about 8 years
      @alpagarou post your date please. Without that it's hard to say what could cause the problem
    • EdChum
      EdChum about 8 years
      Are you saying your column is a string of timedeltas?
    • EdChum
      EdChum about 8 years
      post output from res.info(), also does res['Ecart_lacher_collecte'].dt.total_seconds()/ (24 * 60 * 60) work?
    • alpagarou
      alpagarou about 8 years
      @EdChum First I get "AttributeError: Can only use .dt accessor with datetimelike values" but when I try "res['Ecart_lacher_collecte'] = pd.to_datetime(res['Ecart_lacher_collecte'])" I get "ValueError: Unknown string format"
    • EdChum
      EdChum about 8 years
      It looks you have invalid values in your columns, can you try pd.to_datetime(res['DateHeureLacher'],format='%Y-%m-%d %H:%M:%S', errors='coerce') also you need to post minimal data and code that we can run that reproduces your error as this should work, also when you read the csv back in and set parse_dates=['DateHeureLacher', 'DateCollecte'] in read_csv does it fix the dtypes?
    • alpagarou
      alpagarou about 8 years
      parse_dates fixes the type of 'DateHeureLacher' and 'DateCollecte' but 'Ecart_lacher_collecte' which was timedelta64[ns] at the end of the precedent script, goes back to object when I read it from csv (I also tried to include it in the parse_dates but it doesn't work) I should probably use errors='coerce' to convert 'Ecart_lacher_collecte to datetime before using dt.days, but what format should I precise?
    • EdChum
      EdChum about 8 years
      OK, I just tried this and found that you can't specify the dtype in read_csv to parse it back as a timedelta64, so you have a couple options, convert the timedelta64 to int64 using astype and then when reading it back use a custom converter to convert it or after reading it back in you can do astype(np.timedelta64) and this will work
    • alpagarou
      alpagarou about 8 years
      I put my last two scripts in one and managed to it. However, I get a lot of SettingWithCopyWarning ?
  • Jingwei Yu
    Jingwei Yu over 7 years
    yes it works but I'm shocked they don't have a dt.total_days() or dt.total_hours available...
  • EdChum
    EdChum over 7 years
    @LedgerYu because it's not a standard method for timedelta, additionally it's trivial to calculate this yourself but besides if we followed your logic we'd then have methods for total_days, total_hours, total_quarters, total_years, total_milliseconds etc.
  • Guillochon
    Guillochon almost 6 years
    You can just do dt.days, not as a function. Works for me!
  • Taylor
    Taylor over 5 years
    @Guillochon dt.days will only give you the number of days and will throw out all the time information
  • Elias Hasle
    Elias Hasle over 5 years
    Why separate the division and then join it like that? The only difference from x.total_seconds()/(3600*24) is that incomplete hours don't count at all, which I doubt is the desired behavior.
  • sharinganSawant
    sharinganSawant over 5 years
    Yes, that's right. the incomplete hours aren't counted. That's because the original poster wanted it that way: 'I would like to have a float representing the number of days, let's say here 2+21/24 = 2.875, neglecting the minutes.'
  • sharinganSawant
    sharinganSawant over 5 years
    @EliasHasle I have edited the answer where I have mentioned why the incomplete hours weren't accounted for.
  • MarMat
    MarMat almost 4 years
    You can also use the datetime module, e.g., datetime.timedelta(days=1)