Calculate local time derivative of Series


Use numpy.gradient

import numpy as np
import pandas as pd

# Series.data was removed in pandas 1.0; pass the underlying array instead
slope = pd.Series(np.gradient(tmp.values), tmp.index, name='slope')

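np.gradient called this way assumes unit spacing between samples. To handle the unevenly spaced time index, I'd resample to one-minute intervals and interpolate, so that the gradients are taken over equal intervals (the result is then a change per minute).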

# 'min' is the one-minute alias; the older 'T' alias is deprecated in recent pandas
tmp_ = tmp.resample('min').interpolate()

slope = pd.Series(np.gradient(tmp_.values), tmp_.index, name='slope')

df = pd.concat([tmp_.rename('data'), slope], axis=1)
df


df.plot()
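As an alternative to resampling, NumPy 1.13 and later let np.gradient take the sample coordinates as a second argument, so uneven spacing can be handled directly. The following is a minimal sketch under that assumption, reusing the sample data from the question; the variable name seconds is only illustrative, and the slope comes out in value per second.

```python
import numpy as np
import pandas as pd

# Sample data from the question: unevenly spaced timestamps.
tmp = pd.Series(
    [1.0, 3.0, 4.0, 3.0, 5.0],
    index=pd.to_datetime([
        "2016-06-27 23:52:00",
        "2016-06-27 23:53:00",
        "2016-06-27 23:54:00",
        "2016-06-27 23:55:00",
        "2016-06-27 23:59:00",
    ]),
)

# Elapsed seconds since the first sample, as a float array.
seconds = (tmp.index - tmp.index[0]).total_seconds().to_numpy()

# Passing the coordinates makes np.gradient account for the uneven spacing
# (requires NumPy >= 1.13). One-sided differences are used at the two ends.
slope = pd.Series(np.gradient(tmp.to_numpy(), seconds), index=tmp.index, name="slope")

print(slope)
```

No interpolated values are introduced here, so the result reflects only the raw samples.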


Author by

Adam

Updated on June 05, 2022

Comments

  • Adam
    Adam almost 2 years

    I have data that I'm importing from an hdf5 file. So, it comes in looking like this:

    import pandas as pd
    tmp=pd.Series([1.,3.,4.,3.,5.],['2016-06-27 23:52:00','2016-06-27 23:53:00','2016-06-27 23:54:00','2016-06-27 23:55:00','2016-06-27 23:59:00'])
    tmp.index=pd.to_datetime(tmp.index)
    
    >>>tmp
    2016-06-27 23:52:00    1.0
    2016-06-27 23:53:00    3.0
    2016-06-27 23:54:00    4.0
    2016-06-27 23:55:00    3.0
    2016-06-27 23:59:00    5.0
    dtype: float64
    

    I would like to find the local slope of the data. If I just do tmp.diff(), I get the local change in value, but I want the change in value per second (the time derivative). I would like to do something like the following, but it is the wrong way to do it and raises an error:

    tmp.diff()/tmp.index.diff()
    

    I have figured out that I can do it by converting all the data to a DataFrame, but that seems inefficient, especially since I will have to work through a large on-disk file in chunks. Is there a better way to do it than this:

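    import numpy as np
    df = tmp.to_frame('Value')  # name the column so df['Value'] exists
    df['secvalue'] = df.index.astype(np.int64) / 1e9  # nanoseconds since epoch -> seconds
    df['slope'] = df['Value'].diff() / df['secvalue'].diff()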
    
  • Adam
    Adam over 7 years
    When I try to resample real data, I get many NaN values, even though the data is at about my resampling frequency (for example, the real data is at about 15 s and I resample at 15S). It does seem to work if I resample at a higher frequency instead. Any suggestions? The other issue with this approach is that resampling is relatively slow.
  • piRSquared
    piRSquared over 7 years
    @Adam some sample data would be more helpful. If you can provide some in your question, I can take a look at your specific issue.
  • Adam
    Adam over 7 years
    I'm not sure of the etiquette, but the data is too long for a comment. So, I put some in pastebin: pastebin.com/vK59kN0e
  • piRSquared
    piRSquared over 7 years
    @Adam for freq '15S' you need to use an aggregation method like mean or last to get a value filled in for each bin. You can then interpolate. Try v.resample('15S').mean().interpolate().plot(). See the docs at pandas.pydata.org/pandas-docs/stable/timeseries.html#resampling
  • Adam
    Adam over 7 years
    That makes some sense. My concern is that now I'm taking 15 sec data, and averaging and interpolating, which act to filter the data, when I really just want to find how extreme the actual changes are in the raw data.
  • piRSquared
    piRSquared over 7 years
    @Adam have you considered a rolling standard deviation? Or rolling absolute differences?
  • Adam
    Adam over 7 years
    But don't either of those have the same problem of not accounting for missing or uneven timestamps?
  • astrojuanlu
    astrojuanlu almost 7 years
    There is no need to resample the series anymore, thanks to improvements to the gradient function in NumPy 1.13: docs.scipy.org/doc/numpy/… This answer deserves an edit.
  • JosiahJohnston
    JosiahJohnston over 4 years
    This was a very helpful hint to some problems I'm dealing with. To use modern numpy API: slope = pd.Series(np.gradient(tmp_.values, tmp_.index.astype('int64')//1e9), tmp_.index, name='slope'). np.gradient can deal with uneven intervals, but does need datetime cast into seconds or similar unit. The int64 trick is less readable but much faster than other ways of casting into seconds.
  • user2589273
    user2589273 over 2 years
    Side note: It is bad practice to call a column values, as it conflicts with the pandas df.values attribute.
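
The question's original idea, dividing tmp.diff() by the time difference, can also be done on the Series itself without building a DataFrame. A minimal sketch, assuming the sample data from the question; dt is an illustrative name, and the result is a backward difference in value per second:

```python
import pandas as pd

# Sample data from the question: unevenly spaced timestamps.
tmp = pd.Series(
    [1.0, 3.0, 4.0, 3.0, 5.0],
    index=pd.to_datetime([
        "2016-06-27 23:52:00",
        "2016-06-27 23:53:00",
        "2016-06-27 23:54:00",
        "2016-06-27 23:55:00",
        "2016-06-27 23:59:00",
    ]),
)

# Time step between consecutive samples, in seconds (NaN for the first row).
dt = tmp.index.to_series().diff().dt.total_seconds()

# Change in value divided by elapsed seconds; both Series share the same index.
slope = (tmp.diff() / dt).rename("slope")

print(slope)
```

Unlike np.gradient, this uses only the previous sample at each point, so the first entry is NaN and no averaging across neighbours takes place.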