How to get slope from timeseries data in pandas?

17,247

Solution 1

I will take part of Franco answer, but you don't need sklearn. You can easily do it with scipy.

import datetime as dt
from scipy import stats

df = pd.DataFrame(list, columns=['date', 'value'])
df.date =pd.to_datetime(df.date)
df['date_ordinal'] = pd.to_datetime(df['date']).map(dt.datetime.toordinal)
slope, intercept, r_value, p_value, std_err = stats.linregress(df['date_ordinal'], df['value'])

slope
Out[114]: 0.80959404761905

Solution 2

You can use datetime.toordinal to map each date to an integer and sklearn.linear_model to fit a linear regression model on your data to obtain the slope like:

import datetime as dt
from sklearn import linear_model

df = pd.DataFrame(list, columns=['date', 'value'])
df['date_ordinal'] = pd.to_datetime(df['date']).map(dt.datetime.toordinal)
reg = linear_model.LinearRegression()
reg.fit(df['date_ordinal'].values.reshape(-1, 1), df['value'].values)

reg.coef_

array([0.80959405])

Solution 3

To get the slope and intercept of a linear regression line (y = intercept + slope * x) for a simple case like this, you need to use numpy polyfit() method. My explanation is inline with code below.

# You should only need numpy and pandas
import numpy as np
import pandas as pd

# Now your list 
list = [('2018-10-29', 6.1925), ('2018-10-29', 6.195), ('2018-10-29', 1.95833333333333), 
        ('2018-10-29', 1.785), ('2018-10-29', 3.05), ('2018-10-29', 1.30666666666667), 
        ('2018-10-29', 1.6325), ('2018-10-30', 1.765), ('2018-10-30', 1.265), 
        ('2018-10-30', 2.1125), ('2018-10-30', 2.16714285714286), ('2018-10-30', 1.485), 
        ('2018-10-30', 1.72), ('2018-10-30', 2.754), ('2018-10-30', 1.79666666666667), 
        ('2018-10-30', 1.27833333333333), ('2018-10-30', 3.48), ('2018-10-30', 6.19), 
        ('2018-10-30', 6.235), ('2018-10-30', 6.11857142857143), ('2018-10-30', 6.088), 
        ('2018-10-30', 4.3), ('2018-10-30', 7.80666666666667), 
        ('2018-10-30', 7.78333333333333), ('2018-10-30', 10.9766666666667), 
        ('2018-10-30', 2.19), ('2018-10-30', 1.88)]

# Create a single pandas DataFrame
df = pd.DataFrame(list)

# Make it into a Time Series with 'date' and 'value' columns
ts = pd.DataFrame(list, columns=['date', 'value'])

#print it to check it
ts.head(10)

# Now separate it into x and y lists

x = ts['date']
y = ts['value'].astype(float)

# Create a sequance of integers from 0 to x.size to use in np.polyfit() call
x_seq = np.arange(x.size) # should give you [ 0  1  2  3  4 ... 26]

# call numpy polyfit() method with x_seq, y 
fit = np.polyfit(x_seq, y, 1)
fit_fn = np.poly1d(fit)
print('Slope = ', fit[0], ", ","Intercept = ", fit[1])
print(fit_fn)

Slope = 0.1366855921855925 , Intercept = 1.9827865961199274

0.1367 x + 1.983

Share:
17,247

Related videos on Youtube

Souvik Ray
Author by

Souvik Ray

Updated on September 15, 2022

Comments

  • Souvik Ray
    Souvik Ray over 1 year

    I have a pandas dataframe which contains date and some values something like below

    Original data:

    list = [('2018-10-29', 6.1925), ('2018-10-29', 6.195), ('2018-10-29', 1.95833333333333), 
            ('2018-10-29', 1.785), ('2018-10-29', 3.05), ('2018-10-29', 1.30666666666667), 
            ('2018-10-29', 1.6325), ('2018-10-30', 1.765), ('2018-10-30', 1.265), 
            ('2018-10-30', 2.1125), ('2018-10-30', 2.16714285714286), ('2018-10-30', 1.485), 
            ('2018-10-30', 1.72), ('2018-10-30', 2.754), ('2018-10-30', 1.79666666666667), 
            ('2018-10-30', 1.27833333333333), ('2018-10-30', 3.48), ('2018-10-30', 6.19), 
            ('2018-10-30', 6.235), ('2018-10-30', 6.11857142857143), ('2018-10-30', 6.088), 
            ('2018-10-30', 4.3), ('2018-10-30', 7.80666666666667), 
            ('2018-10-30', 7.78333333333333), ('2018-10-30', 10.9766666666667), 
            ('2018-10-30', 2.19), ('2018-10-30', 1.88)]
    

    After loading into pandas

    df = pd.DataFrame(list)
    
    
                 0          1
    0   2018-10-29   6.192500
    1   2018-10-29   6.195000
    2   2018-10-29   1.958333
    3   2018-10-29   1.785000
    4   2018-10-29   3.050000
    5   2018-10-29   1.306667
    6   2018-10-29   1.632500
    7   2018-10-30   1.765000
    8   2018-10-30   1.265000
    9   2018-10-30   2.112500
    10  2018-10-30   2.167143
    11  2018-10-30   1.485000
    12  2018-10-30   1.720000
    13  2018-10-30   2.754000
    14  2018-10-30   1.796667
    15  2018-10-30   1.278333
    16  2018-10-30   3.480000
    17  2018-10-30   6.190000
    18  2018-10-30   6.235000
    19  2018-10-30   6.118571
    20  2018-10-30   6.088000
    21  2018-10-30   4.300000
    22  2018-10-30   7.806667
    23  2018-10-30   7.783333
    24  2018-10-30  10.976667
    25  2018-10-30   2.190000
    26  2018-10-30   1.880000
    

    This is how I load up the dataframe

    df = pd.DataFrame(list)
    df[0] = pd.to_datetime(df[0], errors='coerce')
    df.set_index(0, inplace=True)
    

    Now I want to find the slope. Upon research in the internet, I found this is what is needed to be done to get the slope

    trend_coord = list(map(list, zip(df.index.strftime('%Y-%m-%d'), sm.tsa.seasonal_decompose(df.iloc[:,0].values).trend.interpolate(method='linear',axis=0).fillna(0).values)))
    
    results = sm.OLS(np.asarray(sm.tsa.seasonal_decompose(df.iloc[:,0].values).trend.interpolate(method='linear', axis=0).fillna(0).values), sm.add_constant(np.array([i for i in range(len(trend_coord))])), missing='drop').fit()
    
    slope = results.params[1]
    print(slope)
    

    But I get the below error

    Traceback (most recent call last):
      File "/home/souvik/Music/UI_Server2/test35.py", line 11, in <module>
        trend_coord = list(map(list, zip(df.index.strftime('%Y-%m-%d'), sm.tsa.seasonal_decompose(df.iloc[:,0].values).trend.interpolate(method='linear',axis=0).fillna(0).values)))
      File "/home/souvik/django_test/webdev/lib/python3.5/site-packages/statsmodels/tsa/seasonal.py", line 127, in seasonal_decompose
        raise ValueError("You must specify a freq or x must be a "
    ValueError: You must specify a freq or x must be a pandas object with a timeseries index with a freq not set to None
    

    Now if I add a freq parameter to the seasonal_decompose method such as

    trend_coord = list(map(list, zip(df.index.strftime('%Y-%m-%d'), sm.tsa.seasonal_decompose(df.iloc[:,0].values, freq=1).trend.interpolate(method='linear',axis=0).fillna(0).values)))
    

    Then I get an error like

    Traceback (most recent call last):
      File "/home/souvik/Music/UI_Server2/test35.py", line 11, in <module>
        trend_coord = list(map(list, zip(df.index.strftime('%Y-%m-%d'), sm.tsa.seasonal_decompose(df.iloc[:,0].values, freq=1).trend.interpolate(method='linear',axis=0).fillna(0).values)))
    AttributeError: 'numpy.ndarray' object has no attribute 'interpolate'
    

    However if I get rid of any fine graining of data such as interpolate etc and do something like below

    trend_coord = sm.tsa.seasonal_decompose(df.iloc[:,0].values, freq=1, model='additive').trend
    
    results = sm.OLS(np.asarray(trend_coord),
                     sm.add_constant(np.array([i for i in range(len(trend_coord))])), missing='drop').fit()
    slope = results.params[1]
    print(">>>>>>>>>>>>>>>> slope", slope)
    

    Then I get a slope value of 0.13668559218559242.

    But I am not sure if this is the correct way to find out the slope and if even the value is right.

    Is there a better way to find out slope?

    • Patrick Artner
      Patrick Artner over 5 years
      dont call variables after built ins - you are shadowing them. It is a bad habit to use list,dict,tuple,set,max,min,abs,sum,... as variable names.
  • David García Bodego
    David García Bodego over 4 years
    Welcome to SO! When you give an answer, eve if it is right, try to explain it a little bit.
  • seedhom
    seedhom about 4 years
    BTW, the above answer is for python 3 only