Pandas reverse of diff()

Solution 1

You can do this via numpy. Algorithm courtesy of @Divakar.

Of course, you need to know the first item in your series for this to work.

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.randint(0, 10, 10)})
df['B'] = df['A'].diff()

x, x_diff = df['A'].iloc[0], df['B'].iloc[1:]
df['C'] = np.r_[x, x_diff].cumsum().astype(int)

#    A    B  C
# 0  8  NaN  8
# 1  5 -3.0  5
# 2  4 -1.0  4
# 3  3 -1.0  3
# 4  9  6.0  9
# 5  7 -2.0  7
# 6  4 -3.0  4
# 7  0 -4.0  0
# 8  8  8.0  8
# 9  1 -7.0  1
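
The same cumulative-sum idea can be sketched with pandas alone: fill the leading NaN of the diff column with the known first value, then call cumsum(). The values below are copied from column A in the output above, the name `restored` is only illustrative, and the sketch assumes the diff column contains no NaN other than the first one.

```python
import pandas as pd

# Values copied from column A in the output above
s = pd.Series([8, 5, 4, 3, 9, 7, 4, 0, 8, 1])
d = s.diff()  # the first element is NaN

# Seed the NaN with the known first value, then accumulate the differences
restored = d.fillna(s.iloc[0]).cumsum().astype(int)

print(restored.tolist())
```

If the original series is no longer available, the first value has to be supplied from wherever it was saved; without it the series can only be recovered up to a constant offset.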

Solution 2

You can use diff_inv from pmdarima.

import numpy as np
import pandas as pd

# generating a random table
np.random.seed(10)
vals = np.random.randint(1, 10, 6)
df_t = pd.DataFrame({"a": vals})

# creating two columns with diff 1 and diff 2
df_t['dif_1'] = df_t.a.diff(1)
df_t['dif_2'] = df_t.a.diff(2)

df_t

   a  dif_1  dif_2
0  5    NaN    NaN
1  1   -4.0    NaN
2  2    1.0   -3.0
3  1   -1.0    0.0
4  2    1.0    0.0
5  9    7.0    8.0

Then create a function that returns an array with the differencing undone.

import numpy as np
from pmdarima.utils import diff_inv

def inv_diff(df_orig_column, df_diff_column, periods):
    # Build the input for diff_inv: the first n values (n = periods) of the
    # original data, followed by the diff values from position n onward
    value = np.array(df_orig_column[:periods].tolist() + df_diff_column[periods:].tolist())

    # Invert the differencing and drop the leading `periods` values,
    # which diff_inv adds as initial values
    inv_diff_vals = diff_inv(value, periods, 1)[periods:]
    return inv_diff_vals

Example of Use:

# df_orig_column - column with the original values
# df_diff_column - column with the differenced values
# periods - the periods argument that was passed to diff()
inv_diff(df_t.a, df_t.dif_2, 2)

Output:

array([5., 1., 2., 1., 2., 9.])
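
If installing pmdarima is not an option, the same lagged inversion can be sketched with NumPy only. This is not pmdarima's implementation, and `undiff_lagged` is a hypothetical helper name; it relies on the fact that with a lag of k, every k-th element forms its own chain that a cumulative sum can rebuild.

```python
import numpy as np

def undiff_lagged(first_values, diffs, periods):
    """Hypothetical helper: invert diff(periods) given the first `periods` originals."""
    seed = np.asarray(first_values, dtype=float)[:periods]
    tail = np.asarray(diffs, dtype=float)[periods:]
    combined = np.concatenate([seed, tail])

    out = np.empty_like(combined)
    # Each offset 0..periods-1 is an independent chain: x[i] = x[i - periods] + d[i]
    for start in range(periods):
        out[start::periods] = np.cumsum(combined[start::periods])
    return out

# Same data as the table above: a and a.diff(2)
a = [5, 1, 2, 1, 2, 9]
dif_2 = [np.nan, np.nan, -3.0, 0.0, 0.0, 8.0]
result = undiff_lagged(a, dif_2, 2)
print(result)
```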

Solution 3

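Reverse diff in one line with pandas. Note that this adds each diff to the previous original value, so the original column must still be available, and the first row stays NaN.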

import pandas as pd

df = pd.DataFrame([10, 15, 14, 18], columns = ['Age'])
df['Age_diff'] = df.Age.diff()

df['reverse_diff'] = df['Age'].shift(1) + df['Age_diff']

print(df)

   Age  Age_diff  reverse_diff
0   10       NaN           NaN
1   15       5.0          15.0
2   14      -1.0          14.0
3   18       4.0          18.0

Solution 4

Here's a working example.

First, let's import needed packages

import numpy as np
import pandas as pd

import pmdarima as pm

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

Then, let's create a simple discretized cosine wave

period = 5
cycles = 7
x = np.cos(np.linspace(0, 2*np.pi*cycles, period*cycles+1))
X = pd.DataFrame(x)

and plot

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(X, marker='.')
ax.set(
    xticks=X.index
)
ax.axvline(0, color='r', ls='--')
ax.axvline(period, color='r', ls='--')
ax.set(
    title='Original data'
)
plt.show()

Note that the period is 5. Let's now remove this "seasonality" by differencing with a lag of 5:

X_diff = X.diff(periods=period)
# NOTE: the first `period` observations
#       are needed for back transformation
X_diff.iloc[:period] = X[:period]

Note that we have to keep the first period (here 5) observations to allow the back transformation. If you don't want them inside the differenced data, store them elsewhere and concatenate them back before you back transform.
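
As a rough sketch of the "store elsewhere, then concatenate" variant, the snippet below keeps the first observations in a separate variable and rebuilds the series with pandas only, without pmdarima. The names `head`, `combined` and `restored` are illustrative, and the grouped cumulative sum is an assumed substitute for diff_inv, not the method used in this answer.

```python
import numpy as np
import pandas as pd

period = 5
cycles = 7
x = pd.Series(np.cos(np.linspace(0, 2 * np.pi * cycles, period * cycles + 1)))

# Differenced data without the seed values, and the seed stored separately
x_diff = x.diff(periods=period).dropna()
head = x.iloc[:period]

# Concatenate the seed back in front before inverting
combined = pd.concat([head, x_diff])

# Positions that share the same remainder modulo `period` form one chain,
# so a cumulative sum within each group undoes the lagged difference
groups = np.arange(len(combined)) % period
restored = combined.groupby(groups).cumsum()

print(np.allclose(restored.to_numpy(), x.to_numpy()))
```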

fig, ax = plt.subplots(figsize=(12, 5))
ax.axvline(0, color='r', ls='--')
ax.axvline(period-1, color='r', ls='--')
ax.plot(X_diff, marker='.')
ax.annotate(
    'Keep these original data\nto allow back transformation',
    xy=(period-1, .5), xytext=(10, .5),
    arrowprops=dict(color='k')
)
ax.set(
    title='Transformed data'
)
plt.show()

Let's now back transform data with pmdarima.utils.diff_inv

X_diff_inv = pm.utils.diff_inv(X_diff, lag=period)[period:]

Note that we discard the first period (here 5) results, which are 0 and not needed.

fig, ax = plt.subplots(figsize=(12, 5))
ax.axvline(0, color='r', ls='--')
ax.axvline(period-1, color='r', ls='--')
ax.plot(X_diff_inv, marker='.')
ax.set(
    title='Back transformed data'
)
plt.show()

Author by

Anesh

Updated on August 01, 2022

Comments

  • Anesh
    almost 2 years

    I have calculated the differences between consecutive values in a series, but I cannot reverse / undifference them using diffinv():

    ds_sqrt = np.sqrt(ds)
    ds_sqrt = pd.DataFrame(ds_sqrt)
    ds_diff = ds_sqrt.diff().values
    

    How can I undifference this?

  • bluesmonk
    over 2 years
    I'm getting different plots using your code
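
For the pipeline in the question itself (square root, then diff()), a minimal sketch of the full round trip might look like the following. The contents of `ds` are made up for illustration, since the question does not show them, and the first square-rooted value is assumed to have been saved before differencing.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's `ds`
ds = pd.Series([4.0, 9.0, 16.0, 25.0, 36.0])

ds_sqrt = np.sqrt(ds)
ds_diff = ds_sqrt.diff()

# The first transformed value must be kept to undo the differencing
first = ds_sqrt.iloc[0]

# Undo diff() with a cumulative sum, then undo the square root by squaring
restored_sqrt = np.r_[first, ds_diff.iloc[1:].to_numpy()].cumsum()
restored = restored_sqrt ** 2

print(restored)
```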