Pandas reverse of diff()
Solution 1
You can do this via numpy. Algorithm courtesy of @Divakar.
Of course, you need to know the first item in your series for this to work.
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': np.random.randint(0, 10, 10)})
df['B'] = df['A'].diff()
x, x_diff = df['A'].iloc[0], df['B'].iloc[1:]
df['C'] = np.r_[x, x_diff].cumsum().astype(int)
#    A    B  C
# 0  8  NaN  8
# 1  5 -3.0  5
# 2  4 -1.0  4
# 3  3 -1.0  3
# 4  9  6.0  9
# 5  7 -2.0  7
# 6  4 -3.0  4
# 7  0 -4.0  0
# 8  8  8.0  8
# 9  1 -7.0  1
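Solution 1 can be wrapped in a small helper for reuse. The sketch below is only an illustration, not part of the original answer: the function name undiff is hypothetical, and the fixed sample values are copied from column A in the output above so the round trip can be checked by eye.

```python
import numpy as np
import pandas as pd

def undiff(first_value, diffs):
    """Rebuild a series from its first value and its first-order differences.

    `diffs` is expected to be the output of Series.diff(), whose first
    entry is NaN; that entry is dropped and replaced by `first_value`.
    (Hypothetical helper name, used here for illustration only.)
    """
    steps = np.asarray(diffs, dtype=float)[1:]
    return np.r_[first_value, steps].cumsum()

s = pd.Series([8, 5, 4, 3, 9, 7, 4, 0, 8, 1])
restored = undiff(s.iloc[0], s.diff())
print(restored.astype(int).tolist())
```

The cumulative sum works because each value equals the first value plus the sum of all differences up to that position.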
Solution 2
You can use diff_inv from pmdarima (see the pmdarima documentation).
import numpy as np
import pandas as pd
# generating a random table
np.random.seed(10)
vals = np.random.randint(1, 10, 6)
df_t = pd.DataFrame({"a": vals})
# creating two columns with diff 1 and diff 2
df_t['dif_1'] = df_t.a.diff(1)
df_t['dif_2'] = df_t.a.diff(2)
df_t
   a  dif_1  dif_2
0  5    NaN    NaN
1  1   -4.0    NaN
2  2    1.0   -3.0
3  1   -1.0    0.0
4  2    1.0    0.0
5  9    7.0    8.0
Then create a function that will return an array with inverse values of diff.
from pmdarima.utils import diff_inv
def inv_diff(df_orig_column, df_diff_column, periods):
    # Build the input for diff_inv: the first `periods` values of the
    # original data, followed by the differenced values from then on
    value = np.array(df_orig_column[:periods].tolist() + df_diff_column[periods:].tolist())
    # Invert the differencing and drop the leading `periods` padding values
    inv_diff_vals = diff_inv(value, periods, 1)[periods:]
    return inv_diff_vals
Example of Use:
# df_orig_column - column with original values
# df_diff_column - column with differenced values
# periods - the periods value that was passed to Series.diff()
inv_diff(df_t.a, df_t.dif_2, 2)
Output:
array([5., 1., 2., 1., 2., 9.])
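If pmdarima is not available, the same recurrence can be sketched with numpy alone. This is an assumption-laden illustration rather than pmdarima's implementation: the helper name inv_diff_lag is hypothetical, and the sample values are the ones from the table above. It relies on the identity x[i] = x[i - periods] + diff[i], seeded with the first periods original values.

```python
import numpy as np
import pandas as pd

def inv_diff_lag(first_values, diffs, periods):
    """Undo Series.diff(periods) given the first `periods` original values.

    Hypothetical helper for illustration; not pmdarima's diff_inv.
    """
    diffs = np.asarray(diffs, dtype=float)
    out = np.empty(len(diffs))
    # Seed the result with the known original values
    out[:periods] = np.asarray(first_values, dtype=float)[:periods]
    # Each later value is the value `periods` steps back plus its difference
    for i in range(periods, len(diffs)):
        out[i] = out[i - periods] + diffs[i]
    return out

a = pd.Series([5, 1, 2, 1, 2, 9])
restored = inv_diff_lag(a, a.diff(2), 2)
print(restored.tolist())
```

The explicit loop is slower than a vectorized version on long series, but it makes the dependence on the first periods values easy to see.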
Solution 3
Reverse diff in one line with pandas
import pandas as pd
df = pd.DataFrame([10, 15, 14, 18], columns = ['Age'])
df['Age_diff'] = df.Age.diff()
df['reverse_diff'] = df['Age'].shift(1) + df['Age_diff']
print(df)
   Age  Age_diff  reverse_diff
0   10       NaN           NaN
1   15       5.0          15.0
2   14      -1.0          14.0
3   18       4.0          18.0
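Note that the shift-based line above still reads the original Age column, so it cannot be used when only the differences are left. A possible variant that needs just the first value is sketched below; the column name restored is made up for this example, and it assumes the only NaN in Age_diff is the leading one produced by diff().

```python
import pandas as pd

df = pd.DataFrame([10, 15, 14, 18], columns=['Age'])
df['Age_diff'] = df['Age'].diff()

# Only the first original value is kept; the rest comes from the differences
first = df['Age'].iloc[0]

# Replace the leading NaN with the first value, then accumulate
df['restored'] = df['Age_diff'].fillna(first).cumsum()
print(df)
```

Unlike reverse_diff above, the restored column also has a value in row 0.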
Solution 4
Here's a working example.
First, let's import needed packages
import numpy as np
import pandas as pd
import pmdarima as pm
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
Then, let's create a simple discretized cosine wave
period = 5
cycles = 7
x = np.cos(np.linspace(0, 2*np.pi*cycles, period*cycles+1))
X = pd.DataFrame(x)
and plot
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(X, marker='.')
ax.set(xticks=X.index)
ax.axvline(0, color='r', ls='--')
ax.axvline(period, color='r', ls='--')
ax.set(title='Original data')
plt.show()
Note that the period is 5. Let's now remove this "seasonality" by differencing with period 5
X_diff = X.diff(periods=period)
# NOTE: the first `period` observations
# are needed for back transformation
X_diff.iloc[:period] = X[:period]
Note that we have to keep the first `period` observations to allow back transformation. If you don't want them in the differenced series, you have to store them elsewhere and concatenate them back when you want to back transform.
fig, ax = plt.subplots(figsize=(12, 5))
ax.axvline(0, color='r', ls='--')
ax.axvline(period-1, color='r', ls='--')
ax.plot(X_diff, marker='.')
ax.annotate(
    'Keep these original data\nto allow back transformation',
    xy=(period-1, .5), xytext=(10, .5),
    arrowprops=dict(color='k')
)
ax.set(title='Transformed data')
plt.show()
Let's now back transform data with pmdarima.utils.diff_inv
X_diff_inv = pm.utils.diff_inv(X_diff, lag=period)[period:]
Note that we discard the first `period` results, which would be 0 and are not needed.
fig, ax = plt.subplots(figsize=(12, 5))
ax.axvline(0, color='r', ls='--')
ax.axvline(period-1, color='r', ls='--')
ax.plot(X_diff_inv, marker='.')
ax.set(title='Back transformed data')
plt.show()
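To see why keeping the first `period` observations is enough, the back transformation can also be sketched with numpy alone. This is an illustrative assumption, not how pmdarima implements diff_inv, and the names x_diff and x_restored are introduced only for this example. Every phase of the cycle (positions j, j + period, j + 2*period, ...) forms its own chain of first-order differences, so a cumulative sum per phase restores the data.

```python
import numpy as np

period = 5
cycles = 7
x = np.cos(np.linspace(0, 2 * np.pi * cycles, period * cycles + 1))

# Seasonal difference that keeps the first `period` original observations
x_diff = x.copy()
x_diff[period:] = x[period:] - x[:-period]

# Invert: accumulate each phase of the cycle independently
x_restored = np.empty_like(x_diff)
for j in range(period):
    x_restored[j::period] = np.cumsum(x_diff[j::period])

print(np.allclose(x_restored, x))
```

The comparison uses np.allclose rather than exact equality because repeated floating-point additions can introduce tiny rounding differences.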
Anesh
Updated on August 01, 2022
Comments
-
Anesh almost 2 years
I have calculated the differences between consecutive values in a series, but I cannot reverse / undifference them using diffinv():
ds_sqrt = np.sqrt(ds)
ds_sqrt = pd.DataFrame(ds_sqrt)
ds_diff = ds_sqrt.diff().values
How can I undifference this?
-
bluesmonk over 2 years
I'm getting different plots using your code