PCA inverse transform manually
1) transform is not data * pca.components_.
Firstly, * is not the dot product for NumPy arrays; it is element-wise multiplication. To perform a dot product, you need to use np.dot.
Secondly, the shape of PCA.components_ is (n_components, n_features), while the shape of the data to transform is (n_samples, n_features), so you need to transpose PCA.components_ to perform the dot product.
Moreover, the first step of transform is to subtract the mean; therefore, if you do it manually, you also need to subtract the mean first.
The correct way to transform is
data_reduced = np.dot(data - pca.mean_, pca.components_.T)
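As a sanity check, this manual projection can be compared against pca.transform. A minimal sketch, using hypothetical random data (the array names and sizes are illustrative, and whiten is left at its default of False):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
data = rng.randn(100, 5)  # hypothetical data: 100 samples, 5 features

pca = PCA(n_components=3).fit(data)

# subtract the mean, then project onto the principal axes
data_reduced = np.dot(data - pca.mean_, pca.components_.T)

print(np.allclose(data_reduced, pca.transform(data)))  # True
```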
2) inverse_transform is just the inverse process of transform:
data_original = np.dot(data_reduced, pca.components_) + pca.mean_
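Likewise, a quick sketch with hypothetical random data confirms that this formula matches pca.inverse_transform (again assuming the default whiten=False):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
data = rng.randn(100, 5)  # hypothetical data

pca = PCA(n_components=3).fit(data)
data_reduced = pca.transform(data)

# undo the projection, then add the mean back
data_original = np.dot(data_reduced, pca.components_) + pca.mean_

print(np.allclose(data_original, pca.inverse_transform(data_reduced)))  # True
```

Note that data_original only approximates the original data when n_components < n_features; what it matches exactly is sklearn's inverse_transform output.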
If your data already has zero mean in each column, you can ignore the pca.mean_ above, for example:
import numpy as np
from sklearn.decomposition import PCA
pca = PCA(n_components=3)
pca.fit(data)  # data: (n_samples, n_features) array whose columns have zero mean
data_reduced = np.dot(data, pca.components_.T) # transform
data_original = np.dot(data_reduced, pca.components_) # inverse_transform
Baron Yugovich
Updated on June 26, 2022

Comments
-
Baron Yugovich almost 2 years
I am using scikit-learn. The nature of my application is such that I do the fitting offline, and then can only use the resulting coefficients online (on the fly) to manually calculate various objectives.
The transform is simple: it is just data * pca.components_, i.e. a simple dot product. However, I have no idea how to perform the inverse transform. Which field of the pca object contains the relevant coefficients for the inverse transform? How do I calculate the inverse transform?
Specifically, I am referring to the PCA.inverse_transform() method call available in the sklearn.decomposition.PCA package: how can I manually reproduce its functionality using the various coefficients calculated by the PCA?
-
Baron Yugovich over 8 years
When writing * above, I was not writing code but pseudocode, i.e. writing the idea informally. As for subtracting the mean, that's understood, right? X, the input matrix, should already have each column with mean 0 and stdev 1, i.e. it is already standardized anyway, right? Thus, further tampering with the mean would not be necessary. However, if you're trying to express how to transform the original data, before standardization, can you please write it in a cleaner, more step-by-step process? Then I am ready to accept your answer.
-
yangjie over 8 years
Yes, if your data already has each column with mean 0, you do not need to tamper with the mean. The steps are actually simple; I have provided a more complete example. Please point out if any part is unclear.
-
Baron Yugovich over 8 years
One more question: you address the mean, but what about the variance? You don't mention anything about ensuring that stdev = 1.
-
yangjie over 8 years
It is not necessary to ensure std = 1. The PCA implemented by scikit-learn only centers the data but does not scale it. You can check that by looking at the source code: github.com/scikit-learn/scikit-learn/blob/a95203b/sklearn/…
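This centering-only behavior can be checked empirically. A sketch with hypothetical data whose columns deliberately have very different standard deviations: the centered-but-unscaled formula still reproduces pca.transform exactly.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# hypothetical data: columns with wildly different variances
data = rng.randn(200, 4) * np.array([1.0, 10.0, 100.0, 0.01])

pca = PCA(n_components=2).fit(data)

# only the mean is removed; no division by the standard deviation
manual = np.dot(data - pca.mean_, pca.components_.T)
print(np.allclose(manual, pca.transform(data)))  # True
```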
-
yangjie over 8 years
You can normalize the data as preprocessing, but that has nothing to do with the PCA transform itself. What inverse_transform returns is only the preprocessed data.
-
Gulzar over 4 years
For doing it manually while truncating to dim dimensions: data_reduced = np.dot(data, pca.components_.T[:, :dim]), and back: data_original = np.dot(data_reduced, pca.components_[:dim, :])