PCA inverse transform manually

1) transform is not data * pca.components_.

Firstly, * is not the dot product for NumPy arrays; it is element-wise multiplication. To perform the dot product, you need to use np.dot (or the @ operator).

Secondly, the shape of PCA.components_ is (n_components, n_features) while the shape of the data to transform is (n_samples, n_features), so you need to transpose PCA.components_ to perform the dot product.

Moreover, the first step of transform is to subtract the mean; if you do it manually, you also need to subtract the mean first.

The correct way to transform is

data_reduced = np.dot(data - pca.mean_, pca.components_.T)
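As a sanity check, this manual projection can be compared against pca.transform on some synthetic data (the random array below is only an illustrative placeholder):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))  # synthetic data: 100 samples, 5 features

pca = PCA(n_components=3)
pca.fit(data)

# manual transform: center the data, then project onto the principal axes
data_reduced = np.dot(data - pca.mean_, pca.components_.T)

# agrees with the library's own transform (with the default whiten=False)
print(np.allclose(data_reduced, pca.transform(data)))  # True
```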

2) inverse_transform is just the inverse process of transform

data_original = np.dot(data_reduced, pca.components_) + pca.mean_
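A quick check that this round trip matches the library. Note that when n_components < n_features, the reconstruction is only an approximation of the original data, not an exact inverse:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
data = rng.normal(size=(50, 4))  # synthetic placeholder data

pca = PCA(n_components=2)
data_reduced = pca.fit_transform(data)

# manual inverse transform: project back, then add the mean
data_original = np.dot(data_reduced, pca.components_) + pca.mean_

# agrees with the library's inverse_transform
print(np.allclose(data_original, pca.inverse_transform(data_reduced)))  # True
```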

If your data already has zero mean in each column, you can drop pca.mean_ from the expressions above, for example

import numpy as np
from sklearn.decomposition import PCA

# `data` is assumed to be an (n_samples, n_features) array
# whose columns already have zero mean
pca = PCA(n_components=3)
pca.fit(data)

data_reduced = np.dot(data, pca.components_.T)         # transform
data_original = np.dot(data_reduced, pca.components_)  # inverse_transform
Baron Yugovich

Updated on June 26, 2022
Comments

  • Baron Yugovich
    Baron Yugovich almost 2 years

    I am using scikit-learn. The nature of my application is such that I do the fitting offline, and then can only use the resulting coefficients online (on the fly) to manually calculate various objectives.

    The transform is simple: it is just data * pca.components_, i.e. a simple dot product. However, I have no idea how to perform the inverse transform. Which field of the pca object contains the relevant coefficients for the inverse transform? How do I calculate the inverse transform?

    Specifically, I am referring to the PCA.inverse_transform() method of the sklearn.decomposition.PCA class: how can I manually reproduce its functionality using the various coefficients calculated by the PCA?

  • Baron Yugovich
    Baron Yugovich over 8 years
    When writing * above, I was not writing code but pseudocode, i.e. expressing the idea informally. As for subtracting the mean, that's understood, right? X, the input matrix, should already have each column with mean 0 and stdev 1, i.e. it is already standardized, so further tampering with the mean would not be necessary. However, if you're trying to express how to transform the original data, before standardization, can you please write it as a cleaner, more step-by-step process? Then I am ready to accept your answer.
  • yangjie
    yangjie over 8 years
    Yes, if your data already has each column with mean 0, you do not need to tamper with the mean. The steps are actually simple; I have provided a more complete example, so please point out any part that is unclear.
  • Baron Yugovich
    Baron Yugovich over 8 years
    One more question: you address the mean, but what about the variance? You don't mention anything about ensuring that stdev = 1.
  • yangjie
    yangjie over 8 years
    It is not necessary to ensure std=1. The PCA implemented by scikit-learn only centers the data but does not scale it. You can check that by looking at the source code github.com/scikit-learn/scikit-learn/blob/a95203b/sklearn/…
  • yangjie
    yangjie over 8 years
    You can normalize the data as preprocessing, but that actually has nothing to do with the PCA transform itself. What inverse_transform returns is only the preprocessed data.
  • Gulzar
    Gulzar over 4 years
    For doing it manually while also truncating dimensions: data_reduced = np.dot(data, pca.components_.T[:, :dim]), and back: data_original = np.dot(data_reduced, pca.components_[:dim, :])
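The truncation idea in the last comment can be sketched as follows. Note that the comment omits the centering step from the accepted answer; with centering added, projecting onto the first `dim` principal axes is the same as taking the first `dim` columns of the full projection, since the rows of pca.components_ are sorted by explained variance (the data and `dim` below are illustrative placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
data = rng.normal(size=(60, 6))  # synthetic placeholder data

pca = PCA().fit(data)  # keep all components
centered = data - pca.mean_
dim = 2                # keep only the first 2 principal components

# truncated transform: project onto the first `dim` axes only
data_reduced = np.dot(centered, pca.components_.T[:, :dim])

# back-projection uses the matching first `dim` rows, plus the mean
data_approx = np.dot(data_reduced, pca.components_[:dim, :]) + pca.mean_

# the truncated projection equals the first `dim` columns of the full one
print(np.allclose(data_reduced, pca.transform(data)[:, :dim]))  # True
```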