Principal Component Analysis in MATLAB

Here's how I would do it:

function [V, newX, D] = myPCA(X)
    X = bsxfun(@minus, X, mean(X,1));       % zero-center the data
    C = (X'*X) ./ (size(X,1)-1);            % covariance matrix, same as cov(X)

    [V, D] = eig(C);                        % eigenvectors/eigenvalues of C
    [D, order] = sort(diag(D), 'descend');  % sort eigenvalues high to low
    V = V(:,order);                         % reorder eigenvectors to match

    newX = X*V;                             % project data onto the components
end

And here is an example comparing it against the PRINCOMP function from the Statistics Toolbox:

load fisheriris

[V, newX, D] = myPCA(meas);
[PC, newData, Var] = princomp(meas);
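
To confirm the two agree, here is a quick sanity check (a sketch using the variables from the calls above; eigenvectors are only defined up to sign, so compare absolute values):

max(abs(abs(V(:)) - abs(PC(:))))          % principal components match, ~0
max(abs(D - Var))                         % explained variances match, ~0
max(abs(abs(newX(:)) - abs(newData(:))))  % projected data match, ~0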

You might also be interested in this related post about performing PCA by SVD.
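
For reference, here is a minimal sketch of that SVD approach (my own illustration, not necessarily the linked post's exact code); it avoids forming the covariance matrix explicitly:

function [V, newX, D] = myPCAsvd(X)
    X = bsxfun(@minus, X, mean(X,1));  % zero-center, as before
    [U, S, V] = svd(X, 'econ');        % X = U*S*V'
    D = diag(S).^2 ./ (size(X,1)-1);   % squared singular values -> variances
    newX = U*S;                        % same as X*V
end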


Comments

  • matcheek almost 4 years

    I'm implementing PCA using eigenvalue decomposition for sparse data. I know MATLAB has PCA implemented, but it helps me understand all the technicalities when I write the code myself. I've been following the guidance from here, but I'm getting different results compared to the built-in function princomp.

    Could anybody look at it and point me in the right direction?

    Here's the code:

    function [mu, Ev, Val] = pca(data)

    % mu  - mean image
    % Ev  - matrix whose columns are the eigenvectors corresponding
    %       to the eigenvalues Val
    % Val - eigenvalues

    if nargin ~= 1
        error('usage: [mu, E, Values] = pca_q1(data)');
    end

    mu = mean(data)';

    nimages = size(data, 2);

    for i = 1:nimages
        data(:,i) = data(:,i) - mu(i);
    end

    L = data'*data;
    [Ev, Vals] = eig(L);
    [Ev, Vals] = sort(Ev, Vals);

    % computing eigenvectors of the real covariance matrix
    Ev = data * Ev;

    Val = diag(Vals);
    Vals = Vals / (nimages - 1);

    % normalize Ev to unit length
    proper = 0;
    for i = 1:nimages
        Ev(:,i) = Ev(:,1) / norm(Ev(:,i));
        if Vals(i) < 0.00001
            Ev(:,i) = zeros(size(Ev,1), 1);
        else
            proper = proper + 1;
        end
    end

    Ev = Ev(:,1:nimages);
    
  • Admin almost 10 years
    I want to ask something: does princomp sort the columns of COEFF by latent by default (ref: mathworks.com/help/stats/princomp.html)? What is the difference between your function and princomp?
  • Admin almost 10 years
    I want to use coeff and latent, where coeff is sorted by latent. May I use the built-in function princomp or your myPCA?
  • Amro almost 10 years
    @AhsanAli: obviously, as the example above shows, both functions produce the same output (up to a certain precision); the columns of COEFF (the principal components) are sorted in descending order of the component variances LATENT. Also check the last link mentioned above about performing PCA using SVD instead of EIG. Note that princomp is being replaced by the pca function in recent releases (in fact, check the source code to see that calls to princomp are routed to pca internally).
  • Admin almost 10 years
    OK, but what is the difference between pca & princomp, and between your myPCA & PCA by SVD? I'm unable to differentiate between them. My problem is that I want to compute the PCA of n matrices of [500x3] where coeff is sorted by latent.
  • Amro almost 10 years
    @AhsanAli: just use the pca function; its outputs come out sorted as described. Like I said before, if you look at the source code (edit princomp.m), it simply forwards the call to pca()...
  • Admin almost 10 years
    OK, I got it. @Amro +1
  • Admin almost 10 years
    I think there is no difference between pca & princomp. Am I right? From what I checked, pca == princomp but ~= myPCA. Why?
  • Amro almost 10 years
    @AhsanAli: what you have to understand is that eigenvectors are defined up to a constant, meaning that if EV is an eigenvector, then so is lambda*EV for any constant lambda. Now since the EIG function always returns normalized unit vectors, we are left with an arbitrary sign (lambda = -1 or lambda = +1); see the sketch after this thread for a concrete example. See this post for more details: stackoverflow.com/a/18152804/97160 (also check out the links mentioned in the comments).
  • Amro almost 10 years
    @AhsanAli: PCA, after all, is an orthogonal transformation that transforms the data to a new coordinate system (such that the data has maximum variance along the new directions, in decreasing order). The principal components (the columns of the COEFF matrix) are the vectors that describe the directions of this new system. Now if (I,J,K) is a basis set for the vector space, then so is (a*I, b*J, c*K), with correspondingly changed data coordinates (the SCORE matrix). So eigenvectors are not unique (they can be independently scaled/multiplied by a constant) as long as they span the same subspace.
  • Admin almost 10 years
    I want a suggestion, as you know HMMs and statistics better: I want to train my data with an HMM, but the Gaussians of my data are too close, and I get an MCR of 0.89 after training. How can I extract features from my 3-D trajectories of length [501x3], or apply some statistical analysis method to them? I applied PCA with the respective angles, but it gives an MCR of 0.88.
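
To make the sign-ambiguity point above concrete, here is a minimal sketch (my own illustration, using the fisheriris data from the answer):

load fisheriris
C = cov(meas);            % covariance of the iris measurements
[V, D] = eig(C);
v = V(:,end);             % an eigenvector of C
lam = D(end,end);         % its eigenvalue

norm(C*v - lam*v)         % ~0: v is a unit eigenvector
norm(C*(-v) - lam*(-v))   % ~0: so is -v; EIG may return either sign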