Principal Component Analysis in MATLAB
Here's how I would do it:
function [V, newX, D] = myPCA(X)
    X = bsxfun(@minus, X, mean(X,1));       % zero-center the data
    C = (X'*X) ./ (size(X,1)-1);            % covariance matrix, i.e. cov(X)
    [V, D] = eig(C);
    [D, order] = sort(diag(D), 'descend');  % sort eigenvalues high to low
    V = V(:,order);                         % reorder eigenvectors to match
    newX = X*V;                             % project data onto the components
end
and an example to compare against the PRINCOMP function from the Statistics Toolbox:
load fisheriris
[V, newX, D] = myPCA(meas);
[PC, newData, Var] = princomp(meas);
You might also be interested in this related post about performing PCA by SVD.
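The SVD route mentioned above avoids forming the covariance matrix explicitly, which is usually more numerically stable; a minimal sketch under the same centering convention as myPCA (variable names are my own):

```matlab
% PCA via SVD: X = U*S*V', so X'*X = V*S^2*V', and the right singular
% vectors V are the principal components (no covariance matrix needed).
X = bsxfun(@minus, X, mean(X,1));       % zero-center, as in myPCA above
[U, S, V] = svd(X, 'econ');             % economy-size SVD
latent = diag(S).^2 ./ (size(X,1)-1);   % variances, already sorted descending
score  = U*S;                           % projected data, equal to X*V
```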
Author: matcheek
Updated on August 01, 2020

Comments
-
matcheek almost 4 years
I'm implementing PCA using eigenvalue decomposition for sparse data. I know MATLAB has PCA implemented, but writing the code myself helps me understand all the technicalities. I've been following the guidance from here, but I'm getting different results compared to the built-in function princomp.
Could anybody look at it and point me in the right direction?
Here's the code:
function [mu, Ev, Val] = pca(data)
% mu  - mean image
% Ev  - matrix whose columns are the eigenvectors corresponding to the
%       eigenvalues Val
% Val - eigenvalues
if nargin ~= 1
    error('usage: [mu,E,Values] = pca_q1(data)');
end
mu = mean(data)';
nimages = size(data,2);
for i = 1:nimages
    data(:,i) = data(:,i) - mu(i);
end
L = data'*data;
[Ev, Vals] = eig(L);
[Ev, Vals] = sort(Ev, Vals);
% computing eigenvectors of the real covariance matrix
Ev = data * Ev;
Val = diag(Vals);
Vals = Vals / (nimages - 1);
% normalize Ev to unit length
proper = 0;
for i = 1:nimages
    Ev(:,i) = Ev(:,1)/norm(Ev(:,i));
    if Vals(i) < 0.00001
        Ev(:,i) = zeros(size(Ev,1),1);
    else
        proper = proper+1;
    end
end
Ev = Ev(:,1:nimages);
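For comparison, the Gram-matrix (snapshot) trick the code above attempts — eigendecomposing the small data'*data instead of the full covariance — can be written like this; a sketch of the technique in general, not a drop-in repair of the asker's function:

```matlab
% Snapshot PCA sketch: for d-by-n data with d >> n, eigendecompose the
% small n-by-n Gram matrix data'*data instead of the d-by-d covariance.
data = bsxfun(@minus, data, mean(data,2));      % subtract the mean image
L = data'*data;                                 % n-by-n Gram matrix
[W, Vals] = eig(L);
[vals, order] = sort(diag(Vals), 'descend');    % sort eigenvalues, then
W = W(:,order);                                 % reorder eigenvector columns
Ev = data * W;                                  % map back to d-dimensional space
Ev = bsxfun(@rdivide, Ev, sqrt(sum(Ev.^2,1)));  % normalize columns to unit length
Val = vals / (size(data,2) - 1);                % eigenvalues of the covariance
```

Note that sorting works on the vector of eigenvalues and then reindexes the eigenvector columns; passing two matrices to sort, as in the code above, is not valid MATLAB.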
-
Admin almost 10 years
I want to ask something: does princomp sort the columns of COEFF by latent by default (ref: mathworks.com/help/stats/princomp.html)? And what is the difference between your function and princomp?
-
Admin almost 10 years
I want to use coeff and latent, where coeff is sorted by latent. May I use the built-in function princomp or your myPCA?
-
Amro almost 10 years
@AhsanAli: obviously, as the example above shows, both functions produce the same output (up to a certain precision); the columns of COEFF (the principal components) are sorted in descending order of component variance LATENT. Also check the last link mentioned above about performing PCA using SVD instead of EIG. Note that princomp is being replaced by the pca function in recent editions (in fact, check the source code to see that calls to princomp are routed to pca internally).
-
Admin almost 10 years
Okay, what is the difference between pca and princomp, and between your myPCA and PCA by SVD? I'm unable to differentiate between them. My problem is that I want to compute PCA of n matrices of [500x3], where coeff is sorted by latent.
-
Amro almost 10 years
@AhsanAli: just use the pca function; the outputs will come out sorted as described. Like I said before, if you look at the source code (edit princomp.m), it simply forwards the call to pca()...
-
Admin almost 10 years
OK, I got it. @Amro +1
-
Admin almost 10 years
I think there is no difference between pca and princomp. Am I right? From what I checked, pca == princomp, but ~= myPCA. Why?
-
Amro almost 10 years
@AhsanAli: what you have to understand is that eigenvectors are defined up to a constant, meaning that if EV is an eigenvector, then so is lambda*EV for any constant lambda. Now, since the EIG function always returns normalized unit vectors, we are left with an arbitrary sign (lambda=-1 or lambda=+1). See this post for more details: stackoverflow.com/a/18152804/97160 (also check out the links mentioned in the comments).
-
Amro almost 10 years
@AhsanAli: PCA, after all, is an orthogonal transformation that maps the data to a new coordinate system (such that the data has maximum variance along the new directions, in decreasing order). The principal components (columns of the COEFF matrix) are the vectors that describe the directions of this new system. Now, if (I,J,K) is a basis for the vector space, then so is (a*I,b*J,c*K), with correspondingly changed data coordinates (the SCORE matrix). So eigenvectors are not unique (they can be independently scaled by a constant) as long as they span the same subspace.
-
Admin almost 10 years
I want a suggestion, since you know HMMs and statistics better: I want to train my data with an HMM, but the Gaussians of my data are too close together, as I get an mcr of 0.89 after training. How can I extract features from my 3-D trajectories of length [501x3], or apply some statistical analysis method to them? I applied PCA with respective angles, but it gives an mcr of 0.88.
-
Admin almost 10 years
Let us continue this discussion in chat.
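The sign ambiguity Amro describes above can be neutralized when comparing myPCA's output against princomp's. One possible convention (my own, not built into either function) is to flip each component so its largest-magnitude entry is positive:

```matlab
% Make eigenvector signs deterministic before comparing V and PC:
% flip each column of V so its largest-magnitude entry is positive.
[~, idx] = max(abs(V), [], 1);              % row index of max |entry| per column
cols = 1:size(V,2);
sgn = sign(V(sub2ind(size(V), idx, cols))); % current sign of those entries
V    = bsxfun(@times, V, sgn);              % flip component directions
newX = bsxfun(@times, newX, sgn);           % flip the scores consistently
```

Applying the same flip to princomp's PC and newData makes the two results directly comparable element by element.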