Pearson's Coefficient and Covariance calculation in Matlab
Solution 1
I think you're just confusing the covariance with the covariance matrix; the mathematical notation and MATLAB's function call do look similar. In math, cov(x,y) means the covariance of the two variables x and y. In MATLAB, cov(x,y) calculates the covariance matrix of x and y. Here cov is a function, and x and y are its inputs.
Just to make it clearer, let me denote the covariance by C. MATLAB's cov(x,y) returns a matrix of the form
C_xx C_xy
C_yx C_yy
As RichC pointed out, you need the off-diagonals, C_xy (note that C_xy=C_yx for real variables x and y). A MATLAB script that gives you Pearson's coefficient for two variables x and y is:
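C = cov(x,y);                  % 2x2 covariance matrix [C_xx C_xy; C_yx C_yy]
p = C(1,2)/(std(x)*std(y));    % off-diagonal covariance over the product of the standard deviations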
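To make this concrete, here is a minimal sketch that runs the script on the sample vectors the asker posted in the comments. The variable names c_xy, R, C1 and p1 are only illustrative. The last part assumes you want the N-normalized std(x,1) from the question, in which case cov should be called with the same normalization flag so the two stay consistent.

```matlab
% Sample data from the asker's comment
x = [4 5 5 3 5];
y = [4 4 0 0 0];

C = cov(x, y);                  % 2x2 covariance matrix: [C_xx C_xy; C_yx C_yy]
c_xy = C(1,2);                  % the scalar covariance of x and y
p = c_xy / (std(x) * std(y));   % Pearson's coefficient (N-1 normalization throughout)

% Cross-check against the off-diagonal of MATLAB's correlation matrix
R = corrcoef(x, y);

% Same coefficient with N normalization, used consistently in cov and std
C1 = cov(x, y, 1);
p1 = C1(1,2) / (std(x, 1) * std(y, 1));

fprintf('C_xy = %.4f, p = %.4f, corrcoef = %.4f, p1 = %.4f\n', c_xy, p, R(1,2), p1);
```

For these vectors the covariance matrix matches the one in the question (0.8 and 4.8 on the diagonal, 0.2 off the diagonal), and the coefficient comes out at roughly 0.102 either way, because the normalization factor cancels as long as cov and std use the same one.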
Solution 2
From the docs:
cov(X,Y), where X and Y are matrices with the same number of elements, is equivalent to cov([X(:) Y(:)]).
So use:
C = cov(X,Y);
coeff = C(1,2) / sqrt(C(1,1) * C(2,2))
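As a quick check of both statements, the following sketch (using the asker's sample vectors from the comments; the names C1, C2, same, denom_from_C and denom_from_std are only illustrative) compares the two cov calls and shows that sqrt(C(1,1)*C(2,2)) is the product of the two standard deviations, so coeff is indeed Pearson's coefficient and not the covariance.

```matlab
X = [4 5 5 3 5];
Y = [4 4 0 0 0];

C1 = cov(X, Y);              % two-input form
C2 = cov([X(:) Y(:)]);       % same data stacked as two columns
same = isequal(size(C1), size(C2)) && max(abs(C1(:) - C2(:))) < 1e-12;

% The diagonal holds the variances, so this is std(X)*std(Y)
denom_from_C   = sqrt(C1(1,1) * C1(2,2));
denom_from_std = std(X) * std(Y);

coeff = C1(1,2) / denom_from_C;   % covariance divided by both standard deviations

fprintf('same = %d, denominators: %.4f vs %.4f, coeff = %.4f\n', ...
        same, denom_from_C, denom_from_std, coeff);
```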
Ramala
Updated on November 15, 2020
Comments
-
Ramala almost 3 years
I want to calculate Pearson's correlation coefficient in MATLAB (without using MATLAB's corr function). Simply, I have two vectors A and B (each of them is 1x100) and I am trying to calculate the Pearson's coefficient like this:
P = cov(x, y)/std(x, 1)std(y,1)
I am using MATLAB's cov and std functions. What I don't get is that the cov function returns a square matrix like this:
corrAB =
    0.8000    0.2000
    0.2000    4.8000
But I expected a single number as the covariance, so that I can come up with a single P (Pearson's coefficient) number. What is the point I'm missing?
-
Rich C over 12 years
Do you mean P = cov(x,y)/sqrt(var(x)*var(y)); ? The diagonal should be 1. The off diagonal is what you want.
-
Ramala over 12 years
You are right, I updated the question. Are the "off diagonal" elements in the above example 0.2000 and 0.2000? Should I do another calculation with them, or just go with 0.2?
-
Rich C over 12 years
In your example, 0.2 is the off diagonal. However, the 0.8 and 4.8 should both be 1, so something is wrong with your calculation. Just do corr(x,y) to check. Read the help to understand why it returns a matrix. It was unexpected to me the first time too.
-
Ramala over 12 years
My arrays are like: x = [4 5 5 3 5], y = [4 4 0 0 0]. Maybe that is why there are values like 4.8. I'll read the docs, thanks.
-
abcd over 12 years
@RichC: the diagonals need not be 1. They will be 1 only if the variances of both samples are exactly the same.
-
Rich C over 12 years
@yoda: you're right. I was thinking P was the correlation matrix, but only the off-diagonal elements are correct. The diagonal elements are nonsense.
-
abcd over 12 years
@RichC: the diagonal elements are not nonsense... they are the variances of x and y :)
-
Rich C over 12 years
@yoda: the diagonals of P as defined above are nonsense.
-
abcd over 12 years
@RichC: There's some confusion here. The matrix output, corrAB, that Ramala gave in the question is correct, and the diagonals are the variances. As for the matrix P that he defined (the denominator needs to be enclosed in parentheses), the diagonals are sigma_x/sigma_y and sigma_y/sigma_x respectively. Still not nonsense, as it's a direct measure of how large the deviation in one sample is compared to the other.
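The claim about the diagonals of P can be checked with a short sketch, again using the sample vectors from the comments and assuming the denominator is parenthesized. The names ratio_xy and ratio_yx are only illustrative. Since C_xx = sigma_x^2, dividing it by sigma_x*sigma_y leaves sigma_x/sigma_y, and likewise for the other diagonal entry.

```matlab
x = [4 5 5 3 5];
y = [4 4 0 0 0];

% 2x2 matrix: covariance matrix divided by the scalar std(x)*std(y)
P = cov(x, y) / (std(x) * std(y));

ratio_xy = std(x) / std(y);   % expected value of P(1,1)
ratio_yx = std(y) / std(x);   % expected value of P(2,2)

fprintf('P(1,1) = %.4f, sigma_x/sigma_y = %.4f\n', P(1,1), ratio_xy);
fprintf('P(2,2) = %.4f, sigma_y/sigma_x = %.4f\n', P(2,2), ratio_yx);
fprintf('P(1,2) = %.4f (Pearson''s coefficient)\n', P(1,2));
```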
-
-
Ramala over 12 years
Is the "coeff" variable the Pearson coefficient, or did you mean the covariance? In the coefficient formula, I need to divide the covariance by the standard deviations of X and Y.