Pearson's Coefficient and Covariance calculation in Matlab

26,775

Solution 1

I think you're just confusing covariance with the covariance matrix; admittedly, the mathematical notation and MATLAB's function call look similar. In math, cov(x,y) means the covariance of the two variables x and y, which is a single number. In MATLAB, cov(x,y) calculates the covariance matrix of x and y. Here cov is a function, and x and y are its inputs.

Just to make it clearer, let me denote the covariance by C. MATLAB's cov(x,y) returns a matrix of the form

C_xx    C_xy
C_yx    C_yy
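As a quick illustration, here is that matrix computed for the sample vectors Ramala posts later in the comments (x = [4 5 5 3 5], y = [4 4 0 0 0]). It is a minimal sketch, not part of the original answer: the diagonal holds the variances, and the two off-diagonal entries are the same covariance.

```matlab
% Sample data taken from the comment thread
x = [4 5 5 3 5];
y = [4 4 0 0 0];

C = cov(x, y);   % 2-by-2 covariance matrix, normalized by N-1 by default
disp(C)          % the matrix from the question: 0.8 0.2 / 0.2 4.8

% Diagonal entries: the variances of x and y
C_xx = C(1,1);   % same as var(x)
C_yy = C(2,2);   % same as var(y)

% Off-diagonal entries: the covariance of x and y (C_xy equals C_yx)
C_xy = C(1,2);
C_yx = C(2,1);
```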

As RichC pointed out, you need the off-diagonal entries, C_xy (note that C_xy = C_yx for real variables x and y). A MATLAB script that gives you Pearson's coefficient for two variables x and y is:

C=cov(x,y);
p=C(2)/(std(x)*std(y));
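Applied to the sample vectors from the thread, this gives r ≈ 0.1021. Note that C(2) uses MATLAB's linear (column-major) indexing, so it refers to C(2,1), i.e. C_yx. The sketch below cross-checks the result against corrcoef, the base-MATLAB correlation function; it is used here only as a check, since the question rules out corr (which lives in the Statistics and Machine Learning Toolbox).

```matlab
% Sample data taken from the comment thread
x = [4 5 5 3 5];
y = [4 4 0 0 0];

C = cov(x, y);
% C(2) is linear indexing: element (2,1), the covariance C_yx
p = C(2) / (std(x) * std(y));   % std also normalizes by N-1, matching cov

% Cross-check against MATLAB's correlation coefficient matrix
R = corrcoef(x, y);
fprintf('p = %.4f, corrcoef = %.4f\n', p, R(1,2));   % both print 0.1021
```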

Solution 2

From the docs:

cov(X,Y), where X and Y are matrices with the same number of elements, is equivalent to cov([X(:) Y(:)]).
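The equivalence in the quoted documentation can be checked directly. A minimal sketch; the 2-by-3 matrices X and Y are arbitrary values chosen only for illustration:

```matlab
% Two matrices with the same number of elements (illustrative values)
X = [1 2 3; 4 5 6];
Y = [2 1 0; 7 3 5];

C1 = cov(X, Y);              % X and Y are treated as two long vectors
C2 = cov([X(:) Y(:)]);       % each column is one variable
maxDiff = max(abs(C1(:) - C2(:)));   % should be zero up to rounding
```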

Use:

C = cov(X,Y);
coeff = C(1,2) / sqrt(C(1,1) * C(2,2))
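This coeff is Pearson's r: C(1,1) and C(2,2) are var(X) and var(Y), so sqrt(C(1,1)*C(2,2)) is exactly std(X)*std(Y), the denominator used in Solution 1. A short sketch confirming that the two solutions agree, again using the sample vectors from the thread:

```matlab
% Sample data taken from the comment thread
X = [4 5 5 3 5];
Y = [4 4 0 0 0];
C = cov(X, Y);

% Solution 2: off-diagonal divided by the square root of the variances
coeff = C(1,2) / sqrt(C(1,1) * C(2,2));

% Solution 1: the same quantity written with std (both use N-1)
p = C(2) / (std(X) * std(Y));

% The two denominators are the same number
denomDiff = sqrt(C(1,1) * C(2,2)) - std(X) * std(Y);
```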
Author: Ramala

Updated on November 15, 2020

Comments

  • Ramala
    Ramala almost 3 years

    I want to calculate Pearson's correlation coefficient in Matlab (without using Matlab's corr function).

    Simply put, I have two vectors A and B (each 1x100), and I am trying to calculate Pearson's coefficient like this:

    P = cov(x, y)/std(x, 1)*std(y,1)
    

    I am using Matlab's cov and std functions. What I don't understand is that the cov function returns a square matrix like this:

    corrAB =
        0.8000    0.2000
        0.2000    4.8000
    

    But I expect a single number for the covariance, so that I can compute a single P (Pearson's coefficient). What am I missing?

    • Rich C
      Rich C over 12 years
      Do you mean P = cov(x,y)/sqrt(var(x)*var(y));? The diagonal should be 1. The off-diagonal is what you want.
    • Ramala
      Ramala over 12 years
      You are right; I updated the question. Are the "off-diagonal" elements in the above example 0.2000 and 0.2000? Should I do another calculation with them, or just go with 0.2?
    • Rich C
      Rich C over 12 years
      In your example, 0.2 is the off-diagonal. However, the 0.8 and 4.8 should both be 1, so something is wrong with your calculation. Just do corr(x,y) to check. Read the help to understand why it returns a matrix; it was unexpected to me the first time too.
    • Ramala
      Ramala over 12 years
      My arrays look like this: x = [4 5 5 3 5], y = [4 4 0 0 0]. Maybe that's why there are values like 4.8. I'll read the docs, thanks.
    • abcd
      abcd over 12 years
      @RichC: the diagonals need not be 1. They will be 1 only if the variances of both samples are exactly the same.
    • Rich C
      Rich C over 12 years
      @yoda: you're right. I was thinking P was the correlation matrix, but only its off-diagonal elements are correct. The diagonal elements are nonsense.
    • abcd
      abcd over 12 years
      @RichC: the diagonal elements are not nonsense... they are the variances of x and y :)
    • Rich C
      Rich C over 12 years
      @yoda: the diagonals of P as defined above are nonsense.
    • abcd
      abcd over 12 years
      @RichC: There's some confusion here. The matrix output corrAB that Ramala gave in the question is correct, and its diagonals are the variances. As for the matrix P that he defined (the denominator needs to be enclosed in parentheses), the diagonals are sigma_x/sigma_y and sigma_y/sigma_x respectively. Still not nonsense, as it's a direct measure of how large the deviation in one sample is compared to the other.
  • Ramala
    Ramala over 12 years
    Is the "coeff" variable Pearson's coefficient, or did you mean the covariance? In the coefficient formula, I need to divide the covariance by the standard deviations of X and Y.
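One more pitfall in the formula from the question: std(x,1) normalizes by N, while cov(x,y) normalizes by N-1 by default, so mixing them inflates the result by a factor of N/(N-1). The normalizations must match: either both N-1 (the defaults) or both N, by passing the weight 1 to cov as well. The sketch below uses the sample vectors from the thread (N = 5) and also checks abcd's point that the diagonal of P = cov(x,y)/sqrt(var(x)*var(y)) holds sigma_x/sigma_y and sigma_y/sigma_x.

```matlab
% Sample data taken from the comment thread
x = [4 5 5 3 5];
y = [4 4 0 0 0];
N = numel(x);

% Mismatched: cov uses N-1, std(...,1) uses N
C = cov(x, y);
rMismatch = C(1,2) / (std(x,1) * std(y,1));   % equals r * N/(N-1)

% Consistent, both N-1 (the defaults)
rDefault = C(1,2) / (std(x) * std(y));

% Consistent, both N: weight w = 1 for cov and std
Cpop = cov(x, y, 1);
rPop = Cpop(1,2) / (std(x,1) * std(y,1));

% Matrix P from the comments; its diagonal is std(x)/std(y) and std(y)/std(x)
P = C / sqrt(var(x) * var(y));
```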