Mahalanobis distance in R, error: system is computationally singular


Solution 1

The Mahalanobis distance requires the inverse of the covariance matrix. The function mahalanobis computes it internally with solve, which inverts the matrix numerically. While doing so, solve estimates the reciprocal condition number of the matrix; if that value is below the tolerance tol (by default .Machine$double.eps, about 2.2e-16), solve treats the matrix as singular and stops. This is why the error says the system is computationally singular: the matrix might not be singular under a different tolerance.
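A minimal sketch of that check, using a hand-built toy matrix `S` (hypothetical, not the asker's data). Base R's `rcond()` returns the reciprocal condition number that appears in the error message, and `solve()` refuses to invert when it falls below the default `tol`.

```r
# Toy covariance matrix (hypothetical): the second variable has variance 1e-18,
# so the matrix is invertible in exact arithmetic but badly conditioned.
S <- diag(c(1, 1e-18))

# Reciprocal condition number, the quantity quoted in the solve() error
rc <- rcond(S)
print(rc)

# solve() compares it against tol, which defaults to .Machine$double.eps
print(rc < .Machine$double.eps)

# Reproduce the error without stopping the script
msg <- tryCatch(solve(S), error = function(e) conditionMessage(e))
print(msg)
```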

The solution is to lower the tolerance below which solve treats the matrix as singular. Fortunately, mahalanobis forwards extra arguments to solve, so you can pass tol directly:

mahalanobis(dat, center = centroid, cov = cov(dat), tol = 1e-20)
# [1] 24.215494 28.394913  6.984101 28.004975 11.095357 14.401967 ...
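A small sketch of the `tol` pass-through on a toy matrix (`S`, `x` and `center` are made-up illustration values). Note that `mahalanobis()` returns squared distances, and that any `tol` below the reciprocal condition number reported in the error lets `solve()` proceed. If you compute distances repeatedly, you can also invert once yourself and pass `inverted = TRUE`.

```r
# Hypothetical, badly conditioned covariance matrix (rcond is about 1e-18)
S <- diag(c(1, 1e-18))
x <- rbind(c(1, 1e-9), c(0, 2e-9))   # two toy observations, one per row
center <- c(0, 0)

# Extra arguments to mahalanobis() are forwarded to solve()
d <- mahalanobis(x, center = center, cov = S, tol = 1e-20)
print(d)   # squared distances, approximately 2 and 4

# Equivalent: invert once, then tell mahalanobis() the matrix is already inverted
S_inv <- solve(S, tol = 1e-20)
d2 <- mahalanobis(x, center = center, cov = S_inv, inverted = TRUE)
print(d2)
```

Lowering `tol` only disables the check. With a reciprocal condition number as small as the 9.50116e-19 in the question, the computed inverse may be numerically unreliable, which is the concern raised in Solution 2.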

Solution 2

mahalanobis uses the covariance matrix, cov (more precisely, its inverse), to transform the coordinate system and then computes the Euclidean distance in the new coordinates; R's mahalanobis returns the squared distance. A standard reference is Duda & Hart, "Pattern Classification and Scene Analysis".
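The "transform, then Euclidean distance" view can be checked with a Cholesky factorisation. This is a sketch on simulated data (`X` and the mixing matrix are arbitrary assumptions): if `S = t(R) %*% R`, solving `t(R) z = x - mu` whitens each centred row, and the squared Euclidean length of `z` matches what `mahalanobis()` returns.

```r
set.seed(1)
# Simulated correlated data: 100 rows, 3 columns (arbitrary lower-triangular mix)
A <- matrix(c(2, 0.5, 0, 0, 1, 0.3, 0, 0, 0.5), nrow = 3, ncol = 3)
X <- matrix(rnorm(300), ncol = 3) %*% A
mu <- colMeans(X)
S <- cov(X)

# Built-in: squared Mahalanobis distance of every row to mu
d_builtin <- mahalanobis(X, center = mu, cov = S)

# By hand: S = t(R) %*% R, so solve t(R) z = (x - mu) for each centred row;
# backsolve(..., transpose = TRUE) solves the system with t(R)
R <- chol(S)
Z <- backsolve(R, t(sweep(X, 2, mu)), transpose = TRUE)
d_manual <- colSums(Z^2)   # squared Euclidean norm in whitened coordinates

print(all.equal(d_builtin, d_manual))
```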

It looks like your covariance matrix is singular. Perhaps there are linearly dependent columns in dat that are unnecessary? Setting the tolerance to zero won't help if the covariance matrix is truly singular. The first thing to do, instead, is to look for columns that are a rescaling of another column, or a sum (more generally, a linear combination) of two or more other columns, and remove them. Such columns are redundant for the Mahalanobis distance.
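One way to find such columns in base R is a pivoted QR decomposition of the centred data. This is a sketch on made-up data where `c3` is deliberately `c1 + c2`. It relies on `qr()`'s default limited column pivoting, which moves (near-)dependent columns to the end; it flags a redundant set, but which of several interchangeable columns gets dropped depends on column order.

```r
set.seed(42)
# Made-up data with a redundant column: c3 is exactly c1 + c2
c1 <- rnorm(50)
c2 <- rnorm(50)
X <- cbind(c1 = c1, c2 = c2, c3 = c1 + c2)

# The covariance matrix is (numerically) singular: rcond is essentially zero
print(rcond(cov(X)))

# Pivoted QR of the centred data; centring also exposes constant columns.
# The first `rank` pivots index a linearly independent subset of columns.
q <- qr(scale(X, scale = FALSE))
keep <- q$pivot[seq_len(q$rank)]
print(colnames(X)[keep])   # the redundant column is dropped

# The distance on the reduced data no longer needs a singular inverse
X_red <- X[, keep, drop = FALSE]
d <- mahalanobis(X_red, center = colMeans(X_red), cov = cov(X_red))
```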

BTW, since the Mahalanobis distance is effectively a rescaling and rotation, calling scale first looks superfluous. Any reason why you want that?
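To check the point about scaling, here is a sketch on simulated data (column names and scales are arbitrary). Standardising with `scale()` is an invertible rescaling of each column, and the Mahalanobis distance is unchanged by such transformations as long as the centre is transformed the same way, so both calls give the same distances up to rounding.

```r
set.seed(7)
# Simulated data with very different column scales (arbitrary choices)
a <- rnorm(40, sd = 50)
b <- rnorm(40, sd = 0.2)
cc <- rnorm(40) + 0.01 * a   # correlated with a
X <- cbind(a = a, b = b, c = cc)

# Distances on the raw data
d_raw <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Standardise first, then compute distances to the standardised centre
Xs <- scale(X)
d_scaled <- mahalanobis(Xs, center = colMeans(Xs), cov = cov(Xs))

print(all.equal(d_raw, d_scaled))   # TRUE: scaling does not change the result
```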

Author: Pascal

Updated on June 11, 2022

Comments

  • Pascal, almost 2 years ago

    I'd like to calculate multivariate distance from a set of points to the centroid of those points. Mahalanobis distance seems to be suited for this. However, I get an error (see below).

    Can anyone tell me why I am getting this error, and if there is a way to work around it?

    If you download the coordinate data and the associated environmental data, you can run the following code.

    require(maptools)
    occ <- readShapeSpatial('occurrences.shp')
    load('envDat.Rdata')
    
    #standardize the data to scale the variables
    dat <- as.matrix(scale(dat))
    centroid <- dat[1547,]  #let's assume this is the centroid in this case
    
    #Calculate multivariate distance from all points to centroid
    mahalanobis(dat,center=centroid,cov=cov(dat))
    
    Error in solve.default(cov, ...) : 
      system is computationally singular: reciprocal condition number = 9.50116e-19
    
  • Pascal, over 10 years ago
    I didn't realize that mahalanobis rescales, which is why I was rescaling beforehand. Thanks for pointing that out!
  • Andrea Ianni ௫, over 8 years ago
    Is it possible to do something similar with mahalanobis.dist (package StatMatch) too?
  • nograpes, over 8 years ago
    I don't know. Also, that question is different enough that you should really make a new one.
  • Chetan Arvind Patil, almost 7 years ago
    @nograpes: What tol value would you suggest to handle every possible reciprocal condition number? I am facing the same issue, and each time I set a new tol, I get a new reciprocal condition number warning.