Mahalanobis distance in R, error: system is computationally singular
Solution 1
The Mahalanobis distance requires the inverse of the covariance matrix. The function mahalanobis internally uses solve, which computes that inverse numerically. Before returning a result, solve estimates the reciprocal condition number of the matrix and, if it falls below a tolerance (tol, which defaults to .Machine$double.eps), treats the matrix as singular and stops. This is why the error says computationally singular: the matrix might not be exactly singular, and it could pass with a different tolerance.
The solution is to lower the tolerance below which solve treats the matrix as singular. Fortunately, mahalanobis allows you to pass this parameter (tol) through to solve:
mahalanobis(dat,center=centroid,cov=cov(dat),tol=1e-20)
# [1] 24.215494 28.394913 6.984101 28.004975 11.095357 14.401967 ...
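To see the effect of tol in isolation, here is a minimal sketch on a made-up 2x2 covariance matrix. The matrix S, the point x and the centre mu are hypothetical and are not the asker's data; S is chosen so that its reciprocal condition number sits below the default tolerance of solve.

```r
# Hypothetical covariance matrix (not the asker's data): its reciprocal
# condition number is about 1e-18, below solve's default tolerance.
S  <- diag(c(1, 1e-18))
x  <- c(1, 0)   # hypothetical point
mu <- c(0, 0)   # hypothetical centre

# Estimate of the reciprocal condition number that solve checks against tol
rc <- rcond(S)

# With the default tolerance the call stops with an error; capture the
# message instead of aborting the script.
default_msg <- tryCatch(
  mahalanobis(x, center = mu, cov = S),
  error = function(e) conditionMessage(e)
)

# A tolerance smaller than rc lets solve proceed
d2 <- mahalanobis(x, center = mu, cov = S, tol = 1e-20)
```

On real data, comparing rcond(cov(dat)) with the tol you pass is one way to see why a given value is or is not small enough. A reciprocal condition number this small also suggests the result may be numerically fragile, so it is worth checking whether the matrix is singular for a structural reason.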
Solution 2
mahalanobis uses the covariance matrix, cov (more precisely, its inverse), to transform the coordinate system, and then computes the Euclidean distance in the new coordinates (R's mahalanobis returns the squared distance). A standard reference is Duda & Hart, "Pattern Classification and Scene Analysis".
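As a rough illustration of that "transform, then Euclidean distance" view, the sketch below whitens some simulated data with a Cholesky factor of the covariance matrix and compares the squared Euclidean distances with what mahalanobis returns. The data X and all variable names are made up for the example; the Cholesky factor is just one convenient choice of transformation.

```r
set.seed(42)

# Hypothetical correlated 2-D data
X  <- matrix(rnorm(200), ncol = 2) %*% matrix(c(2, 1, 0, 1), nrow = 2)
mu <- colMeans(X)
S  <- cov(X)

# Cholesky factor: S = t(R) %*% R, so solve(S) = solve(R) %*% t(solve(R))
R <- chol(S)

# Centre the rows, then map them into the whitened coordinate system
Z <- sweep(X, 2, mu) %*% solve(R)

# Squared Euclidean distance to the origin in the new coordinates
d2_euclid <- rowSums(Z^2)

# Squared Mahalanobis distance in the original coordinates
d2_maha <- mahalanobis(X, center = mu, cov = S)

same <- isTRUE(all.equal(d2_euclid, d2_maha))
```

Here same should come out TRUE up to floating-point tolerance, since both expressions evaluate the same quadratic form.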
It looks like your cov matrix is singular. Perhaps there are linearly dependent columns in "dat" that are unnecessary? Setting the tolerance to zero won't help if the covariance matrix is truly singular. The first thing to do, instead, is to look for columns that might be a rescaling of some other column, or just a sum of two or more other columns, and remove them. Such columns are redundant for the Mahalanobis distance.
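One possible way to look for such redundant columns is a pivoted QR decomposition of the centred data, sketched below on simulated data. The matrix dat_demo and its column names are hypothetical; the idea is that the numerical rank reported by qr is lower than the number of columns when some columns are linear combinations of others, and the pivot order indicates which columns to keep.

```r
set.seed(1)

# Hypothetical data: column "c" is exactly the sum of "a" and "b"
a <- rnorm(50)
b <- rnorm(50)
dat_demo <- cbind(a = a, b = b, c = a + b)

# QR decomposition of the centred columns; a rank below ncol(dat_demo)
# signals linearly dependent columns
q <- qr(scale(dat_demo, center = TRUE, scale = FALSE))
n_redundant <- ncol(dat_demo) - q$rank

# The first q$rank entries of the pivot index a linearly independent subset
keep    <- sort(q$pivot[seq_len(q$rank)])
reduced <- dat_demo[, keep, drop = FALSE]

# The reduced covariance matrix is no longer singular
d2 <- mahalanobis(reduced, center = colMeans(reduced), cov = cov(reduced))
```

Which of the dependent columns gets dropped depends on the column order, so it is worth deciding by hand which variables are actually meaningful to keep.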
BTW, since the Mahalanobis distance is effectively a rescaling and rotation, calling the scaling function looks superfluous. Any reason why you want that?
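A quick way to convince yourself that scaling beforehand makes no difference is to compute the distances on raw and on standardized data and compare them. The sketch below uses simulated data with very different column scales; raw and the other names are made up for the example.

```r
set.seed(7)

# Hypothetical data with very different column scales
raw    <- cbind(x = rnorm(100, mean = 10, sd = 5),
                y = rnorm(100, mean = -3, sd = 0.2))
scaled <- scale(raw)  # centre and standardize each column

d2_raw    <- mahalanobis(raw,    center = colMeans(raw),    cov = cov(raw))
d2_scaled <- mahalanobis(scaled, center = colMeans(scaled), cov = cov(scaled))

# The two sets of distances agree up to floating-point error
unchanged <- isTRUE(all.equal(unname(d2_raw), unname(d2_scaled)))
```

The agreement holds as long as the centre and the covariance matrix are computed from the same (scaled or unscaled) data that the points come from.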
Pascal
Updated on June 11, 2022
Comments
-
Pascal almost 2 years
I'd like to calculate multivariate distance from a set of points to the centroid of those points. Mahalanobis distance seems to be suited for this. However, I get an error (see below).
Can anyone tell me why I am getting this error, and if there is a way to work around it?
If you download the coordinate data and the associated environmental data, you can run the following code.
require(maptools)
occ <- readShapeSpatial('occurrences.shp')
load('envDat.Rdata')

# standardize the data to scale the variables
dat <- as.matrix(scale(dat))
centroid <- dat[1547, ]  # let's assume this is the centroid in this case

# Calculate multivariate distance from all points to centroid
mahalanobis(dat, center = centroid, cov = cov(dat))
Error in solve.default(cov, ...) : system is computationally singular: reciprocal condition number = 9.50116e-19
-
Pascal over 10 years
I didn't realize that mahalanobis rescales, which is why I was rescaling beforehand. Thanks for pointing that out!
-
Andrea Ianni ௫ over 8 years
Is there the possibility to do something similar with "mahalanobis.dist" (package "StatMatch") too?
-
nograpes over 8 years
I don't know. Also, that question is different enough that you should really make a new one.
-
Chetan Arvind Patil almost 7 years
@nograpes - What tol value would you suggest in order to take care of all possible reciprocal condition number values? I am facing the same issue, and each time I set a new tol(), I get a new reciprocal warning.