hierarchical clustering on correlations in Python scipy/numpy?
Just change the metric to 'correlation', so that the first line becomes:
Y=pdist(X, 'correlation')
However, I believe that the code can be simplified to just:
Z=linkage(X, 'single', 'correlation')
dendrogram(Z, color_threshold=0)
because linkage will call pdist for you when it is given the raw observation matrix.
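As a self-contained sketch of both forms, assuming a recent scipy in place of hcluster: the random X below is only a placeholder for the 100 x 9 matrix from the question, and the names Z_two_step, same and tree are illustrative. It checks that the one-step and two-step calls agree.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist

# Placeholder data standing in for the 100 x 9 matrix from the question.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))

# Two-step form: condensed correlation distances, then single linkage.
Y = pdist(X, 'correlation')
Z_two_step = linkage(Y, 'single')

# One-step form: linkage computes the pairwise distances itself.
Z = linkage(X, 'single', 'correlation')

# Both routes should produce the same linkage matrix.
same = np.allclose(Z_two_step, Z)

# no_plot=True returns the tree layout without drawing anything;
# drop it (with matplotlib installed) to actually plot the dendrogram.
tree = dendrogram(Z, color_threshold=0, no_plot=True)
```

Note that rows with zero variance have an undefined correlation, so constant rows would need to be removed before clustering.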
Author: Admin
Updated on June 05, 2022
Comments
-
Admin almost 2 years
How can I run hierarchical clustering on a correlation matrix in scipy/numpy? I have a matrix of 100 rows by 9 columns, and I'd like to hierarchically cluster by correlations of each entry across the 9 conditions. I'd like to use 1-pearson correlation as the distances for clustering. Assuming I have a numpy array X that contains the 100 x 9 matrix, how can I do this?
I tried using hcluster, based on this example:
Y=pdist(X, 'seuclidean')
Z=linkage(Y, 'single')
dendrogram(Z, color_threshold=0)
However, pdist is not what I want, since that's a euclidean distance. Any ideas? Thanks.
-
Admin almost 14 years
Does 'correlation' here mean Pearson or Spearman? Also, shouldn't it be 1 - pearson in order to be a valid distance metric that can be used for pdist? Does pdist do that automatically? Thanks.
-
Justin Peel almost 14 years
It looks like it is 1 - pearson to me. You can look at it yourself in site-packages/scipy/spatial/distance.py
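A quick numerical check is another option besides reading the source. This is only a sketch on placeholder random data (X, D, R and matches are illustrative names), comparing the 'correlation' distances against 1 minus the Pearson matrix from np.corrcoef:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Placeholder data standing in for the 100 x 9 matrix from the question.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))

# Full 100 x 100 matrix of 'correlation' distances between rows.
D = squareform(pdist(X, 'correlation'))

# Pearson correlation between rows (np.corrcoef treats rows as variables).
R = np.corrcoef(X)

# True if the distance is 1 - Pearson for every pair of rows.
matches = np.allclose(D, 1.0 - R)
```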
-
dwf almost 14 years
It's fairly rare for "correlation" mentioned alone to mean Spearman correlation. Usually if it's Spearman people will say so, otherwise assume Pearson.
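If Spearman is what you actually want, one possible workaround (an assumption on my part, not something from the answer above) is to rank each row first, since Spearman correlation is Pearson correlation computed on ranks. A sketch on placeholder data, with illustrative names:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import rankdata, spearmanr

# Placeholder data standing in for the 100 x 9 matrix from the question.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))

# Rank the 9 values within each row (ties would get average ranks).
ranks = rankdata(X, axis=1)

# Pearson-based 'correlation' distance on ranks gives 1 - Spearman.
D_spearman = squareform(pdist(ranks, 'correlation'))

# Spot-check one pair of rows against scipy's own Spearman routine.
rho, _ = spearmanr(X[0], X[1])
agrees = np.isclose(D_spearman[0, 1], 1.0 - rho)
```

The condensed form pdist(ranks, 'correlation') can then be passed to linkage in the same way as the Pearson distances.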