Distance matrix to pairwise distance list in R
Solution 1
A couple of other options:
Generate some data
D <- dist(cbind(runif(4), runif(4)), diag=TRUE, upper=TRUE) # generate dummy data
m <- as.matrix(D) # coerce dist object to a matrix
dimnames(m) <- list(LETTERS[1:4], LETTERS[1:4])
Assuming you just want the distances for pairs defined by the upper triangle of the distance matrix, you can do:
xy <- t(combn(colnames(m), 2))
data.frame(xy, dist=m[xy])
#   X1 X2      dist
# 1  A  B 0.3157942
# 2  A  C 0.5022090
# 3  A  D 0.3139995
# 4  B  C 0.1865181
# 5  B  D 0.6297772
# 6  C  D 0.8162084
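If the distances are still held in a dist object, the same upper-triangle list can probably be built without expanding to a full matrix. The sketch below relies on dist storing the lower triangle column by column, which is the same pair order that combn produces. The names pts, labs and res and the set.seed call are illustrative assumptions, not part of the answer.

```r
set.seed(1)                              # assumed seed, only for reproducibility
pts <- cbind(runif(4), runif(4))
rownames(pts) <- LETTERS[1:4]            # dist() keeps row names as "Labels"
D <- dist(pts)

labs <- attr(D, "Labels")                # c("A", "B", "C", "D")
pairs <- t(combn(labs, 2))               # A-B, A-C, A-D, B-C, B-D, C-D
res <- data.frame(X1 = pairs[, 1], X2 = pairs[, 2], dist = as.vector(D))
res
```

If the dist object has no labels (attr(D, "Labels") is NULL), seq_len(attr(D, "Size")) can stand in for labs.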
Alternatively, if you want distances for all pairs (in both directions):
data.frame(col=colnames(m)[col(m)], row=rownames(m)[row(m)], dist=c(m))
#    col row      dist
# 1    A   A 0.0000000
# 2    A   B 0.3157942
# 3    A   C 0.5022090
# 4    A   D 0.3139995
# 5    B   A 0.3157942
# 6    B   B 0.0000000
# 7    B   C 0.1865181
# 8    B   D 0.6297772
# 9    C   A 0.5022090
# 10   C   B 0.1865181
# 11   C   C 0.0000000
# 12   C   D 0.8162084
# 13   D   A 0.3139995
# 14   D   B 0.6297772
# 15   D   C 0.8162084
# 16   D   D 0.0000000
or the following, which excludes any NA distances, but doesn't keep the column/row names (though this would be easy to rectify since we have the column/row indices):
idx <- which(!is.na(m), arr.ind=TRUE, useNames=FALSE)
data.frame(idx, dist=m[idx])
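As a rough sketch of how the row/column indices could be mapped back to names, the example below builds a small matrix, blanks out one entry, and looks the labels up by index. The seed, the NA placed at m[2, 3], and the names idx and res are assumptions made purely for illustration.

```r
set.seed(42)                                   # assumed seed, for reproducibility
m <- as.matrix(dist(cbind(runif(4), runif(4))))
dimnames(m) <- list(LETTERS[1:4], LETTERS[1:4])
m[2, 3] <- NA                                  # hypothetical missing distance

# Two-column matrix of (row, col) positions of the non-NA cells
idx <- which(!is.na(m), arr.ind = TRUE, useNames = FALSE)

res <- data.frame(row  = rownames(m)[idx[, 1]],
                  col  = colnames(m)[idx[, 2]],
                  dist = m[idx])               # matrix indexing keeps values aligned
res
```

Indexing with m[idx] rather than c(m) keeps the distance column the same length as the index columns when some cells are NA.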
Solution 2
If you have a data.frame you could do something like:
df <- structure(list(A1 = c(0, 0, 0, 0), B1 = c(0.85, 0, 0, 0), C1 = c(0.45,
0.85, 0, 0), D1 = c(0.96, 0.56, 0.45, 0)), .Names = c("A1", "B1",
"C1", "D1"), row.names = c(NA, -4L), class = "data.frame")
data.frame( t(combn(names(df),2)), dist=t(df)[lower.tri(df)] )
X1 X2 dist
1 A1 B1 0.85
2 A1 C1 0.45
3 A1 D1 0.96
4 B1 C1 0.85
5 B1 D1 0.56
6 C1 D1 0.45
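Name-based lookup such as m[xy] can also be made to work on a data.frame like this one, though it needs one extra step. A data.frame with automatic row names coerces to a matrix with no row names, and indexing that matrix by a character matrix is one way to hit a "subscript out of bounds" error. The sketch below assumes the rows are in the same order as the columns; the names m, xy and res are illustrative.

```r
df <- data.frame(A1 = c(0, 0, 0, 0),
                 B1 = c(0.85, 0, 0, 0),
                 C1 = c(0.45, 0.85, 0, 0),
                 D1 = c(0.96, 0.56, 0.45, 0))

m <- as.matrix(df)            # rownames(m) is NULL at this point
rownames(m) <- colnames(m)    # assumption: row i corresponds to column i

xy <- t(combn(colnames(m), 2))
res <- data.frame(xy, dist = m[xy])
res
```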
Another approach, if you have it as a matrix with row and column names, is to use reshape2 directly:
mat <- structure(c(0, 0, 0, 0, 0.85, 0, 0, 0, 0.45, 0.85, 0, 0, 0.96,
0.56, 0.45, 0), .Dim = c(4L, 4L), .Dimnames = list(c("A1", "B1",
"C1", "D1"), c("A1", "B1", "C1", "D1")))
library(reshape2)
subset(melt(mat), value!=0)
Var1 Var2 value
5 A1 B1 0.85
9 A1 C1 0.45
10 B1 C1 0.85
13 A1 D1 0.96
14 B1 D1 0.56
15 C1 D1 0.45
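Note that filtering on value != 0 would also drop a genuine zero distance between two different items. A base-R sketch that selects the upper triangle by position instead, and so keeps such zeros, might look like this (idx and pairs are illustrative names, and the matrix is rebuilt here only so the example is self-contained):

```r
labs <- c("A1", "B1", "C1", "D1")
mat <- matrix(c(0, 0, 0, 0, 0.85, 0, 0, 0, 0.45, 0.85, 0, 0,
                0.96, 0.56, 0.45, 0),
              nrow = 4, dimnames = list(labs, labs))

# (row, col) positions strictly above the diagonal, in column-major order
idx <- which(upper.tri(mat), arr.ind = TRUE)

pairs <- data.frame(Var1  = rownames(mat)[idx[, "row"]],
                    Var2  = colnames(mat)[idx[, "col"]],
                    value = mat[idx])
pairs
```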
Solution 3
I suppose you have a contingency table or a matrix defined as follows:
mat = matrix(c(0, 0.85, 0.45, 0.96, NA, 0, 0.85, 0.56, NA, NA, 0, 0.45, NA,NA,NA,0), ncol=4)
cont = as.table(t(mat))
# A B C D
#A 0.00 0.85 0.45 0.96
#B 0.00 0.85 0.56
#C 0.00 0.45
#D 0.00
Then you simply need to convert it to a data.frame and remove the NA and 0 entries:
df = as.data.frame(cont)
df = df[complete.cases(df),]
df[df[,3]!=0,]
# Var1 Var2 Freq
#5 A B 0.85
#9 A C 0.45
#10 B C 0.85
#13 A D 0.96
#14 B D 0.56
#15 C D 0.45
Anurag Mishra
Updated on June 09, 2022
Comments
-
Anurag Mishra almost 2 years
Is there any R package to obtain a pairwise distance list if my input file is a distance matrix? For example, if my input is a data.frame like this:
   A1   B1   C1   D1
A1  0 0.85 0.45 0.96
B1       0 0.85 0.56
C1            0 0.45
D1                 0
I want the output as:
A1 B1 0.85
A1 C1 0.45
A1 D1 0.96
B1 C1 0.85
B1 D1 0.56
C1 D1 0.45
I found a question to do the opposite function using package 'reshape' but could not tweak it to get what I wanted.
-
Anurag Mishra over 9 years
I get the following error message. Any idea why?
Error in m[xy] : subscript out of bounds
-
jbaums over 9 years
@AnuragMishra When you run my code? Or when you apply it to your data?
-
Anurag Mishra over 9 years
When I apply it to my data, which is a dataframe.
-
jbaums over 9 years
@AnuragMishra Please edit your question and add the output of dput(d), where d is your dataframe. If d is too large to include in this way, then provide a small subset of it for us to work with.
-
Anurag Mishra over 9 years
I am using two columns from a data frame as the X and Y coordinates to find distances. dput() gives me the following:
Size = 121L, Diag = TRUE, Upper = TRUE, method = "euclidean", call = dist(x = cbind(x$da1, x$da2), diag = TRUE, upper = TRUE), class = "dist")
x$da1 and x$da2 are my two columns from the data frame 'x'. Is this what you wanted?