create cosine similarity matrix numpy


Solution 1

Let m be the array:

import numpy as np

m = np.array([
    [0.0072427,  0.00669255, 0.00785213, 0.00845336, 0.01042869],
    [0.00710799, 0.00668831, 0.00772334, 0.00777796, 0.01049965],
    [0.00741872, 0.00650899, 0.00772273, 0.00729002, 0.00919407],
    [0.00717589, 0.00627021, 0.0069514,  0.0079332,  0.01069545],
    [0.00617369, 0.00590539, 0.00738468, 0.00761699, 0.00886915],
])

Per Wikipedia (Cosine similarity), the cosine similarity of two vectors A and B is

similarity = cos(θ) = (A · B) / (‖A‖ ‖B‖)
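A minimal sketch of that formula for a single pair of 1-D vectors, before building the full matrix. `cosine_similarity` is a hypothetical helper name used only for illustration, not a library function.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Hypothetical helper: cos(theta) = (a . b) / (||a|| * ||b||) for two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Orthogonal vectors have similarity 0
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 3.0])))  # 0.0

# Parallel vectors (one a positive multiple of the other) have similarity 1, up to rounding
print(cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0])))
```

The similarity depends only on the angle between the vectors, not on their lengths, which is why scaling a vector does not change the result.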

We can calculate the numerator, the dot product of every pair of columns of m, with

d = m.T @ m

The column norms ‖A‖ are

norm = (m * m).sum(0, keepdims=True) ** .5

Then the similarities are

d / norm / norm.T

[[ 1.      0.9994  0.9979  0.9973  0.9977]
 [ 0.9994  1.      0.9993  0.9985  0.9981]
 [ 0.9979  0.9993  1.      0.998   0.9958]
 [ 0.9973  0.9985  0.998   1.      0.9985]
 [ 0.9977  0.9981  0.9958  0.9985  1.    ]]

The cosine distances (1 - similarity) are

1 - d / norm / norm.T

[[ 0.      0.0006  0.0021  0.0027  0.0023]
 [ 0.0006  0.      0.0007  0.0015  0.0019]
 [ 0.0021  0.0007  0.      0.002   0.0042]
 [ 0.0027  0.0015  0.002   0.      0.0015]
 [ 0.0023  0.0019  0.0042  0.0015  0.    ]]
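Solution 1 as written uses m.T @ m and sum(0), so it compares the columns of m. The question asks about rows, so here is a row-wise sketch under the assumption that each row of m is one vector: it swaps the transpose and sums over axis 1. The data is the matrix from the question.

```python
import numpy as np

m = np.array([
    [0.0072427,  0.00669255, 0.00785213, 0.00845336, 0.01042869],
    [0.00710799, 0.00668831, 0.00772334, 0.00777796, 0.01049965],
    [0.00741872, 0.00650899, 0.00772273, 0.00729002, 0.00919407],
    [0.00717589, 0.00627021, 0.0069514,  0.0079332,  0.01069545],
    [0.00617369, 0.00590539, 0.00738468, 0.00761699, 0.00886915],
])

# Numerator: dot product of every pair of rows, shape (n_rows, n_rows)
d_rows = m @ m.T

# Row norms as a column vector, shape (n_rows, 1)
norm_rows = np.sqrt((m * m).sum(axis=1, keepdims=True))

# Broadcasting (n, 1) * (1, n) gives ||row_i|| * ||row_j|| for every pair
sim_rows = d_rows / (norm_rows * norm_rows.T)

# Spot-check one entry against the formula applied directly to rows 0 and 2
a, b = m[0], m[2]
expected = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.isclose(sim_rows[0, 2], expected))  # True
print(np.allclose(np.diag(sim_rows), 1.0))   # True
```

For an M x N input this yields an M x M matrix, one entry per pair of rows.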

Solution 2

Let x be your 2-D array, with one vector per row:

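import numpy as np
from scipy.spatial.distance import cosine

n_rows = x.shape[0]
distances = np.zeros((n_rows, n_rows))
for i in range(n_rows):
    for j in range(n_rows):
        # cosine() returns the cosine distance (1 - similarity) between rows i and j
        distances[i, j] = cosine(x[i, :], x[j, :])
similarities = 1 - distances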
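The double loop calls `cosine` once per pair of rows in Python. As a sketch of a vectorized alternative, `scipy.spatial.distance.cdist` with `metric="cosine"` computes the same row-by-row cosine distances in one call. The random `x` below is placeholder data, not the question's matrix.

```python
import numpy as np
from scipy.spatial.distance import cdist, cosine

rng = np.random.default_rng(0)
x = rng.random((4, 3))  # placeholder data: 4 vectors of length 3

# Pairwise cosine distances between all rows of x, shape (4, 4)
distances = cdist(x, x, metric="cosine")

# Same result as the explicit double loop over rows
loop = np.array([[cosine(x[i], x[j]) for j in range(len(x))] for i in range(len(x))])
print(np.allclose(distances, loop))  # True

similarities = 1 - distances
```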

Author: Sal

Updated on October 14, 2022

Comments

  • Sal (over 1 year ago)

    Suppose I have a numpy matrix like the following:

    array([array([ 0.0072427 ,  0.00669255,  0.00785213,  0.00845336,  0.01042869]),
       array([ 0.00710799,  0.00668831,  0.00772334,  0.00777796,  0.01049965]),
       array([ 0.00741872,  0.00650899,  0.00772273,  0.00729002,  0.00919407]),
       array([ 0.00717589,  0.00627021,  0.0069514 ,  0.0079332 ,  0.01069545]),
       array([ 0.00617369,  0.00590539,  0.00738468,  0.00761699,  0.00886915])], dtype=object)
    

    How can I generate a 5 x 5 matrix where entry (i, j) is the cosine similarity between rows i and j of my original matrix?

    For example, the value at row 0, column 2 would be the cosine similarity between the first and third rows (rows 0 and 2) of the original matrix.

    Here's what I've tried:

    from sklearn.metrics import pairwise_distances
    from scipy.spatial.distance import cosine
    import numpy as np
    
    #features is a column in my artist_meta data frame
    #where each value is a numpy array of 5 floating point values, similar to the
    #form of the matrix referenced above but larger in volume
    
    items_mat = np.array(artist_meta['features'].values)
    
    dist_out = 1-pairwise_distances(items_mat, metric="cosine")
    

    The above code gives me the following error:

    ValueError: setting an array element with a sequence.

    Not sure why I'm getting this because each array is of the same length (5), which I've verified.

    • Sal (over 7 years ago)
      Sure - the matrix in the original post has been updated to reflect the first five rows of the one I am computing. Even on computing the cosine similarity of the first five rows I run into the error.
    • DYZ (over 7 years ago)
      So, as I said before, assuming that f is your matrix, 1-pairwise_distances(f,metric="cosine") gives no errors whatsoever.
  • Ismael EL ATIFI (almost 5 years ago)
    To optimize your code, you can divide m by norm once before doing m.T @ m. It saves the division by norm.T.
  • Catbuilts (about 3 years ago)
    I agree with @IsmaelELATIFI. The optimized code is: norm = (m * m).sum(0, keepdims=True) ** .5; m_norm = m/norm; similarity_matrix = m_norm.T @ m_norm
  • Ivan Gonzalez (about 2 years ago)
    Just to add to the above: when you have unit vectors, the cosine similarity is just the dot product (and the cosine distance is 1 minus it).
  • kmf (about 2 years ago)
    Shouldn't it be m @ m.T? If I have an M x N matrix, i.e. M vectors that are each N-dimensional, I want an M x M distance matrix: [M x N] @ [N x M] = [M x M].
  • Nguai al (about 2 years ago)
    Per the formula, shouldn't it be d / (norm * norm.T)?
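On the ValueError in the question: one likely cause, assuming artist_meta['features'] holds one 1-D NumPy array per row, is that np.array(artist_meta['features'].values) produces a 1-D array of dtype object (an array of arrays, as the printed matrix in the question shows) rather than a 2-D float matrix, and converting that to float fails. A sketch of the problem and a fix with np.stack, using a small stand-in instead of the real DataFrame:

```python
import numpy as np

# Stand-in for artist_meta['features'].values: an object array holding 1-D arrays
rows = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])]
features = np.empty(len(rows), dtype=object)
for i, r in enumerate(rows):
    features[i] = r

print(features.shape, features.dtype)  # (2,) object

try:
    # A float conversion like this is what fails on an array of arrays
    features.astype(float)
except ValueError as e:
    print("ValueError:", e)

# Stack the per-row arrays into a proper 2-D float matrix
items_mat = np.stack(features)
print(items_mat.shape, items_mat.dtype)  # (2, 3) float64
```

With items_mat in this 2-D form, it can be passed to pairwise_distances or to the NumPy expressions in Solution 1.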
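The comments propose two rewrites of Solution 1: dividing once by norm * norm.T, and normalizing the columns to unit length before the matrix product. A sketch that checks all three forms agree, keeping Solution 1's convention that columns are the vectors; the random m is placeholder data.

```python
import numpy as np

rng = np.random.default_rng(42)
m = rng.random((6, 4))  # placeholder data; columns are the vectors, as in Solution 1

norm = (m * m).sum(0, keepdims=True) ** 0.5  # column norms, shape (1, 4)

# Solution 1: form the Gram matrix, then divide by the norms
d = m.T @ m
sim_divide_after = d / norm / norm.T
sim_one_division = d / (norm * norm.T)  # same thing with a single division

# Normalize columns to unit length first; the dot products of
# unit vectors are then the cosine similarities directly
m_norm = m / norm
sim_normalized_first = m_norm.T @ m_norm

print(np.allclose(sim_divide_after, sim_one_division))      # True
print(np.allclose(sim_divide_after, sim_normalized_first))  # True
```

For the row-wise case, the same idea uses sum(1) for the norms and m_norm @ m_norm.T for the product.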