Cosine similarity between each row in a DataFrame in Python


You can use sklearn.metrics.pairwise.cosine_similarity directly: given a DataFrame, it returns the matrix of pairwise cosine similarities between its rows.

Demo

import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

df = pd.DataFrame(np.random.randint(0, 2, (3, 5)))

df
##     0  1  2  3  4
##  0  1  1  1  0  0
##  1  0  0  1  1  1
##  2  0  1  0  1  0

cosine_similarity(df)
##  array([[ 1.        ,  0.33333333,  0.40824829],
##         [ 0.33333333,  1.        ,  0.40824829],
##         [ 0.40824829,  0.40824829,  1.        ]])
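Because the demo draws random values, rerunning it will usually print a different matrix. The sketch below fixes the frame to the values printed above and cross-checks the result against a plain NumPy computation. The names `sim`, `unit`, and `manual` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Same values as the frame printed in the demo, fixed so the run is reproducible.
df = pd.DataFrame([[1, 1, 1, 0, 0],
                   [0, 0, 1, 1, 1],
                   [0, 1, 0, 1, 0]])

# Wrap the result so rows and columns carry the original row labels.
sim = pd.DataFrame(cosine_similarity(df), index=df.index, columns=df.index)

# Manual check: scale each row to unit length, then take all dot products.
# This assumes no row is all zeros; such a row would cause a division by zero here.
values = df.to_numpy(dtype=float)
unit = values / np.linalg.norm(values, axis=1, keepdims=True)
manual = unit @ unit.T

print(sim.round(4))
print(np.allclose(sim.to_numpy(), manual))
```

Keeping the result in a DataFrame means a pair can be looked up by row label, for example `sim.loc[0, 2]`.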
Author by

Jayanth Prakash Kulkarni

Project Assistant at IISc. Undergrad, MSRIT. Interested in Machine Learning, Reinforcement Learning and Game theory.

Updated on July 09, 2022

Comments

  • Jayanth Prakash Kulkarni, almost 2 years ago

    I have a DataFrame containing multiple vectors, each with 3 entries. Each row is a vector in my representation. I need to calculate the cosine similarity between each pair of these vectors. Is it better to convert this to a matrix representation, or is there a cleaner approach that works on the DataFrame itself?

    Here is the code that I have tried.

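    import pandas as pd
    from scipy import spatial

    # X, Y, Z are assumed to be equal-length sequences defined earlier
    df = pd.DataFrame([X, Y, Z]).T  # after transposing, each row is one vector
    vectors = df.values.tolist()

    # Compare every pair of rows; cosine similarity = 1 - cosine distance
    results = []
    for x in vectors:
        for y in vectors:
            results.append(1 - spatial.distance.cosine(x, y))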
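One possible way to replace the nested loop while staying within SciPy is `pdist` with `squareform`, which computes each pair once and expands the result to a square matrix. This is a minimal sketch, not the asker's method: the values of `X`, `Y`, and `Z` are hypothetical placeholders, since the question does not give them.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Hypothetical placeholder data; X, Y, Z are not specified in the question.
X = [1.0, 0.0, 2.0]
Y = [0.0, 1.0, 2.0]
Z = [1.0, 1.0, 0.0]
df = pd.DataFrame([X, Y, Z]).T  # each row is one 3-component vector

# pdist returns the condensed cosine distances for every pair of rows;
# squareform expands them into a symmetric matrix with zeros on the diagonal.
distances = squareform(pdist(df.to_numpy(), metric="cosine"))

# Cosine similarity is 1 minus cosine distance.
similarities = pd.DataFrame(1 - distances, index=df.index, columns=df.index)
print(similarities)
```

Like the loop version, this assumes no row is the zero vector, for which cosine distance is undefined.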