Difference between cosine similarity and cosine distance

12,755

Good question but yes, these are 2 different things but connected by the following equation:

Cosine_distance = 1 - cosine_similarity


Why?

Usually, people use the cosine similarity as a similarity metric between vectors. Now, the distance can be defined as 1-cos_similarity.

The intuition behind this is that if 2 vectors are perfectly the same then similarity is 1 (angle=0) and thus, distance is 0 (1-1=0).

Similarly you can define the cosine distance for the resulting similarity value range.

Cosine similarity range: −1 meaning exactly opposite, 1 meaning exactly the same, 0 indicating orthogonality.


References: Scipy wolfram

From scipy

Share:
12,755

Related videos on Youtube

user1700890
Author by

user1700890

SOreadytohelp

Updated on September 16, 2022

Comments

  • user1700890
    user1700890 over 1 year

    It looks like scipy.spatial.distance.cdist cosine similariy distance:

    link to cos distance 1

    1 - u*v/(||u||||v||)
    

    is different from sklearn.metrics.pairwise.cosine_similarity which is

    link to cos similarity 2

     u*v/||u||||v||
    

    Does anybody know reason for different definitions?

    • Warren Weckesser
      Warren Weckesser over 4 years
      Think of the trivial case: distance(X, X) should be 0, because the distance from X to X is 0. similarity(X, X) should be the maximum of the function that measures similariy (1 in this case), because X and X are as similar as two things can be.
  • user1700890
    user1700890 over 4 years
    Thank you for explanation. Terminology a bit confusing. I feel like cosine distance should be called simply cosine. Cosine similarity distance should be called cosine distance.
  • seralouk
    seralouk over 4 years
    I agree but this is how it is defined in the engineering/math community.
  • user1700890
    user1700890 over 4 years
    Yeah, does not make sense to change it now.
  • Dan
    Dan over 4 years
    @user1700890 see the first bullet point here, for something to be a distance it must satisfy "d(x,y) = 0 if and only if x = y. i.e.is zero precisely from a point to itself". The cosine distance satisfies this, cosine similarity does not. Hence the terminology.
  • user1700890
    user1700890 over 4 years
    @Dan Thank you Dan. Your explanation makes sense. Interesting how cosine_similarity is under sklearn.metrics while not being a metric
  • Dan
    Dan over 4 years
    Take a look at the second sentence in this article, while not strictly a mathematical metric, in stats similarities are colloquially referred to as metrics as they fill similar roles. sklearn's metrics are more like measurements (colloquially).