Jensen-Shannon Divergence


Solution 1

Note that when called with two arguments, as below, scipy's entropy computes the Kullback-Leibler divergence rather than the Shannon entropy.

See: http://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence

#!/usr/bin/env python
from scipy.stats import entropy
from numpy.linalg import norm
import numpy as np

def JSD(P, Q):
    _P = P / norm(P, ord=1)   # normalize to a probability distribution
    _Q = Q / norm(Q, ord=1)
    _M = 0.5 * (_P + _Q)      # mixture (pointwise average) of the two distributions
    return 0.5 * (entropy(_P, _M) + entropy(_Q, _M))  # mean KL divergence to the mixture
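
For illustration (not part of the original answer), a quick call on the same vectors used in Solution 2 below; entropy uses the natural logarithm by default, so the result should match the squared Jensen-Shannon distance computed there:

p = np.array([1.0/10, 9.0/10, 0])
q = np.array([0, 1.0/10, 9.0/10])
print(JSD(p, q))  # ~0.5306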

Also note that the test case in the question looks wrong: the p distribution does not sum to 1.0.

See: http://www.itl.nist.gov/div898/handbook/eda/section3/eda361.htm

Solution 2

The Jensen-Shannon distance (distance.jensenshannon) has been included in SciPy since version 1.2, so the Jensen-Shannon divergence can be obtained as the square of the Jensen-Shannon distance:

from scipy.spatial import distance

distance.jensenshannon([1.0/10, 9.0/10, 0], [0, 1.0/10, 9.0/10]) ** 2
# 0.5306056938642212
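
If you want the result in bits rather than nats, jensenshannon also accepts a base argument; squaring still gives the divergence (the value is just the natural-log result divided by ln 2):

distance.jensenshannon([1.0/10, 9.0/10, 0], [0, 1.0/10, 9.0/10], base=2) ** 2
# ~0.7655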

Solution 3

Get some data for distributions with known divergence and compare your results against those known values.
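
For instance (a minimal sketch, not from the original answer, reusing the scipy helper shown in Solution 2): identical distributions should give a divergence of 0, and completely disjoint distributions should give the maximum, log 2, which is exactly 1 when working in base 2.

import numpy as np
from scipy.spatial import distance

# Identical distributions: the divergence should be 0.
assert np.isclose(distance.jensenshannon([0.5, 0.5], [0.5, 0.5], base=2) ** 2, 0.0)

# Completely disjoint distributions: the divergence should hit its maximum, exactly 1 bit.
assert np.isclose(distance.jensenshannon([1.0, 0.0], [0.0, 1.0], base=2) ** 2, 1.0)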

BTW: the sum in KL_divergence may be rewritten using the zip built-in function like this:

sum(_p * log(_p / _q) for _p, _q in zip(p, q) if _p != 0)

This does away with lots of "noise" and is also much more "pythonic". The double comparison with 0.0 and 0 is not necessary.
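
Applied to the asker's class, the method would then read roughly as follows (a sketch of that rewrite, relying on the log already imported from math there):

def KL_divergence(self, p, q):
    """ Compute KL divergence of two vectors, K(p || q), skipping terms where p is zero."""
    return sum(_p * log(_p / _q) for _p, _q in zip(p, q) if _p != 0)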

Solution 4

A general version, for n probability distributions, in Python:

import numpy as np
from scipy.stats import entropy as H


def JSD(prob_distributions, weights, logbase=2):
    # left term: entropy of the mixture
    # (reshape weights so each distribution, i.e. each row, is scaled by its own weight)
    wprobs = weights[:, np.newaxis] * prob_distributions
    mixture = wprobs.sum(axis=0)
    entropy_of_mixture = H(mixture, base=logbase)

    # right term: weighted sum of the individual entropies
    entropies = np.array([H(P_i, base=logbase) for P_i in prob_distributions])
    wentropies = weights * entropies
    sum_of_entropies = wentropies.sum()

    divergence = entropy_of_mixture - sum_of_entropies
    return divergence

# From the original example with three distributions:
P_1 = np.array([1/2, 1/2, 0])
P_2 = np.array([0, 1/10, 9/10])
P_3 = np.array([1/3, 1/3, 1/3])

prob_distributions = np.array([P_1, P_2, P_3])
n = len(prob_distributions)
weights = np.empty(n)
weights.fill(1/n)

print(JSD(prob_distributions, weights))
#0.546621319446
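
As a sanity check (not part of the original answer), with two distributions and equal weights the general form reduces to the ordinary two-distribution Jensen-Shannon divergence; in base 2, the vectors from Solution 2 should give roughly 0.7655:

p = np.array([1.0/10, 9.0/10, 0])
q = np.array([0, 1.0/10, 9.0/10])
print(JSD(np.array([p, q]), np.array([0.5, 0.5])))  # ~0.7655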

Solution 5

Explicitly following the math in the Wikipedia article:
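
That is, with M the pointwise average of P and Q and D the Kullback-Leibler divergence (base 2 here, to match the code below):

JSD(P || Q) = 0.5 * D(P || M) + 0.5 * D(Q || M),  where M = 0.5 * (P + Q)
D(P || M)   = sum_i P_i * log2(P_i / M_i)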

import numpy as np

def jsdiv(P, Q):
    """Compute the Jensen-Shannon divergence between two probability distributions.

    Input
    -----
    P, Q : array-like
        Probability distributions of equal length that sum to 1
    """

    def _kldiv(A, B):
        # Zero entries in A produce 0 * log2(0) = nan; dropping those terms
        # implements the convention 0 * log(0) = 0.
        return np.sum([v for v in A * np.log2(A/B) if not np.isnan(v)])

    P = np.array(P)
    Q = np.array(Q)

    M = 0.5 * (P + Q)

    return 0.5 * (_kldiv(P, M) + _kldiv(Q, M))
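
A quick check (not in the original answer): on the vectors from Solution 2 this yields the base-2 divergence, i.e. the natural-log value divided by ln 2. Note that numpy emits divide-by-zero warnings for the zero entries, which the isnan filter then discards.

print(jsdiv([1.0/10, 9.0/10, 0], [0, 1.0/10, 9.0/10]))  # ~0.7655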

Comments

  • Martyn (almost 2 years ago)

    I have another question that I was hoping someone could help me with.

    I'm using the Jensen-Shannon divergence to measure the similarity between two probability distributions. The similarity scores appear to be correct in the sense that they fall between 0 and 1 given that one uses the base 2 logarithm, with 0 meaning that the distributions are equal.

    However, I'm not sure whether there is in fact an error somewhere and was wondering whether someone might be able to say 'yes it's correct' or 'no, you did something wrong'.

    Here is the code:

    from numpy import zeros, array
    from math import sqrt, log
    
    
    class JSD(object):
        def __init__(self):
            self.log2 = log(2)
    
    
        def KL_divergence(self, p, q):
            """ Compute KL divergence of two vectors, K(p || q)."""
            return sum(p[x] * log((p[x]) / (q[x])) for x in range(len(p)) if p[x] != 0.0 or p[x] != 0)
    
        def Jensen_Shannon_divergence(self, p, q):
            """ Returns the Jensen-Shannon divergence. """
            self.JSD = 0.0
            weight = 0.5
            average = zeros(len(p)) #Average
            for x in range(len(p)):
                average[x] = weight * p[x] + (1 - weight) * q[x]
                self.JSD = (weight * self.KL_divergence(array(p), average)) + ((1 - weight) * self.KL_divergence(array(q), average))
            return 1-(self.JSD/sqrt(2 * self.log2))
    
    if __name__ == '__main__':
        J = JSD()
        p = [1.0/10, 9.0/10, 0]
        q = [0, 1.0/10, 9.0/10]
        print J.Jensen_Shannon_divergence(p, q)
    

    The problem is that I feel that the scores are not high enough when comparing two text documents, for instance. However, this is purely a subjective feeling.

    Any help is, as always, appreciated.

  • Tur1ng (almost 9 years ago)
    Importing and using norm is not needed, as entropy will normalize the distributions if they do not add up to 1 (see docs.scipy.org/doc/scipy-dev/reference/generated/…). However, to compute _M like that, _P and _Q need to be numpy.ndarray objects.
  • Doug Shore (almost 9 years ago)
    @Tur1ng note that norm is needed because the calculation of _M requires that _P and _Q are probability distributions (already normalized). Also note that lists are coerced to numpy arrays, so this is fine: [2, 4] / np.array([1, 2])
  • emem (over 6 years ago)
    @DougShore actually, since scipy.stats.entropy normalizes the distributions, you don't need to normalize _P and _Q to compute _M; you only need them to sum to the same value, and you could probably save a few computations. However, it is much more readable like this. On the other hand, I would prefer functions that don't do unnecessary computations and assume the inputs are normalized probabilities.
  • just_learning (about 2 years ago)
    So, in @Doug Shore's code, can I pass my frequency lists, e.g. list_a = [1, 100, 40, 1200, 0, 4] and list_b = [23, 5600, 11, 0, 40, 340], un-normalized as shown above? Or should I normalize them before feeding them into the JSD(P, Q) function?
  • Doug Shore (about 2 years ago)
    @just_learning the JSD function normalizes the inputs (as probability distributions), so yes, JSD(list_a, list_b) will work