Multivariate Student t-distribution with Python


Solution 1

I coded the density myself:

import numpy as np
from math import gamma, pi

def multivariate_t_distribution(x, mu, Sigma, df, d):
    '''
    Multivariate Student's t density:
    output:
        the density at the given point
    input:
        x = point at which to evaluate the density (d dimensional numpy array or scalar)
        mu = mean (d dimensional numpy array or scalar)
        Sigma = scale matrix (dxd numpy array)
        df = degrees of freedom
        d = dimension
    '''
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = np.dot(np.dot(diff, np.linalg.inv(Sigma)), diff)
    Num = gamma(1. * (d + df) / 2)
    Denom = (gamma(1. * df / 2) * pow(df * pi, 1. * d / 2)
             * pow(np.linalg.det(Sigma), 1. / 2)
             * pow(1 + (1. / df) * quad, 1. * (d + df) / 2))
    # Return the ratio directly so the dimension d is not overwritten
    return 1. * Num / Denom
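
math.gamma overflows once its argument passes roughly 171, so the direct formula breaks down when df + d is large. A minimal sketch of the same density evaluated in log space follows. The name log_mvt_density is hypothetical, Sigma is assumed to be symmetric positive definite, and scalar inputs are promoted to arrays so the one-dimensional case also works.

```python
import numpy as np
from math import lgamma, log, pi

def log_mvt_density(x, mu, Sigma, df):
    """Log density of a multivariate t at a single point (hypothetical helper)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    d = Sigma.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu) without forming the inverse
    quad = diff @ np.linalg.solve(Sigma, diff)
    # log|Sigma|; Sigma is assumed positive definite, so the sign is ignored
    _, logdet = np.linalg.slogdet(Sigma)
    return (lgamma((df + d) / 2.0) - lgamma(df / 2.0)
            - 0.5 * d * log(df * pi) - 0.5 * logdet
            - 0.5 * (df + d) * np.log1p(quad / df))

# Usage: 2-D example with an identity scale matrix, evaluated at the mode
print(np.exp(log_mvt_density([0.0, 0.0], [0.0, 0.0], np.eye(2), df=5)))
```

Exponentiating the result gives the density itself. Staying in log space is usually preferable when densities are multiplied together, for example in a likelihood.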

Solution 2

This evaluates the log pdf of the multivariate Student t distribution for each row of an n by d data matrix X:

import numpy as np
from scipy.special import gammaln
from numpy.linalg import slogdet

def multivariate_student_t(X, mu, Sigma, df):
    # log pdf of the multivariate Student t distribution, one value per row of X

    n, d = X.shape
    Xm = X - mu
    V = df * Sigma
    V_inv = np.linalg.inv(V)
    sign, logdet = slogdet(np.pi * V)

    # log normalizer: needs log-gamma (gammaln), not gamma, because every term is a log
    logz = -gammaln(df/2.0 + d/2.0) + gammaln(df/2.0) + 0.5*logdet
    logp = -0.5*(df+d)*np.log(1 + np.sum(np.dot(Xm, V_inv)*Xm, axis=1))

    logp = logp - logz

    return logp
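
One way to check a hand-written log pdf like this is to compare it with an independent implementation. The sketch below assumes SciPy 1.6 or later, which ships scipy.stats.multivariate_t; the values of mu, Sigma, df and X are made up for illustration.

```python
import numpy as np
from scipy.stats import multivariate_t  # assumes SciPy 1.6 or later

mu = np.array([0.0, 0.0])
Sigma = np.eye(2)            # scale matrix (not the covariance)
df = 5.0

# n by d data matrix: one query point per row (illustrative values)
X = np.array([[0.0, 0.0],
              [1.0, -1.0],
              [2.5, 0.3]])

# Reference log densities, one per row of X
ref_logp = multivariate_t(loc=mu, shape=Sigma, df=df).logpdf(X)
print(ref_logp)
```

Passing the same X, mu, Sigma and df to multivariate_student_t and comparing with np.allclose(logp, ref_logp) should return True; a mismatch that grows with df points to an error in the normalizing constant.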

Solution 3

I generalized @farhawa's code to accept multiple points in x, since I wanted to query several points at once.

import numpy as np
from math import gamma

def multivariate_t_distribution(x, mu, Sigma, df):
    '''
    Multivariate Student's t density. Returns the density
    at each of the points specified by x.

    input:
        x = points at which to evaluate the density (n x d numpy array; will be forced to 2d)
        mu = mean (d dimensional numpy array)
        Sigma = scale matrix (dxd numpy array)
        df = degrees of freedom

    Edited from: http://stackoverflow.com/a/29804411/3521179
    '''

    x = np.atleast_2d(x) # requires x as 2d
    nD = Sigma.shape[0] # dimensionality

    numerator = gamma(1.0 * (nD + df) / 2.0)

    denominator = (
            gamma(1.0 * df / 2.0) * 
            np.power(df * np.pi, 1.0 * nD / 2.0) *  
            np.power(np.linalg.det(Sigma), 1.0 / 2.0) * 
            np.power(
                1.0 + (1.0 / df) *
                np.diagonal(
                    np.dot( np.dot(x - mu, np.linalg.inv(Sigma)), (x - mu).T)
                ), 
                1.0 * (nD + df) / 2.0
                )
            )

    return 1.0 * numerator / denominator 
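
The np.dot(..., (x - mu).T) call builds a full n by n matrix only to keep its diagonal, so memory grows quadratically with the number of query points. A sketch of computing just the n row-wise quadratic forms with np.einsum follows. The helper name rowwise_quadratic_form and the sample values are hypothetical, and Sigma is assumed to be symmetric positive definite.

```python
import numpy as np

def rowwise_quadratic_form(x, mu, Sigma):
    """(x_i - mu)^T Sigma^{-1} (x_i - mu) for each row x_i (hypothetical helper)."""
    diff = np.atleast_2d(x) - mu                  # shape (n, d)
    solved = np.linalg.solve(Sigma, diff.T).T     # row i is Sigma^{-1} (x_i - mu)
    return np.einsum('ij,ij->i', diff, solved)    # row-wise dot products, shape (n,)

x = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, -1.0]])
mu = np.array([0.5, 0.5])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])

quad = rowwise_quadratic_form(x, mu, Sigma)
# Reference: diagonal of the full n by n product, as in the function above
full = np.diagonal((x - mu) @ np.linalg.inv(Sigma) @ (x - mu).T)
print(np.allclose(quad, full))
```

The result can replace the np.diagonal(...) term in the denominator without changing the returned densities.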
Author by

farhawa

Updated on June 09, 2022

Comments

  • farhawa
    farhawa almost 2 years

    To generate samples from a multivariate t-distribution I use this function:

    def multivariatet(mu,Sigma,N,M):
        '''
        Output:
        Produce M samples of d-dimensional multivariate t distribution
        Input:
        mu = mean (d dimensional numpy array or scalar)
        Sigma = scale matrix (dxd numpy array)
        N = degrees of freedom
        M = # of samples to produce
        '''
        d = len(Sigma)
        g = np.tile(np.random.gamma(N/2.,2./N,M),(d,1)).T
        Z = np.random.multivariate_normal(np.zeros(d),Sigma,M)
        return mu + Z/np.sqrt(g)
    

    But what I am looking for now is the multivariate Student t-distribution itself, so that I can calculate the density of elements where the dimension is greater than 1.

    That would be something like scipy's stats.t.pdf(x, df, loc, scale), but in multi-dimensional space.

  • koshy george
    koshy george over 8 years
    I notice that Sigma needs to be at least 2x2, else np.linalg will barf.
  • 317070
    317070 about 7 years
    There seems to be something off. I get log_pdf's which are off by a factor of about df.
  • develarist
    develarist over 3 years
    Why does the exact same code appear in the pycopula package? And what is the meaning of x? All it says is "parameter", which I don't think it is. github.com/blent-ai/pycopula/blob/master/pycopula/…
  • develarist
    develarist over 3 years
    What is the meaning of x? All it says is "parameter", which I don't think it is.
  • farhawa
    farhawa over 3 years
    I have no explanation for why the code appears elsewhere, but this answer was written back in 2015 and the GitHub code was pushed in 2018. That said, x here is the point at which you want to evaluate the t density.
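
The sampling recipe from the first comment (a zero-mean Gaussian draw divided by the square root of a chi-square draw scaled by its degrees of freedom) can be sketched with NumPy's Generator API. The function name sample_multivariate_t and the example parameters are hypothetical.

```python
import numpy as np

def sample_multivariate_t(mu, Sigma, df, size, rng=None):
    """Draw `size` samples from a multivariate t (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    d = mu.shape[0]
    # chi-square(df) / df, one independent draw per sample
    g = rng.chisquare(df, size=size) / df
    # Zero-mean Gaussian draws with covariance Sigma, shape (size, d)
    z = rng.multivariate_normal(np.zeros(d), Sigma, size=size)
    # Each Gaussian row is divided by the square root of its own scaled chi-square draw
    return mu + z / np.sqrt(g)[:, None]

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.4], [0.4, 2.0]])
samples = sample_multivariate_t([1.0, -1.0], Sigma, df=10, size=200_000, rng=rng)
# For df > 2 the covariance is df / (df - 2) * Sigma, here 1.25 * Sigma
print(np.cov(samples, rowvar=False))
```

Comparing the sample covariance with df / (df - 2) * Sigma is a quick sanity check that Sigma is being treated as a scale matrix rather than as the covariance.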