How to fit a double Gaussian distribution in Python?

13,779

You can't use scikit-learn for this, because the you are not dealing with a set of samples whose distribution you want to estimate. You could of course transform your curve to a PDF, sample it and then try to fit it using a Gaussian mixture model, but that seems to be a bit of an overkill to me.

Here's a solution using simple least square curve fitting. To get it to work I had to remove the background, i.e. ignore all data points with y < 5, and also provide a good starting vector for leastsq, which can be estimated form a plot of the data.

Finding the Starting Vector

The parameter vector that that is found by the least squares method is the vector

params = [c1, mu1, sigma1, c2, mu2, sigma2]

Here, c1 and c2 are scaling factors for the two Gaussians, i.e. their height, mu1and mu2 are the means, i.e. the horizontal positions of the peaks and sigma1 and sigma2 the standard deviations that determine the width of the Gaussians. To find a starting vector I just looked at a plot of the data and estimated the height of the two peaks ( = c1, c2, respectively) and their horizontal position (= mu1, mu1, respectively). sigma1 and sigma2 were simply set to 1.0.

Code

from sklearn import mixture
import matplotlib.pyplot
import matplotlib.mlab
import numpy as np
from pylab import *
from scipy.optimize import leastsq

data = np.genfromtxt('gaussian_fit.dat', skiprows = 1)
x = data[:, 0]
y = data[:, 1]

def double_gaussian( x, params ):
    (c1, mu1, sigma1, c2, mu2, sigma2) = params
    res =   c1 * np.exp( - (x - mu1)**2.0 / (2.0 * sigma1**2.0) ) \
          + c2 * np.exp( - (x - mu2)**2.0 / (2.0 * sigma2**2.0) )
    return res

def double_gaussian_fit( params ):
    fit = double_gaussian( x, params )
    return (fit - y_proc)

# Remove background.
y_proc = np.copy(y)
y_proc[y_proc < 5] = 0.0

# Least squares fit. Starting values found by inspection.
fit = leastsq( double_gaussian_fit, [13.0,-13.0,1.0,60.0,3.0,1.0] )
plot( x, y, c='b' )
plot( x, double_gaussian( x, fit[0] ), c='r' )
Share:
13,779
Tom Kurushingal
Author by

Tom Kurushingal

Updated on June 15, 2022

Comments

  • Tom Kurushingal
    Tom Kurushingal almost 2 years

    I am trying to obtain a double Gaussian distribution for data (link) using Python. The raw data is of the form:

    enter image description here

    For the given data, I would like to obtain two Gaussian profiles for the peaks seen in figure. I tried it with the following code (source):

    from sklearn import mixture
    import matplotlib.pyplot
    import matplotlib.mlab
    import numpy as np
    from pylab import *
    data = np.genfromtxt('gaussian_fit.dat', skiprows = 1)
    x = data[:, 0]
    y = data[:, 1]
    clf = mixture.GMM(n_components=2, covariance_type='full')
    clf.fit((y, x))
    m1, m2 = clf.means_
    w1, w2 = clf.weights_
    c1, c2 = clf.covars_
    fig = plt.figure(figsize = (5, 5))
    plt.subplot(111)
    plotgauss1 = lambda x: plot(x,w1*matplotlib.mlab.normpdf(x,m1,np.sqrt(c1))[0], linewidth=3)
    plotgauss2 = lambda x: plot(x,w2*matplotlib.mlab.normpdf(x,m2,np.sqrt(c2))[0], linewidth=3)
    fig.savefig('gaussian_fit.pdf')
    

    But I am not able to get the desired output. So, how can a double Gaussian distribution be obtained in Python?

    Update

    I was able to fit a single Gaussian distribution with the following code:

    import pylab as plb
    import matplotlib.pyplot as plt
    from scipy.optimize import curve_fit
    from scipy import asarray as ar,exp
    import numpy as np
    
    data = np.genfromtxt('gaussian_fit.dat', skiprows = 1)
    x = data[:, 0]
    y = data[:, 1]
    n = len(x)
    mean = sum(x*y)/n
    sigma = sum(y*(x-mean)**2)/n
    
    
    def gaus(x,a,x0,sigma):
        return a*exp(-(x-x0)**2/(2*sigma**2))
    
    
    popt,pcov = curve_fit(gaus, x, y ,p0 = [1, mean, sigma])
    
    
    fig = plt.figure(figsize = (5, 5))
    plt.subplot(111)
    plt.plot(x, y, label='Raw')
    plt.plot(x, gaus(x, *popt), 'o', markersize = 4, label='Gaussian fit')
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.legend()
    fig.savefig('gaussian_fit.pdf')
    

    enter image description here

  • Tom Kurushingal
    Tom Kurushingal over 8 years
    Could you please explain on how did you arrive at the starting values.
  • Cynthia GS
    Cynthia GS almost 8 years
    Works beautifully! from sklearn import mixture isn't actually necessary for the code to run.