Fit a gaussian function

84,933

Solution 1

Take a look at this answer for fitting arbitrary curves to data. Basically you can use scipy.optimize.curve_fit to fit any function you want to your data. The code below shows how you can fit a Gaussian to some random data (credit to this SciPy-User mailing list post).

import numpy
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Define some test data which is close to Gaussian
data = numpy.random.normal(size=10000)

hist, bin_edges = numpy.histogram(data, density=True)
bin_centres = (bin_edges[:-1] + bin_edges[1:])/2

# Define model function to be used to fit to the data above:
def gauss(x, *p):
    A, mu, sigma = p
    return A*numpy.exp(-(x-mu)**2/(2.*sigma**2))

# p0 is the initial guess for the fitting coefficients (A, mu and sigma above)
p0 = [1., 0., 1.]

coeff, var_matrix = curve_fit(gauss, bin_centres, hist, p0=p0)

# Get the fitted curve
hist_fit = gauss(bin_centres, *coeff)

plt.plot(bin_centres, hist, label='Test data')
plt.plot(bin_centres, hist_fit, label='Fitted data')

# Finally, lets get the fitting parameters, i.e. the mean and standard deviation:
print 'Fitted mean = ', coeff[1]
print 'Fitted standard deviation = ', coeff[2]

plt.show()

Solution 2

You can try sklearn gaussian mixture model estimation as below :

import numpy as np
import sklearn.mixture

gmm = sklearn.mixture.GMM()

# sample data
a = np.random.randn(1000)

# result
r = gmm.fit(a[:, np.newaxis]) # GMM requires 2D data as of sklearn version 0.16
print("mean : %f, var : %f" % (r.means_[0, 0], r.covars_[0, 0]))

Reference : http://scikit-learn.org/stable/modules/mixture.html#mixture

Note that in this way, you don't need to estimate your sample distribution with an histogram.

Solution 3

Kind of an old question, but for anybody looking just to plot a density fit for a series, you could try matplotlib's .plot(kind='kde'). Docs here.

Example with pandas:

mydf.x.plot(kind='kde')

Solution 4

I am not sure what your input is, but possibly your y-axis scale is too large (20000), try reducing this number. The following code works for me:

import matplotlib.pyplot as plt
import numpy as np

#created my variable
v = np.random.normal(0,1,1000)


fig, ax = plt.subplots()


plt.hist(v, bins=500, normed=1, color='#7F38EC', histtype='step')

#plot
plt.title("Gaussian")
plt.axis([-1, 2, 0, 1]) #changed 20000 to 1

plt.show()

Edit:

If you want the actual count of values on y-axis, you can set normed=0. And would just get rid of the plt.axis([-1, 2, 0, 1]).

import matplotlib.pyplot as plt
import numpy as np

#function
v = np.random.normal(0,1,500000)


fig, ax = plt.subplots()

# changed normed=1 to normed=0
plt.hist(v, bins=500, normed=0, color='#7F38EC', histtype='step')

#plot
plt.title("Gaussian")
#plt.axis([-1, 2, 0, 20000]) 

plt.show()
Share:
84,933

Related videos on Youtube

Chris
Author by

Chris

A full-stack web application developer, working at Enable.

Updated on February 05, 2020

Comments

  • Chris
    Chris over 4 years

    I have a histogram (see below) and I am trying to find the mean and standard deviation along with code which fits a curve to my histogram. I think there is something in SciPy or matplotlib that can help, but every example I've tried doesn't work.

    import matplotlib.pyplot as plt
    import numpy as np
    
    with open('gau_b_g_s.csv') as f:
        v = np.loadtxt(f, delimiter= ',', dtype="float", skiprows=1, usecols=None)
    
    fig, ax = plt.subplots()
    
    plt.hist(v, bins=500, color='#7F38EC', histtype='step')
    
    plt.title("Gaussian")
    plt.axis([-1, 2, 0, 20000])
    
    plt.show()
    
    • Jodaka
      Jodaka almost 12 years
      What do you mean by doesn't work? It doesn't run, or the output isn't correct?
    • Admin
      Admin almost 12 years
      I can't get the codes from the internet to run, to actually make a curve like they are supposed to
    • Admin
      Admin almost 12 years
      which is most likely happening because I just started programming and I generally have no idea what i'm doing
    • Jodaka
      Jodaka almost 12 years
      So are you getting an error message when you try to run it? Or does the program complete without producing anything?
    • Admin
      Admin almost 12 years
      I just don't know how to properly make it work with my data
  • Admin
    Admin almost 12 years
    no im working with over half a million points so I want the scale to be that big because I don't want like 50,000 bins
  • Akavall
    Akavall almost 12 years
    @I believe the value on y-axis does not tell you the number of observations in each bin, it tells you percentage in each bin. Just comment out the whole plt.axis([-1, 2, 0, 1]) line and run it, you should get a distribution plot.
  • Admin
    Admin almost 12 years
    thanks, this got the mean and sd well, but the curve fit doesn't actually produce a curve, it produces lines
  • Chris
    Chris almost 12 years
    Do you mean my example just produces lines? Or when you apply the above code to your data you get lines? Also, what is the difference between a line and a curve?
  • Admin
    Admin almost 12 years
    as opposed to bell curve type shape, it just looks like a carrot ^
  • Admin
    Admin almost 12 years
    its definitely telling me the number in each bin because i can see the histogram itself with the y axis at 20,000
  • Chris
    Chris almost 12 years
    Without more information I can't really help you. Do you mean it looks like a carrot with your data? If so, then presumably it is because that is what your data looks like. When asking questions it is best to include a short, self contained example.
  • SuperElectric
    SuperElectric about 11 years
    I suspect @user1496646 means that, in his case, there aren't that many <bin_centres>, so when you plot(bin_centres, hist_fit), it comes out poorly sampled Gaussian ("carrot"). He should just subsample the bin_centers, using new_bin_centers = numpy.linspace(bin_centres[0], bin_centres[-1], 200), new_hist_fit = gauss(new_bin_centres, *coeff), and plot(new_bin_centres, new_hist_fit)
  • Akavall
    Akavall almost 11 years
    Downvoter, can you please explain the reason for the downvote?
  • Joseph Farah
    Joseph Farah almost 6 years
    wow, TIL matplotlib has kernel density estimation built in. +1
  • jjc385
    jjc385 over 3 years
    sklearn.mixture.GMM seems to have been replaced with sklearn.mixture.GaussianMixture at some point.