How to properly plot a normalized histogram with a pdf using matplotlib?


Solution 1

What makes you think it is not normalised? At a guess, it's because some of the columns are taller than 1. However, this thinking is flawed: in a normalised histogram/pdf it is the total area under it that equals one, not the heights. Each bar's area is its height times its bin width, so when the bins are much narrower than one (as yours are, with a sigma of 0.1), the bars have to be taller than one for their areas to add up to one!
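A quick numerical check of this, as a minimal sketch: it draws a sample resembling the one in the question (using NumPy's `default_rng` with an arbitrary seed, which is an assumption and not part of the original code) and computes the bars with `np.histogram(..., density=True)`, which applies the same kind of density normalisation as `plt.hist(..., density=True)`.

```python
import numpy as np

# Sample resembling the question's data: mean 0, sigma 0.1, 2000 points.
# The seed is arbitrary and only makes the numbers reproducible.
rng = np.random.default_rng(0)
s = rng.normal(loc=0.0, scale=0.1, size=2000)

# density=True divides each count by (total count * bin width).
heights, edges = np.histogram(s, bins=30, density=True)
widths = np.diff(edges)

# The quantity that is normalised is the total area, not the heights.
area = np.sum(heights * widths)

print(f"total area of the bars: {area:.6f}")  # 1 up to floating-point rounding
print(f"tallest bar:            {heights.max():.3f}")
print(f"bin width:              {widths[0]:.4f}")
```

Because each bin here is only a small fraction of one unit wide, the tallest bar comes out well above 1 even though the bar areas add up to 1.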

You can see this clearly in the scipy example you link: there the x-values span a range roughly an order of magnitude wider, so it follows that the y-values are correspondingly smaller. You will see the same effect if you change your distribution to cover a wider range of values. Try a sigma of 10 instead of 0.1 and see what happens!
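The suggested experiment can be sketched as follows, assuming the same sample size and bin count as the question (the seed is arbitrary). For a normal distribution the pdf peaks at 1 / (sigma * sqrt(2 * pi)), so multiplying sigma by 100 divides the peak height by 100, while the total area stays at 1.

```python
import math
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
results = {}

for sigma in (0.1, 10.0):
    s = rng.normal(loc=0.0, scale=sigma, size=2000)
    heights, edges = np.histogram(s, bins=30, density=True)
    area = np.sum(heights * np.diff(edges))
    # Height of the normal pdf at its mean: 1 / (sigma * sqrt(2 * pi)).
    peak_pdf = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    results[sigma] = (heights.max(), peak_pdf, area)
    print(f"sigma = {sigma:>4}: tallest bar = {heights.max():.4f}, "
          f"pdf peak = {peak_pdf:.4f}, total area = {area:.4f}")
```

The pdf peak is about 3.99 for sigma = 0.1 and about 0.040 for sigma = 10; in both cases the histogram area is 1.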

Solution 2

import numpy as np
from numpy.random import seed, randn
from scipy.stats import norm
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()  # seaborn is only used here for the plot styling

# Try this, for mu = 0
seed(0)
points = np.linspace(-5, 5, 100)
pdf = norm.pdf(points, 0, 1)
plt.plot(points, pdf, color='r')
plt.hist(randn(50), density=True)
plt.show()

# or this, for mu = 10
seed(0)
points = np.linspace(5, 15, 100)
pdf = norm.pdf(points, 10, 1)
plt.plot(points, pdf, color='r')
plt.hist(10 + randn(50), density=True)
plt.show()
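The same agreement between bars and curve can be checked without plotting. This is a sketch under stated assumptions, not the snippet above: it draws 100,000 points instead of 50 (so the bars settle close to the curve), uses `np.histogram` instead of `plt.hist`, and writes the normal pdf out by hand rather than calling `norm.pdf`; the helper name `normal_pdf`, the seed and the bin count are arbitrary choices.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Hypothetical helper: normal density, same formula as scipy.stats.norm.pdf."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)  # arbitrary seed
samples = 10.0 + rng.standard_normal(100_000)  # mu = 10, sigma = 1, as in the second case

heights, edges = np.histogram(samples, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
area = np.sum(heights * np.diff(edges))

# Compare each bar with the pdf evaluated at the bar's centre.
max_gap = np.max(np.abs(heights - normal_pdf(centers, 10.0, 1.0)))
print(f"total area of the bars: {area:.6f}")
print(f"largest gap between bars and pdf: {max_gap:.4f}")
```

With this many points the bars track the red curve closely, which is what "normalised" means here: both describe a density with unit area.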
Author: Einar A

Updated on June 12, 2022

Comments

  • Einar A

    I am trying to plot a normalized histogram using the example from the numpy.random.normal documentation. For this purpose I generate a normally distributed random sample.

    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    mu_true = 0
    sigma_true = 0.1
    s = np.random.normal(mu_true, sigma_true, 2000)
    

    Then I fit a normal distribution to the data and calculate its pdf.

    mu, sigma = stats.norm.fit(s)
    points = np.linspace(stats.norm.ppf(0.01, loc=mu, scale=sigma),
                         stats.norm.ppf(0.9999, loc=mu, scale=sigma), 100)
    pdf = stats.norm.pdf(points, loc=mu, scale=sigma)
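To see that the fitted curve is itself a proper density despite peaking near 4, one can integrate it numerically. The sketch below is a stand-in for the question's code, not the original: it swaps `stats.norm.fit`, `ppf` and `pdf` for Python's standard-library `statistics.NormalDist` (`inv_cdf`, `pdf`), assumes the fitted parameters are the sample mean and the ddof=0 standard deviation (the maximum-likelihood estimates for a normal distribution), uses an arbitrary seed, and applies a hand-written trapezoid rule.

```python
from statistics import NormalDist
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
s = rng.normal(0.0, 0.1, 2000)

# Assumed equivalent of stats.norm.fit(s): mean and ddof=0 standard deviation.
mu, sigma = float(s.mean()), float(s.std())
dist = NormalDist(mu, sigma)

# Same grid as the question: from the 1% to the 99.99% quantile, 100 points.
points = np.linspace(dist.inv_cdf(0.01), dist.inv_cdf(0.9999), 100)
pdf = np.array([dist.pdf(x) for x in points])

# Trapezoid rule: average height of each segment times its width.
area = float(np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(points)))

print(f"peak of the fitted pdf: {pdf.max():.3f}")
print(f"area under the plotted curve: {area:.4f}")  # close to 0.9999 - 0.01 = 0.9899
```

The plotted curve encloses about 99% of the probability (the rest lies outside the plotted x-range), even though its peak is well above 1.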
    

    Then I display the fitted pdf together with the histogram of the data.

    plt.hist(s, 30, density=True);
    plt.plot(points, pdf, color='r')
    plt.show() 
    

    I use density=True, but it seems obvious that the pdf and the histogram are not normalized.


    How can I plot a truly normalized histogram and pdf?

    Seaborn's distplot doesn't solve the problem either.

    import seaborn as sns
    ax = sns.distplot(s)
    
