Why does scipy.stats.norm.pdf sometimes give PDF > 1? How to correct it?
It's not a bug, and it's not an incorrect result either. The value of a probability density function at a specific point is not a probability; it measures how dense the distribution is around that point. For a continuous random variable, the probability of any single point is zero. Instead of p(X = x), we calculate the probability of falling between two points, p(x1 < X < x2), which equals the area under the probability density function between x1 and x2. The density itself can very well be above 1; it can even approach infinity.
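As a rough check of the point above, the following sketch compares the density at the mean with an actual probability over an interval, using the mean 1.075 and standard deviation 0.2 from the question. The interval endpoints x1 and x2 are arbitrary values chosen only for illustration.

```python
import math
from scipy import stats

# Parameters taken from the question: mean 1.075, standard deviation 0.2.
mu, sigma = 1.075, 0.2
dist = stats.norm(loc=mu, scale=sigma)

# Density at the mean. For a normal distribution this is 1 / (sigma * sqrt(2*pi)),
# which is about 1.99 here. It is a density, not a probability.
peak_density = dist.pdf(mu)
closed_form = 1.0 / (sigma * math.sqrt(2.0 * math.pi))

# A probability is an area under the density: P(x1 < X < x2) = CDF(x2) - CDF(x1).
# x1 and x2 are illustrative choices, not values from the original post.
x1, x2 = 1.0, 1.15
prob_interval = dist.cdf(x2) - dist.cdf(x1)

print(f"density at the mean:      {peak_density:.4f}")
print(f"1 / (sigma * sqrt(2 pi)): {closed_form:.4f}")
print(f"P({x1} < X < {x2}):       {prob_interval:.4f}")
```

The density exceeds 1 simply because sigma is small: the curve has to be tall where it is narrow so that the total area stays at 1. The interval probability, by contrast, always lies between 0 and 1.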
Ébe Isaac
PhD in Machine Learning. My research interests are anomaly detection, biometrics, gait analysis, and data science.
Updated on April 26, 2020
Comments
- Ébe Isaac (about 4 years):
Given mean and variance of a Gaussian (normal) random variable, I would like to compute its probability density function (PDF).
I referred to this post: Calculate probability in normal distribution given mean, std in Python,
and also to the scipy docs: scipy.stats.norm.
But when I plot the PDF, the probability exceeds 1! Refer to this minimal working example:
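import numpy as np
import matplotlib.pyplot as plt  # needed for plt.plot / plt.show
import scipy.stats as stats

# Evaluate the normal PDF (mean 1.075, standard deviation 0.2) on a grid
x = np.linspace(0.3, 1.75, 1000)
plt.plot(x, stats.norm.pdf(x, 1.075, 0.2))
plt.show()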
This is what I get:
How is it even possible to have a 200% probability of getting the mean, 1.075? Am I misinterpreting anything here? Is there any way to correct this?
- Severin Pappadeux (almost 8 years): @ÉbeIsaac To add a point to the answer: the integral of the PDF over the whole interval is equal to 1. The PDF itself may be above 1, below 1, or 0 at a given point. It cannot be negative, of course.
- AruniRC (almost 5 years): As a general point, I think most introductory (college-level) probability and statistics textbooks do not discuss these issues, and without some exposure to real analysis, measure theory, or Riemann sums it is not easy to develop an intuition. I found this to be a painless intro: statsathome.com/2017/06/26/…
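The two comments above can be checked numerically with a simple Riemann sum. This is only a sketch: the helper name riemann_area, the grid size, and the extra sigma values 0.02 and 0.002 are illustrative assumptions, not part of the original thread. As sigma shrinks, the peak of the density grows without bound while the area under the curve stays at about 1.

```python
import numpy as np
from scipy import stats


def riemann_area(mu, sigma, n=200_001):
    """Return (peak density, Riemann-sum area) for a normal PDF.

    The grid spans mu +/- 8 sigma, outside of which the density is negligible.
    n is odd so that the grid contains the mean itself.
    """
    x = np.linspace(mu - 8.0 * sigma, mu + 8.0 * sigma, n)
    dx = x[1] - x[0]
    density = stats.norm.pdf(x, loc=mu, scale=sigma)
    # Sum of (height * width) over thin rectangles approximates the integral.
    area = float(np.sum(density) * dx)
    return float(density.max()), area


mu = 1.075  # mean from the question
# 0.2 is the standard deviation from the question; the smaller ones are illustrative.
results = {sigma: riemann_area(mu, sigma) for sigma in (0.2, 0.02, 0.002)}

for sigma, (peak, area) in results.items():
    print(f"sigma={sigma}: peak density = {peak:.2f}, area = {area:.6f}")
```

Each rectangle contributes density times width, so a density of roughly 2 over a width of 0.001 adds only about 0.002 of probability. That is why a tall curve is compatible with a total probability of 1.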