How to plot a PMF of a sample?

Solution 1

If ts is a pandas Series, you can obtain the PMF of the sample with:

>>> pmf = ts.value_counts().sort_index() / len(ts)

and plot it as a bar chart with:

>>> pmf.plot(kind='bar')
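
As a minimal, self-contained sketch of the same idea, the snippet below builds a small made-up Series (the values are purely illustrative, not from the question) and computes its PMF with value_counts(normalize=True), which should give the same result as dividing the counts by len(ts):

```python
import pandas as pd

# Hypothetical sample; any Series of discrete values would work here.
ts = pd.Series([1, 2, 2, 3, 3, 3])

# Dividing the counts by the sample size...
pmf_manual = ts.value_counts().sort_index() / len(ts)

# ...is expected to match asking pandas for relative frequencies directly.
pmf = ts.value_counts(normalize=True).sort_index()

print(pmf)
```

Sorting by index keeps the bars in value order rather than in order of frequency, which is usually what you want for a PMF plot.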

A NumPy-only solution is possible with np.unique:

>>> xs = np.random.randint(0, 10, 100)
>>> xs
array([5, 2, 2, 1, 2, 8, 6, 7, 5, 3, 2, 6, 4, 9, 7, 6, 4, 7, 6, 8, 7, 0, 6,
       2, 9, 8, 7, 7, 2, 6, 2, 8, 0, 2, 5, 1, 3, 6, 7, 7, 2, 2, 0, 3, 8, 7,
       4, 0, 5, 7, 5, 4, 4, 9, 5, 1, 6, 6, 0, 9, 4, 2, 0, 8, 7, 5, 1, 1, 2,
       8, 3, 8, 9, 0, 0, 6, 8, 7, 2, 6, 7, 9, 7, 8, 8, 3, 3, 7, 8, 2, 2, 4,
       4, 5, 3, 4, 1, 5, 5, 1])

>>> val, cnt = np.unique(xs, return_counts=True)
>>> pmf = cnt / len(xs)

>>> # values along with probability mass function
>>> np.column_stack((val, pmf))
array([[ 0.  ,  0.08],
       [ 1.  ,  0.07],
       [ 2.  ,  0.15],
       [ 3.  ,  0.07],
       [ 4.  ,  0.09],
       [ 5.  ,  0.1 ],
       [ 6.  ,  0.11],
       [ 7.  ,  0.15],
       [ 8.  ,  0.12],
       [ 9.  ,  0.06]])
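
Since the question also asks about plotting with matplotlib, here is one possible sketch that draws the NumPy result as a stem plot. The seed, the output filename pmf.png, and the axis labels are arbitrary choices for illustration; a bar chart via ax.bar(val, pmf) would work just as well.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)       # arbitrary seed, only for reproducibility
xs = rng.integers(0, 10, size=100)   # hypothetical sample of discrete values

# Distinct values and how often each occurs
val, cnt = np.unique(xs, return_counts=True)
pmf = cnt / len(xs)                  # relative frequencies, summing to 1

fig, ax = plt.subplots()
ax.stem(val, pmf)                    # one stem per distinct value
ax.set_xlabel("value")
ax.set_ylabel("probability")
ax.set_ylim(bottom=0)
fig.savefig("pmf.png")               # or plt.show() in an interactive session
```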

Solution 2

Given a pandas DataFrame df, you can use seaborn:

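import seaborn as sns

# Relative frequency of each distinct value (sums to 1)
probabilities = df['SomeColumn'].value_counts(normalize=True)
# Recent seaborn releases require x and y to be passed as keywords
sns.barplot(x=probabilities.index, y=probabilities.values)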

Author: Milena Araujo

Updated on June 13, 2022

Comments

  • Milena Araujo, almost 2 years

    Is there any function or library that would help me plot the probability mass function of a sample, the same way there is for plotting the probability density function of a sample?

    For instance, using pandas, plotting a PDF is as simple as calling:

    sample.plot(kind="density")
    

    If there is no easy way, how can I compute the PMF so that I can plot it using matplotlib?