matplotlib normed histograms

Solution 1

To plot a subset of the histogram, I don't think you can avoid calculating the whole histogram.

Have you tried computing the histogram with numpy.histogram and then plotting only the region you need, for instance as a bar plot? For example:

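import numpy as np
import matplotlib.pyplot as plt

data = np.random.normal(size=10000) * 10000

# Figure 0: let matplotlib compute and draw the histogram
plt.figure(0)
plt.hist(data, bins=np.arange(data.min(), data.max(), 1000))

# Figure 1: compute the histogram with numpy, then draw the bars manually
plt.figure(1)
counts1, edges1 = np.histogram(data, bins=np.arange(data.min(), data.max(), 1000))
plt.bar(edges1[:-1], counts1, width=1000, align='edge')

# Figure 2: narrower bins, drawing only the bins whose left edge lies in (0, 20000)
plt.figure(2)
counts2, edges2 = np.histogram(data, bins=np.arange(data.min(), data.max(), 200))
left2 = edges2[:-1]  # left edge of each bin (there is one more edge than bins)
mask = (left2 > 0) & (left2 < 20000)
plt.bar(left2[mask], counts2[mask], width=200, align='edge')

plt.show()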

Figure 0: original histogram.

Figure 1: histogram calculated manually.

Figure 2: histogram calculated manually, cropped (N.B.: the counts are smaller because the bins are narrower).
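The snippet above plots raw counts. If a PDF-like Y scale is wanted, one option is to pass density=True to numpy.histogram over the full range and crop afterwards. The following is only a sketch under assumptions: the seeded generator, the bin width of 200 and the cropped_* names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, for illustration only
data = rng.normal(size=10_000) * 10_000   # same kind of sample data as above

width = 200
edges = np.arange(data.min(), data.max() + width, width)  # edges cover every sample

# density=True normalises over the whole range, so all bar areas together sum to 1
density, edges = np.histogram(data, bins=edges, density=True)

left = edges[:-1]                         # left edge of each bin
mask = (left > 0) & (left < 20_000)       # bins to keep in the cropped plot

cropped_left = left[mask]
cropped_density = density[mask]

# area under the cropped bars = share of all samples that fall into those bins
cropped_area = float((cropped_density * width).sum())
```

The cropped bars could then be drawn with plt.bar(cropped_left, cropped_density, width=width, align='edge'). Their heights stay on the scale of the full distribution, so the cropped area is less than 1.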

Solution 2

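I think you can normalize the histogram yourself by giving every sample a weight of 1/len(data), so that each bar shows the fraction of all samples (including those outside the plotted range) that falls into its bin. (repeat is a numpy function.)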

hist(data, bins=arange(0, 121, 1), weights=repeat(1.0/len(data), len(data)))
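To check what the weights do without drawing anything, the same computation can be run through numpy.histogram, which accepts the same bins and weights arguments. This is a hedged sketch: the sample data is hypothetical, and np.full is used here in place of repeat to build the weight array.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, for illustration only
data = rng.normal(size=10_000) * 10_000   # hypothetical sample data

bins = np.arange(0, 121, 1)               # only the focused region

# every sample contributes 1/N, where N also counts the out-of-range samples
weights = np.full(len(data), 1.0 / len(data))

# per-bin fraction of the whole dataset
fractions, edges = np.histogram(data, bins=bins, weights=weights)

# divide by the bin width to get a density (a no-op here, since the width is 1)
density = fractions / np.diff(edges)
```

Only the bins in the focused region are computed, and fractions.sum() equals the share of samples inside that region rather than 1.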

Author by

cdecker

I'm a PhD student in Computer Science at ETH Zurich doing research on Bitcoin. Publications: Bitcoin meets Strong Consistency; Information Propagation in the Bitcoin Network; Bitcoin Transaction Malleability and MtGox; Have a Snack, Pay with Bitcoins; BlueWallet: The secure Bitcoin Wallet.

Updated on June 08, 2022

Comments

  • cdecker almost 2 years

    I'm trying to draw part of a histogram using matplotlib.

    Instead of drawing the whole histogram, which has a lot of outliers and large values, I want to focus on just a small part. The original histogram looks like this:

    hist(data, bins=arange(data.min(), data.max(), 1000), normed=1, cumulative=False)
    plt.ylabel("PDF")
    

    enter image description here

    And after focusing it looks like this:

    hist(data, bins=arange(0, 121, 1), normed=1, cumulative=False)
    plt.ylabel("PDF")
    

    enter image description here

    Notice that the last bin is stretched and, worst of all, the Y ticks are scaled so that the bars sum to exactly 1, so points outside the current range are not taken into account at all.
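The rescaling can be reproduced without plotting. The sketch below assumes that numpy.histogram with density=True normalises the way normed=1 did in the call above, and uses made-up sample data that lies mostly outside 0..120.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, for illustration only
data = rng.normal(size=10_000) * 10_000   # hypothetical data, mostly outside 0..120

bins = np.arange(0, 121, 1)
density, edges = np.histogram(data, bins=bins, density=True)

# the bars are normalised over the in-range samples only, so their area is 1
area = float((density * np.diff(edges)).sum())

# the share of the data that actually lies in the focused range is far smaller
in_range = float(np.mean((data >= 0) & (data <= 120)))
```

Here area comes out as 1 even though in_range is only a small fraction of the data, which is why the Y scale of the focused plot is inflated.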

    I know that I can achieve what I want by drawing the histogram over the whole possible range and then restricting the axis to the part I'm interested in, but it wastes a lot of time calculating bins that I won't use/see anyway.

    hist(btsd-40, bins=arange(btsd.min(), btsd.max(), 1), normed=1, cumulative=False)
    axis([0,120,0,0.0025])
    

    enter image description here

    Is there a fast and easy way to draw just the focused region but still get the Y scale correct?