matplotlib normed histograms
Solution 1
In order to plot a subset of the histogram, I don't think you can get around calculating the whole histogram.
Have you tried computing the histogram with numpy.histogram and then plotting only the region you want with plt.bar or something similar? For example:
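import numpy as np
import matplotlib.pyplot as plt

data = np.random.normal(size=10000) * 10000

# Original histogram: matplotlib computes and draws the bins
plt.figure(0)
plt.hist(data, bins=np.arange(data.min(), data.max(), 1000))

# Same histogram computed with numpy, then drawn as bars starting at the left edges
plt.figure(1)
counts1, edges1 = np.histogram(data, bins=np.arange(data.min(), data.max(), 1000))
plt.bar(edges1[:-1], counts1, width=1000, align='edge')

# Narrower bins; draw only the bins whose left edge lies in (0, 20000)
plt.figure(2)
counts2, edges2 = np.histogram(data, bins=np.arange(data.min(), data.max(), 200))
left2 = edges2[:-1]  # one left edge per count, so the mask lines up with counts2
mask = (left2 > 0) & (left2 < 20000)
plt.bar(left2[mask], counts2[mask], width=200, align='edge')

plt.show()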
[Figure 0: original histogram]
[Figure 1: histogram calculated manually]
[Figure 2: histogram calculated manually, cropped. N.B.: the values are smaller because the bins are narrower.]
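Building on this, here is a minimal sketch (not part of the original answer) of how the cropped histogram could keep the same Y scale as a full-range density histogram: compute only the bins inside the window and divide the counts by the total number of samples times the bin width. The names focused_density and plot_focused are hypothetical, and equal-width bins are assumed. numpy.histogram still looks at every sample once, but only the bins inside the window are created and drawn.

```python
import numpy as np


def focused_density(data, lo, hi, width):
    """Density histogram of the window [lo, hi], scaled against *all* samples.

    Only bins inside the window are computed; np.histogram ignores samples
    outside the outermost edges. Dividing by the total sample count (not the
    in-window count) gives the same heights as a density histogram whose
    bins cover all the data and share these edges.
    """
    data = np.asarray(data, dtype=float)
    nbins = int(round((hi - lo) / width))
    edges = np.linspace(lo, lo + nbins * width, nbins + 1)
    counts, edges = np.histogram(data, bins=edges)
    # Note: numpy closes the last bin on the right, so a sample exactly at hi
    # is counted in the window's last bin.
    density = counts / (data.size * width)
    return density, edges


def plot_focused(data, lo, hi, width):
    """Draw the focused region as bars (matplotlib is imported only here)."""
    import matplotlib.pyplot as plt

    density, edges = focused_density(data, lo, hi, width)
    plt.bar(edges[:-1], density, width=width, align="edge")
    plt.ylabel("PDF")
    plt.show()
```

For the question's window this would be called as plot_focused(data, 0, 120, 1).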
Solution 2
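I think you can normalize your histogram by giving every sample the same weight, 1/len(data) (repeat is a numpy function). The bar heights are then fractions of all samples, including those outside the plotted bins; because the bins here are 1 wide, those fractions are also the density.
from pylab import hist, arange, repeat  # pylab re-exports numpy's arange and repeat
hist(data, bins=arange(0, 121, 1), weights=repeat(1.0/len(data), len(data)))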
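The weighting can be checked with numpy.histogram, which also accepts a weights argument. This sketch (weighted_fraction_hist is a hypothetical helper) shows that with 1/N weights the heights are fractions of all N samples, and that dividing by the bin widths turns them into a density. The two only coincide when the bins are 1 wide.

```python
import numpy as np


def weighted_fraction_hist(data, edges):
    """Histogram in which every sample carries weight 1/N.

    Samples outside `edges` are not binned but still count toward N,
    so the heights are fractions of the whole data set.
    """
    data = np.asarray(data, dtype=float)
    edges = np.asarray(edges, dtype=float)
    weights = np.full(data.size, 1.0 / data.size)
    fractions, _ = np.histogram(data, bins=edges, weights=weights)
    # Convert fractions to a probability density by dividing by bin widths
    density = fractions / np.diff(edges)
    return fractions, density
```

With bins that are not 1 wide, the bars drawn by the weighted hist call above show fractions, not density.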
Updated on June 08, 2022
cdecker
I'm trying to draw part of a histogram using matplotlib.
Instead of drawing the whole histogram, which has a lot of outliers and large values, I want to focus on just a small part. The original histogram looks like this:
hist(data, bins=arange(data.min(), data.max(), 1000), density=True, cumulative=False)  # density=True replaces the deprecated, since-removed normed=1
plt.ylabel("PDF")
And after focusing it looks like this:
hist(data, bins=arange(0, 121, 1), density=True, cumulative=False)
plt.ylabel("PDF")
Notice that the last bin is stretched and, worst of all, the Y values are rescaled so that the bars sum to exactly 1 over the focused range (points outside the current range are not taken into account at all).
I know that I can get what I want by drawing the histogram over the whole possible range and then restricting the axes to the part I'm interested in, but that wastes a lot of time calculating bins that I won't use or see anyway.
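The rescaling can be reproduced without plotting, since numpy.histogram's density=True normalizes the same way. A small sketch on made-up data (five samples, two of them outside the window):

```python
import numpy as np

# Hypothetical toy data: three small values and two large outliers
data = np.array([0.5, 1.5, 1.5, 5.0, 100.0])
window_edges = np.arange(0, 3, 1)  # edges 0, 1, 2 -> two bins of width 1

# density=True divides by the number of samples that land inside the bins
# (3 here), not by len(data); the outliers 5.0 and 100.0 are dropped
heights, _ = np.histogram(data, bins=window_edges, density=True)
area = np.sum(heights * np.diff(window_edges))

print(heights)  # 1/3 and 2/3 instead of 1/5 and 2/5
print(area)     # the window integrates to 1 although 2 of 5 samples lie outside
```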
hist(btsd-40, bins=arange(btsd.min(), btsd.max(), 1), density=True, cumulative=False)
axis([0, 120, 0, 0.0025])
Is there a fast and easy way to draw just the focused region but still get the Y scale correct?