Scaled logarithmic binning in python


Matplotlib won't help you much if you have special requirements for your histograms. You can, however, easily compute and manipulate the histogram yourself with numpy and then plot the result.

import numpy as np
from matplotlib import pyplot as plt

# something random to plot
data = (np.random.random(10000)*10)**3

# log-scaled bins
bins = np.logspace(0, 3, 50)
widths = (bins[1:] - bins[:-1])

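# Calculate histogram (np.histogram returns the counts and the bin edges)
counts, _ = np.histogram(data, bins=bins)
# normalize by bin width
hist_norm = counts / widths

# plot it! align='edge' puts each bar's left side on its bin's left edge
plt.bar(bins[:-1], hist_norm, widths, align='edge')
plt.xscale('log')
plt.yscale('log')
plt.show()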

When you present your data in a non-standard way like this, you have to be careful to label the y axis properly (the bar heights are counts per unit of x, not raw counts) and to write an informative figure caption.
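If the goal is a probability density rather than counts per unit of x, the same idea can be taken one step further by also dividing by the number of samples. The following is only a sketch: the helper name `log_binned_density`, the Pareto test data, and the choice of geometric bin centers are assumptions made for illustration, not part of the answer above.

```python
import numpy as np

def log_binned_density(samples, n_edges=50):
    """Hypothetical helper: estimate a probability density on log-spaced bins."""
    samples = np.asarray(samples, dtype=float)
    samples = samples[samples > 0]  # log-spaced bins only make sense for positive values
    edges = np.logspace(np.log10(samples.min()), np.log10(samples.max()), n_edges)
    counts, edges = np.histogram(samples, bins=edges)
    widths = np.diff(edges)
    # Divide by bin width AND by the number of binned samples,
    # so that sum(density * widths) is 1.
    density = counts / (counts.sum() * widths)
    # Geometric mean of the edges: the bin midpoint on a log axis.
    centers = np.sqrt(edges[:-1] * edges[1:])
    return centers, density, edges

# Illustrative power-law data: density proportional to x**-2.5 for x >= 1.
rng = np.random.default_rng(0)
samples = rng.pareto(1.5, size=100_000) + 1.0

centers, density, edges = log_binned_density(samples)
```

With this normalization the y axis can be labeled as a probability density, and `centers` against `density` can be drawn as points on log-log axes instead of bars.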

Author: SarthakC

Updated on June 05, 2022

Comments

  • SarthakC
    SarthakC about 2 years

    I'm interested in plotting the probability distribution of a set of points which are distributed as a power law. Further, I would like to use logarithmic binning to smooth out the large fluctuations in the tail. If I just use logarithmic binning and plot it on a log-log scale, such as

    pl.hist(MyList,log=True, bins=pl.logspace(0,3,50))
    pl.xscale('log')
    

    for example, then the problem is that the larger bins account for more points, i.e. the heights of my bins are not scaled by bin size.

    Is there a way to use logarithmic binning and yet make Python scale all the heights by the size of the bin? I know I can probably do this manually in some roundabout fashion, but it seems like a feature that should exist, and I can't find it. If you think histograms are fundamentally a bad way to represent my data and you have a better idea, then I'd love to hear that too.

    Thanks!

  • SarthakC
    SarthakC about 8 years
    Thanks! :) That works for my purpose, though I would prefer a more direct way if it exists. For power-law like data this just seems to be the most natural way for me to represent the data. If there's no better answer involving matplotlib functionality directly in the next day or so I'll accept your answer.
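On the request for a more direct route: `np.histogram` accepts `density=True`, which divides each count by the bin width and by the total number of binned samples. A minimal sketch, assuming the same kind of test data as in the answer (the seed and variable names are illustrative), that checks the built-in result against the manual calculation:

```python
import numpy as np

rng = np.random.default_rng(1)
data = (rng.random(10_000) * 10) ** 3   # random test data spanning 0 to 1000

bins = np.logspace(0, 3, 50)            # log-spaced bin edges from 1 to 1000

# Built-in normalization: counts / (number of binned samples * bin width)
density, edges = np.histogram(data, bins=bins, density=True)

# Manual equivalent, for comparison
counts, _ = np.histogram(data, bins=bins)
manual = counts / (counts.sum() * np.diff(bins))

print(np.allclose(density, manual))
```

Matplotlib's `hist` takes a `density=True` argument with the same meaning, so `plt.hist(data, bins=bins, density=True)` followed by the two log-scale calls should draw the width-scaled histogram in one step. Note that this gives a probability density (it also divides by the sample count), and that samples outside the bin range are ignored.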