Histogram using Excel FREQUENCY function


(This is fairly different in approach to the macro-driven dynamic range-resizing thing, so I'm using a separate answer...)

A dynamic histogram chart can be built by remembering that "named ranges" are actually named formulas, so their values may be dynamic, extremely so in some cases.

Let's start with the assumption that we have an arbitrary set of values in column A, starting at row 1, and that another cell contains the number of bins we want in our histogram. In my workbook that happens to be E2. So we fire up the Name Manager (on the "Formulas" tab) and create:

num_bins             =Sheet1!$E$2

I've gone for defining a number of bins, rather than a bin size (which we'll define later) because the latter makes it tricky to know exactly how to set our bin boundaries: are we happy with the idea that the first and last bins may cover different-sized parts of the range of values, for example?*

We can also set up dynamic formulas to describe our data:

data_count           =COUNT(Sheet1!$A:$A)
data_vals            =OFFSET(Sheet1!$A$1,0,0,data_count,1)
max_val              =MAX(data_vals)
min_val              =MIN(data_vals)

With those defined, we can get fancy. How big should each bin be? Make another named formula:

bin_size             =(max_val-min_val)/(num_bins)

And here comes the science: these formulas make the dynamic arrays:

bin_array            =min_val+ROW(OFFSET(Sheet1!$A$1,0,0,num_bins-1,1))*bin_size
bin_labels           =min_val+ROW(OFFSET(Sheet1!$A$1,0,0,num_bins,1))*bin_size        
freq_array           =FREQUENCY(data_vals,bin_array)

The first one is the trickiest: it uses the row numbers of a range that is num_bins minus one rows tall to generate multiples of bin_size. It doesn't start the array at min_val because the FREQUENCY() function counts items up to and including each bin value. It's one smaller than the number of bins desired because FREQUENCY() returns an array one larger than its bins argument, where the final entry counts the points above the highest bin value. That leaves bin_array one entry short of the number of columns in the chart, so we make a separate bin_labels array for presentation purposes.
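To make the off-by-one reasoning concrete, here is a minimal Python sketch of the same arithmetic. It is not Excel code: the `frequency` and `histogram` helpers are hypothetical names written for illustration, and `frequency` only approximates FREQUENCY() for numeric data with ascending bin boundaries.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Approximate Excel's FREQUENCY for ascending bins: len(bins) + 1 counts."""
    counts = [0] * (len(bins) + 1)
    for v in data:
        # bisect_left returns the index of the first boundary >= v, so a value
        # equal to a boundary is counted in that boundary's bin. Values above
        # every boundary fall into the extra final slot.
        counts[bisect_left(bins, v)] += 1
    return counts

def histogram(data, num_bins):
    """Mirror the named formulas: bin_size, bin_array, bin_labels, freq_array."""
    min_val, max_val = min(data), max(data)
    bin_size = (max_val - min_val) / num_bins
    # ROW(OFFSET(...)) yields 1..num_bins-1 for bin_array, 1..num_bins for bin_labels.
    bin_array = [min_val + i * bin_size for i in range(1, num_bins)]
    bin_labels = [min_val + i * bin_size for i in range(1, num_bins + 1)]
    freq_array = frequency(data, bin_array)
    return bin_labels, freq_array

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
labels, freqs = histogram(data, 5)
print(labels)  # 5 upper bin edges
print(freqs)   # 5 counts, one per bin
```

With 5 bins requested, only 4 boundaries are passed to `frequency`, yet 5 counts come back, which is why the labels need their own, one-longer array.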

Now we can make a chart. Insert (say) a 2-D column chart and open the "Select Data" dialog (either from the ribbon or by right-clicking the chart). Add a new series, setting Series values to =Sheet1!freq_array. It's necessary to include either the sheet name or the workbook name to get this to work. Add a series name if you like and click "OK". Now click "Edit" for "Horizontal (Category) Axis Labels" and set the range to =Sheet1!bin_labels.

Here are 2000 cells with =RAND()*5 and 5 bins. (I listed the names and their formulas, with values where they don't produce arrays.)

[Image: 2000 =RAND()*5 results into 5 bins]

And the same sheet after changing num_bins to 10. (The RAND() formulas recalculated, so the bins may not add up to exactly the same values)

[Image: after changing num_bins to 10]

* If you must have a user-defined bin size, you'll need to make bin_size the sheet reference and calculate num_bins with a named formula.
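
As a rough illustration of that footnote, the sketch below derives a bin count from a fixed bin size. The helper name `num_bins_for` is hypothetical, and rounding up is an assumption borrowed from the CEILING(...) call in the original question's formula rather than something this answer specifies.

```python
import math

def num_bins_for(data, bin_size):
    """Number of bins of width bin_size needed to cover min(data)..max(data)."""
    span = max(data) - min(data)
    # Round up so the last bin reaches max(data); keep at least one bin
    # even when every value is identical (span == 0).
    return max(1, math.ceil(span / bin_size))

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(num_bins_for(data, 3))  # bins of width 3 over a span of 10
print(num_bins_for(data, 2))  # bins of width 2 over a span of 10
```

Note that when the span is not an exact multiple of the bin size, the last bin extends past the maximum value, which is the uneven-coverage issue mentioned earlier as the reason for preferring a bin count.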
Author by

l33t

Updated on June 04, 2022

Comments

  • l33t
    l33t about 2 years

    In Excel 2010, I have a list of values in column A and a bin size is specified in B1. This allows me to create histograms with N bins using this formula:

    {=FREQUENCY(A:A,(ROW(INDIRECT("1:"&CEILING((MAX(A:A)-MIN(A:A))/B1,1)))-1)*B1+MIN(A:A))}

    The only problem is that I need to select N cells and apply this formula to get N bins to be used as data source for my bar chart. Is it possible to skip this step? E.g. Is it possible to use this formula in a single cell - somewhat modified - so that when used as data source, it is interpreted as N cells, producing a nice histogram with N values?

    Thanks.

    Here's the answer that led me to the formula above.

  • l33t
    l33t over 12 years
    I want to create a dynamic histogram that can easily be changed by altering a single constant (the bin size). I might even disable UI interaction (I'm creating the histogram using Excel automation).
  • Mike Woodhouse
    Mike Woodhouse over 12 years
    By "histogram", is the desired output a chart or an array on the worksheet? If a chart, the whole thing can be done with named formulae. Otherwise you're going to need a range resizing function as described or set a maximum size for the range and make your arrays that large.
  • l33t
    l33t over 12 years
    A chart would be fine, as long as the chart is updated when the bin size is changed. I've tried playing with named formulae, but keep getting errors... Never done this stuff before :P
  • l33t
    l33t over 12 years
    Now that's a brilliant answer. Thanks! :)
  • l33t
    l33t over 12 years
    Small correction though: the second data_vals should be renamed to avoid a name clash.