How to create a histogram from a flat Array in Ruby
Solution 1
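Use the "histogram" gem, which adds a histogram method to Array.
require 'histogram/array' # enables Array#histogram
data = [0,1,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,5,5,6,6,6,7,7,7,7,7,8,9,9,10]
(bins, freqs) = data.histogram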
This creates an array bins containing the bins of the histogram and an array freqs containing the frequencies.
The gem also supports different binning behaviors and weights/fractions.
Hope this helps.
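For readers who want to see what fixed-width binning amounts to without installing anything, here is a minimal plain-Ruby sketch. It is not the gem's implementation or API: the helper name fixed_width_histogram and its bin_width parameter are made up for illustration, and it assumes a non-empty array of numbers.

```ruby
# Hypothetical helper, not part of the histogram gem: counts values into
# fixed-width bins and returns [bin_starts, frequencies].
def fixed_width_histogram(values, bin_width)
  min = values.min
  # Count how many values fall into each bin index, measured from the minimum.
  counts = values.each_with_object(Hash.new(0)) do |v, h|
    h[((v - min) / bin_width).floor] += 1
  end
  last = counts.keys.max
  bins  = (0..last).map { |i| min + i * bin_width } # lower edge of each bin
  freqs = (0..last).map { |i| counts[i] }           # empty bins count as 0
  [bins, freqs]
end

data = [0,1,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,5,5,6,6,6,7,7,7,7,7,8,9,9,10]
bins, freqs = fixed_width_histogram(data, 2)
# bins  # => [0, 2, 4, 6, 8, 10]
# freqs # => [2, 11, 6, 8, 3, 1]
```

With a bin_width of 1 and integer data, this reduces to one bin per distinct value between the minimum and the maximum.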
Solution 2
Ruby's Array inherits group_by from Enumerable, which does this nicely:
Hash[*data.group_by{ |v| v }.flat_map{ |k, v| [k, v.size] }]
Which returns:
{
0 => 1,
1 => 1,
2 => 5,
3 => 6,
4 => 4,
5 => 2,
6 => 3,
7 => 5,
8 => 1,
9 => 2,
10 => 1
}
That's just a nice 'n clean hash. If you want an array of each bin and frequency pair you can shorten it and use:
data = [0,1,2,2,3,3,3,4]
data.group_by{ |v| v }.map{ |k, v| [k, v.size] }
# => [[0, 1], [1, 1], [2, 2], [3, 3], [4, 1]]
Here's what the code and group_by are doing with the smaller dataset:
data.group_by{ |v| v }
# => {0=>[0], 1=>[1], 2=>[2, 2], 3=>[3, 3, 3], 4=>[4]}
data.group_by{ |v| v }.flat_map{ |k, v| [k, v.size] }
# => [0, 1, 1, 1, 2, 2, 3, 3, 4, 1]
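The question asks for two separate arrays rather than a hash. One possible way to get there from the group_by result is to sort the pairs and transpose them. This is only a sketch: the unsorted sample data and the variable name pairs are illustrative, and it assumes the array is not empty (transposing an empty array leaves both results nil).

```ruby
# Unsorted input, to show that the bins come out in ascending order.
data = [3, 0, 2, 3, 1, 2, 3, 4]

# Build [value, count] pairs, then sort them by value.
pairs = data.group_by { |v| v }.map { |k, v| [k, v.size] }.sort

# Split the pairs into one array of bins and one array of frequencies.
bins, freqs = pairs.transpose
# bins  # => [0, 1, 2, 3, 4]
# freqs # => [1, 1, 2, 3, 1]
```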
As mentioned by Telmo Costa in the comments, Ruby introduced tally in v2.7.0. Running a quick benchmark shows that tally is about 2x faster than the group_by variants:
require 'fruity'
puts "Ruby v#{RUBY_VERSION}"
data = [0,1,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,5,5,6,6,6,7,7,7,7,7,8,9,9,10]
data.group_by{ |v| v }.map{ |k, v| [k, v.size] }.to_h
# => {0=>1, 1=>1, 2=>5, 3=>6, 4=>4, 5=>2, 6=>3, 7=>5, 8=>1, 9=>2, 10=>1}
data.group_by { |v| v }.transform_values(&:size)
# => {0=>1, 1=>1, 2=>5, 3=>6, 4=>4, 5=>2, 6=>3, 7=>5, 8=>1, 9=>2, 10=>1}
data.tally
# => {0=>1, 1=>1, 2=>5, 3=>6, 4=>4, 5=>2, 6=>3, 7=>5, 8=>1, 9=>2, 10=>1}
data.group_by{ |v| v }.keys.sort.map { |key| [key, data.group_by{ |v| v }[key].size] }.to_h
# => {0=>1, 1=>1, 2=>5, 3=>6, 4=>4, 5=>2, 6=>3, 7=>5, 8=>1, 9=>2, 10=>1}
compare do
gb { data.group_by{ |v| v }.map{ |k, v| [k, v.size] }.to_h }
rriemann { data.group_by { |v| v }.transform_values(&:size) }
telmo_costa { data.tally }
CBK {data.group_by{ |v| v }.keys.sort.map { |key| [key, data.group_by{ |v| v }[key].size] }.to_h }
end
Resulting in:
# >> Ruby v2.7.0
# >> Running each test 1024 times. Test will take about 2 seconds.
# >> telmo_costa is faster than rriemann by 2x ± 0.1
# >> rriemann is similar to gb
# >> gb is faster than CBK by 8x ± 1.0
So use tally.
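As a rough sketch of turning the tally result into the two arrays from the question, the following builds one bin per integer between the minimum and maximum, so values that never occur get a frequency of 0. It assumes non-empty integer data, and the fallback branch for Rubies older than 2.7 is an addition for illustration.

```ruby
data = [0, 1, 1, 4, 4, 4]

# Enumerable#tally exists from Ruby 2.7 on; fall back to a counting hash otherwise.
counts = if data.respond_to?(:tally)
  data.tally
else
  data.each_with_object(Hash.new(0)) { |v, h| h[v] += 1 }
end

# One bin per integer in the data's range; missing values count as 0.
bins  = (data.min..data.max).to_a
freqs = bins.map { |b| counts.fetch(b, 0) }
# bins  # => [0, 1, 2, 3, 4]
# freqs # => [1, 2, 0, 0, 3]
```

If gaps should not appear as empty bins, counts.keys.sort and the matching counts can be used instead.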
Whitecat
Updated on June 25, 2022
Comments
-
Whitecat about 2 years
How do I create a histogram of an array of integers? For example:
data = [0,1,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,5,5,6,6,6,7,7,7,7,7,8,9,9,10]
I want to create a histogram based on how many entries there are for 0, 1, 2, and so on. Is there an easy way to do it in Ruby?
The output should be two arrays. The first array should contain the groups (bins), the second array should contain the number of occurrences (frequencies).
For data given above, I would expect the following output:
bins # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
frequencies # => [1, 1, 5, 6, 4, 2, 3, 5, 1, 2, 1]
-
sawa almost 11 years
What is the output format you want?
-
PJP almost 11 years
When you ask a question, asking for code, you need to show your research and any attempts you made to solve the problem, along with your explanation why they didn't work.
-
knugie almost 6 years
I would initialise a "counting hash" like this:
h = Hash.new(0)
and count occurrences of each element:
data.each{|v| h[v] += 1}
After that, h will look like this:
=> {0=>1, 1=>1, 2=>5, 3=>6, 4=>4, 5=>2, 6=>3, 7=>5, 8=>1, 9=>2, 10=>1}
You can extract bins and freqs from that Hash using h.keys and h.values. I hope you find that useful.
-
dug about 5 years
Ruby 2.7.0 introduces Enumerable#tally:
data.tally
=> {0=>1, 1=>1, 2=>5, 3=>6, 4=>4, 5=>2, 6=>3, 7=>5, 8=>1, 9=>2, 10=>1}
-
-
rriemann over 5 years
The latest Ruby versions allow some syntactic sugar for a shorter version:
data.group_by { |v| v }.transform_values(&:size)
-
sambecker about 5 years
This should be the accepted answer!
-
Telmo Costa over 4 years
You can use itself:
data.group_by(&:itself).transform_values(&:size)
Or, as has been said before, starting with Ruby 2.7.0:
data.tally
-
CBK over 4 years
To sort the keys in ascending order:
data.group_by{ |v| v }.keys.sort.map do |key| [key, data.group_by{ |v| v }[key].size] end
-
PJP over 4 years
No, don't. It's at least 8x slower if you use the OP's sample array. See the benchmarks.