Pushing elements onto an array in a Ruby Hash

Solution 1

Sometimes a hash is filled with data up front and afterwards is only used to retrieve data. In those cases I prefer the first approach, because the default proc can be "emptied" once the hash is populated (Ruby 1.9 and later).

ht = Hash.new {|h,k| h[k]=[]}
ht["cats"] << "Jellicle"
ht["cats"] << "Mr. Mistoffelees"
ht["dogs"]   # merely reading a missing key creates "dogs" => []
p ht
#=> {"cats"=>["Jellicle", "Mr. Mistoffelees"], "dogs"=>[]}

ht.default_proc = proc{}   # the new default proc returns nil and stores nothing
ht["parrots"] #=> nil
p ht
#=> {"cats"=>["Jellicle", "Mr. Mistoffelees"], "dogs"=>[]} No parrots!
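
On Ruby 2.0 and later, the default proc can also be removed outright by assigning nil, which may read more clearly than an empty proc. A minimal sketch of the fill-then-lock pattern, assuming the same `ht` as above (the `parrots` and `has_parrots` variables are only there for illustration):

```ruby
# Fill phase: the default proc creates arrays on demand.
ht = Hash.new { |h, k| h[k] = [] }
ht["cats"] << "Jellicle"
ht["cats"] << "Mr. Mistoffelees"

# Read-only phase: drop the default proc (assigning nil works on Ruby 2.0+).
ht.default_proc = nil

parrots     = ht["parrots"]       # plain lookup now, no entry is created
has_parrots = ht.key?("parrots")

p parrots      # nil
p has_parrots  # false
p ht.keys      # ["cats"]
```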

Solution 2

In the OP, I said I had my own opinion. Here it is.

Although the "fancy initializer" approach is elegant, it can lead to some really unexpected behavior -- specifically generating keys when you don't expect it -- and there's no way to know this by looking at the hash table.

Consider the following:

>> ht1 = Hash.new {|h,k| h[k]=[]}
>> ht2 = {}
>> ht1["cats"] << "Jellicle"
=> ["Jellicle"]
>> (ht2["cats"] ||= []) << "Jellicle"
=> ["Jellicle"]

So far so good -- ht1 and ht2 are identical. But:

>> ht1["dogs"] ? "got dogs" : "no dogs"
=> "got dogs"
>> ht2["dogs"] ? "got dogs" : "no dogs"
=> "no dogs"

Note that simply accessing ht1[some_key] changes the state of the hash table, i.e. it creates a new entry. You might argue that the end user should always use has_key?() to test for the presence of a hash entry -- and you'd be right -- but the above usage is an accepted idiom. For the hash table to automagically create an entry would be an unexpected side effect, so you should be careful if the hash table is ever exposed to the end user. (Note, however, that steenslag's answer shows how you can turn this off.)
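
If you do keep the block initializer, reads that must not create entries can go through key? or fetch, neither of which invokes the default proc. A small sketch of the difference, reusing the ht1 name from above (the other variable names are just for illustration):

```ruby
ht1 = Hash.new { |h, k| h[k] = [] }
ht1["cats"] << "Jellicle"

# key? and fetch bypass the default proc, so nothing is created.
has_dogs = ht1.key?("dogs")
dogs     = ht1.fetch("dogs", [])
keys_after_safe_reads = ht1.keys

# A plain [] read triggers the default proc and adds the key.
ht1["dogs"]
keys_after_bracket_read = ht1.keys

p has_dogs                 # false
p dogs                     # []
p keys_after_safe_reads    # ["cats"]
p keys_after_bracket_read  # ["cats", "dogs"]
```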

Solution 3

If you know the number and names of the keys in advance, you can use the first option, or an even simpler one:

ht = { "cats" => [] }

Otherwise, if you don't want (need) to preinitialize the hash, the second option is a good choice.
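
When the keys are known up front but there are too many to write out by hand, the same literal can be built from a list. A sketch, with `animals` as a made-up key list; each key gets its own array, so pushing onto one does not affect the others:

```ruby
animals = %w[cats dogs parrots]   # hypothetical list of known keys

# Build { "cats" => [], "dogs" => [], ... } with a distinct array per key.
ht = animals.each_with_object({}) { |name, h| h[name] = [] }

ht["cats"] << "Jellicle"

p ht["cats"]           # ["Jellicle"]
p ht["dogs"]           # []
p ht["hamsters"]       # nil, because no default proc is installed
p ht.key?("hamsters")  # false
```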

Author: fearless_fool

Updated on February 17, 2020

Comments

  • fearless_fool

    In Ruby, to create a hash of arrays and push elements onto those arrays, I've seen two idioms. I'd like to know which one people prefer, and why. (Disclosure: I have my own opinion, but I want to make sure I'm not missing something obvious.)

    Approach 1: use Hash's fancy initializer:

    ht = Hash.new {|h,k| h[k]=[]}
    ht["cats"] << "Jellicle"
    ht["cats"] << "Mr. Mistoffelees"
    

    This approach creates an empty array when you access ht with a key that doesn't yet exist.

    Approach 2: simple initializer, fancy accessor:

    ht = {}
    (ht["cats"] ||= []) << "Jellicle"
    (ht["cats"] ||= []) << "Mr. Mistoffelees"
    

    Do people have an opinion on which one is better (or situations where one is preferred over the other)?