Convert an array into an index hash in Ruby

53,728

Solution 1

If all you need the hash for is membership, consider using a Set:

Set

Set implements a collection of unordered values with no duplicates. This is a hybrid of Array's intuitive inter-operation facilities and Hash's fast lookup.

Set is easy to use with Enumerable objects (implementing each). Most of the initializer methods and binary operators accept generic Enumerable objects besides sets and arrays. An Enumerable object can be converted to Set using the to_set method.

Set uses Hash as storage, so you must note the following points:

  • Equality of elements is determined according to Object#eql? and Object#hash.
  • Set assumes that the identity of each element does not change while it is stored. Modifying an element of a set will render the set to an unreliable state.
  • When a string is to be stored, a frozen copy of the string is stored instead unless the original string is already frozen.

Comparison

The comparison operators <, >, <= and >= are implemented as shorthand for the {proper_,}{subset?,superset?} methods. However, the <=> operator is intentionally left out because not every pair of sets is comparable. ({x,y} vs. {x,z} for example)

Example

require 'set'
s1 = Set.new [1, 2]                   # -> #<Set: {1, 2}>
s2 = [1, 2].to_set                    # -> #<Set: {1, 2}>
s1 == s2                              # -> true
s1.add("foo")                         # -> #<Set: {1, 2, "foo"}>
s1.merge([2, 6])                      # -> #<Set: {1, 2, "foo", 6}>
s1.subset? s2                         # -> false
s2.subset? s1                         # -> true

[...]

Public Class Methods

new(enum = nil)

Creates a new set containing the elements of the given enumerable object.

If a block is given, the elements of enum are preprocessed by the given block.

Solution 2

try this one:

a=[1,2,3]
Hash[a.zip]

Solution 3

You can do this very handy trick:

Hash[*[1, 2, 3, 4].map {|k| [k, nil]}.flatten]
=> {1=>nil, 2=>nil, 3=>nil, 4=>nil}

Solution 4

If you want to quickly ask "is X in the array?" you should use Array#include?.

Edit (in response to addition in OP):

If you want speedy look up times, use a Set. Having a Hash that points to all nils is silly. Conversion is an easy process too with Array#to_set.

require 'benchmark'
require 'set'

array = (1..1_000_000).to_a
set = array.to_set

Benchmark.bm(15) do |x|
    x.report("Array.include?") { 1000.times { array.include?(500_000) } }
    x.report("Set.include?") { 1000.times { set.include?(500_000) } }
end

Results on my machine:

                     user     system      total        real
Array.include?  36.200000   0.140000  36.340000 ( 36.740605)
Set.include?     0.000000   0.000000   0.000000 (  0.000515)

You should consider just using a set to begin with, instead of an array so that a conversion is never necessary.

Solution 5

I'm fairly certain that there isn't a one-shot clever way to construct this hash. My inclination would be to just be explicit and state what I'm doing:

hash = {}
array.each{|x| hash[x] = nil}

It doesn't look particularly elegant, but it's clear, and does the job.

FWIW, your original suggestion (under Ruby 1.8.6 at least) doesn't seem to work. I get an "ArgumentError: odd number of arguments for Hash" error. Hash.[] expects a literal, even-lengthed list of values:

Hash[a, 1, b, 2] # => {a => 1, b => 2}

so I tried changing your code to:

hash = Hash[*array.map {|x| [x, nil]}.flatten]

but the performance is dire:

#!/usr/bin/ruby -w
require 'benchmark'

array = (1..100_000).to_a

Benchmark.bm(15) do |x|
  x.report("assignment loop") {hash = {}; array.each{|e| hash[e] = nil}}
  x.report("hash constructor") {hash = Hash[*array.map {|e| [e, nil]}.flatten]}
end

gives

                     user     system      total        real
assignment loop  0.440000   0.200000   0.640000 (  0.657287)
hash constructor  4.440000   0.250000   4.690000 (  4.758663)

Unless I'm missing something here, a simple assignment loop seems the clearest and most efficient way to construct this hash.

Share:
53,728

Related videos on Youtube

derobert
Author by

derobert

Updated on August 19, 2020

Comments

  • derobert
    derobert almost 4 years

    I have an array, and I want to make a hash so I can quickly ask "is X in the array?".

    In perl, there is an easy (and fast) way to do this:

    my @array = qw( 1 2 3 );
    my %hash;
    @hash{@array} = undef;
    

    This generates a hash that looks like:

    {
        1 => undef,
        2 => undef,
        3 => undef,
    }
    

    The best I've come up with in Ruby is:

    array = [1, 2, 3]
    hash = Hash[array.map {|x| [x, nil]}]
    

    which gives:

    {1=>nil, 2=>nil, 3=>nil}
    

    Is there a better Ruby way?

    EDIT 1

    No, Array.include? is not a good idea. Its slow. It does a query in O(n) instead of O(1). My example array had three elements for brevity; assume the actual one has a million elements. Let's do a little benchmarking:

    #!/usr/bin/ruby -w
    require 'benchmark'
    
    array = (1..1_000_000).to_a
    hash = Hash[array.map {|x| [x, nil]}]
    
    Benchmark.bm(15) do |x|
        x.report("Array.include?") { 1000.times { array.include?(500_000) } }
        x.report("Hash.include?") { 1000.times { hash.include?(500_000) } }
    end
    

    Produces:

                         user     system      total        real
    Array.include?  46.190000   0.160000  46.350000 ( 46.593477)
    Hash.include?    0.000000   0.000000   0.000000 (  0.000523)
    
    • Sean Vikoren
      Sean Vikoren over 12 years
      Don't forget to consider the time it takes to do the conversion. Of course, if your situation allows, using a set to begin with (as suggested by @Zach Langley) obviates this cost.
    • drhenner
      drhenner almost 8 years
      To be fair the benchmark above should include converting from an array to a hash
    • derobert
      derobert almost 8 years
      @drhenner in theory, sure. In practice, not really—it's basically irrelevant. The conversion is done once, the lookup many, many, many times. I forget what I was working on when I asked this, but in the actual program the lookup was probably done many millions of times, after converting once.
  • derobert
    derobert over 15 years
    Please see my (newly added) section on why not Array#include? in the question.
  • derobert
    derobert over 15 years
    Neat approach, but obscenely slow; on 1..1_000_000: 1.4s (my map code) vs. got-bored-waiting after 3 minutes.
  • derobert
    derobert over 15 years
    Empirically, that seems to run in O(n^2)
  • derobert
    derobert over 15 years
    I realized I left off the closing ] in the code... Fixed. It works with ruby 1.8.7 here, and also 1.9.0
  • derobert
    derobert over 15 years
    The hash load is so quick, this would hardly ever make sense, unless you're really short on memory. The entire Hash load is O(n), a single a.include? call is also O(n).
  • zenazn
    zenazn over 15 years
    In this case, yes, but this can be generalized to memoize any function of your choice.
  • Zach Langley
    Zach Langley over 15 years
    For Ruby 1.8.6, you need the splat, i.e., hash = Hash[*array.map { |x| [x, nil] }.flatten]
  • covard
    covard over 11 years
    This worked great for me. I changed it to be [k,k] instead of [k, nil]. Thanks for this post
  • denis.peplin
    denis.peplin over 9 years
    Perfect! Most elegant way.
  • l3thal
    l3thal about 8 years
    Why not just do Hash[ [1, 2, 3, 4].map {|k| [k, nil]} ] ? There's no need for the extra splat and flatten.
  • drhenner
    drhenner almost 8 years
    Use merge! should be much faster
  • drhenner
    drhenner almost 8 years
    array.inject(Hash.new) { |h,i| h.merge!({i => nil}) } turns out to be pretty fast but still array.each{|e| hash[e] = nil} is 3.5 times faster
  • Alex Pan
    Alex Pan over 7 years
    Hash[a.zip] also returns the same response.
  • Magne
    Magne over 6 years
    "this certainly works... but i don't think it's the most efficient... you're iterating twice in this case. Not a concern of course with an array of length 2, but still worth noting." @brad from stackoverflow.com/a/9434078/380607