What's the safest way to iterate through the keys of a Perl hash?

Solution 1

The rule of thumb is to use the function most suited to your needs.

If you just want the keys and do not plan to ever read any of the values, use keys():

foreach my $key (keys %hash) { ... }

If you just want the values, use values():

foreach my $val (values %hash) { ... }

If you need the keys and the values, use each():

keys %hash; # reset the internal iterator so a prior each() doesn't affect the loop
while(my($k, $v) = each %hash) { ... }
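
As a minimal, runnable sketch of the three forms side by side (the %age hash and its contents are made up for illustration, and the results are sorted only to make the output order predictable):

```perl
use strict;
use warnings;

# Hypothetical data, not from the original answer.
my %age = (alice => 30, bob => 25, carol => 41);

# Keys only.
my @names = sort keys %age;

# Values only.
my $total = 0;
$total += $_ for values %age;

# Keys and values together; reset the iterator first.
my @pairs;
keys %age;
while (my ($name, $years) = each %age) {
    push @pairs, "$name=$years";
}
@pairs = sort @pairs;

print "names: @names\n";    # names: alice bob carol
print "total: $total\n";    # total: 96
print "pairs: @pairs\n";    # pairs: alice=30 bob=25 carol=41
```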

If you plan to change the keys of the hash in any way except for deleting the current key during the iteration, then you must not use each(). For example, this code to create a new set of uppercase keys with doubled values works fine using keys():

%h = (a => 1, b => 2);

foreach my $k (keys %h)
{
  $h{uc $k} = $h{$k} * 2;
}

producing the expected resulting hash:

(a => 1, A => 2, b => 2, B => 4)

But using each() to do the same thing:

%h = (a => 1, b => 2);

keys %h;
while(my($k, $v) = each %h)
{
  $h{uc $k} = $h{$k} * 2; # BAD IDEA!
}

produces incorrect results in hard-to-predict ways. For example:

(a => 1, A => 2, b => 2, B => 8)

This, however, is safe:

keys %h;
while(my($k, $v) = each %h)
{
  if(...)
  {
    delete $h{$k}; # This is safe
  }
}
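
To make the safe case concrete, here is a small sketch with the elided condition filled in by an assumed rule (drop every entry whose value is even); the data is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical data: drop every entry whose value is even.
my %h = (a => 1, b => 2, c => 3, d => 4);

keys %h;    # reset the iterator before starting
while (my ($k, $v) = each %h) {
    # Deleting the key most recently returned by each() is the one
    # modification that is safe during the loop.
    delete $h{$k} if $v % 2 == 0;
}

print join(", ", map { "$_ => $h{$_}" } sort keys %h), "\n";    # a => 1, c => 3
```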

All of this is described in the perl documentation:

% perldoc -f keys
% perldoc -f each

Solution 2

One thing you should be aware of when using each is that it has the side effect of adding "state" to your hash (the hash has to remember what the "next" key is). When using code like the snippets posted above, which iterate over the whole hash in one go, this is usually not a problem. However, you will run into hard-to-track-down problems (I speak from experience ;) when using each together with statements like last or return to exit from the while ... each loop before you have processed all keys.

In this case, the hash will remember which keys it has already returned, and when you use each on it the next time (maybe in a totally unrelated piece of code), it will continue at this position.

Example:

my %hash = ( foo => 1, bar => 2, baz => 3, quux => 4 );

# find key 'baz'
while ( my ($k, $v) = each %hash ) {
    print "found key $k\n";
    last if $k eq 'baz'; # found it!
}

# later ...

print "the hash contains:\n";

# iterate over all keys:
while ( my ($k, $v) = each %hash ) {
    print "$k => $v\n";
}

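This prints something like the following (the exact order depends on Perl's internal hash ordering and may differ between runs):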

found key bar
found key baz
the hash contains:
quux => 4
foo => 1

What happened to keys "bar" and "baz"? They're still there, but the second each starts where the first one left off and stops when it reaches the end of the hash, so we never see them in the second loop.

Solution 3

The place where each can cause you problems is that it's a true, non-scoped iterator. By way of example:

while ( my ($key,$val) = each %a_hash ) {
    print "$key => $val\n";
    last if $val; #exits loop when $val is true
}

# but "each" hasn't reset!!
while ( my ($key,$val) = each %a_hash ) {
    # continues where the last loop left off
    print "$key => $val\n";
}

If you need to be sure that each gets all the keys and values, you need to make sure you use keys or values first (as that resets the iterator). See the documentation for each.
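
A short sketch of that fix, assuming a small made-up %a_hash: the first loop exits early, and a void-context keys call resets the iterator so the second loop sees every key.

```perl
use strict;
use warnings;

# Hypothetical data for illustration.
my %a_hash = (foo => 1, bar => 2, baz => 3, quux => 4);

# Leave the first loop early; the iterator stays wherever it stopped.
while (my ($key, $val) = each %a_hash) {
    last;
}

# Reset the iterator, then walk the whole hash.
keys %a_hash;
my @seen;
while (my ($key, $val) = each %a_hash) {
    push @seen, $key;
}

print scalar(@seen), " keys seen: @{[ sort @seen ]}\n";    # 4 keys seen: bar baz foo quux
```

Without the keys %a_hash; line, the second loop would start after whichever key the first loop stopped on and would miss at least that one.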

Solution 4

Using the each syntax will prevent the entire set of keys from being generated at once. This can be important if you're using a tied hash backed by a database with millions of rows. You don't want to generate the entire list of keys all at once and exhaust your physical memory. In this case each serves as an iterator, whereas keys actually generates the entire list before the loop starts.

So, the only place each is of real use is when the hash is very large compared to the available memory. That is only likely to happen when the hash doesn't live in memory itself, unless you're programming a handheld data collection device or something else with little memory.

If memory is not an issue, the map or keys paradigm is usually the more prevalent and easier-to-read one.
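
As a rough illustration of the difference, the sketch below uses a hypothetical tied class, CountingHash, which pretends to hold the keys 1..n without storing them and counts how often Perl asks it for a key. The class, its fields, and the figure of 1000 keys are all assumptions made for this example; a real database-backed tie would behave analogously but is not shown here.

```perl
use strict;
use warnings;

# Hypothetical tied class: pretends to hold the keys 1..n without storing
# them, and counts how many times Perl asks it for a key.
package CountingHash;

sub TIEHASH  { my ($class, $n) = @_; return bless { n => $n, calls => 0 }, $class }
sub FETCH    { my ($self, $key) = @_; return $key * 2 }
sub FIRSTKEY { my ($self) = @_; $self->{calls}++; return $self->{n} >= 1 ? 1 : undef }
sub NEXTKEY  {
    my ($self, $last) = @_;
    $self->{calls}++;
    return $last < $self->{n} ? $last + 1 : undef;
}

package main;

tie my %big, 'CountingHash', 1000;
my $counter = tied %big;

# each(): keys are requested one at a time, so stopping early is cheap.
while (my ($k, $v) = each %big) {
    last if $k == 3;
}
my $each_calls = $counter->{calls};

# keys(): the full key list is built before the loop body ever runs.
$counter->{calls} = 0;
foreach my $k (keys %big) {
    last if $k == 3;
}
my $keys_calls = $counter->{calls};

print "each asked for $each_calls keys, keys asked for $keys_calls\n";
```

The call counts stand in for memory here: every key that keys requests up front is a key that has to be held in the list at once.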

Solution 5

A few miscellaneous thoughts on this topic:

  1. There is nothing unsafe about any of the hash iterators themselves. What is unsafe is modifying the keys of a hash while you're iterating over it. (It's perfectly safe to modify the values.) The only potential side-effect I can think of is that values returns aliases which means that modifying them will modify the contents of the hash. This is by design but may not be what you want in some circumstances.
  2. John's accepted answer is good with one exception: the documentation is clear that it is not safe to add keys while iterating over a hash. It may work for some data sets but will fail for others depending on the hash order.
  3. As already noted, it is safe to delete the last key returned by each. This is not true for keys as each is an iterator while keys returns a list.
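
The aliasing mentioned in point 1 can be seen in a few lines (hypothetical data; the keys are sorted only for predictable output):

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2);

# Each element returned by values() is an alias to the stored value,
# so assigning through $_ changes the hash itself.
$_ *= 10 for values %h;

print join(", ", map { "$_ => $h{$_}" } sort keys %h), "\n";    # a => 10, b => 20
```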

Author: Rudd Zwolinski

Updated on July 08, 2022

Comments

  • Rudd Zwolinski
    Rudd Zwolinski almost 2 years

    If I have a Perl hash with a bunch of (key, value) pairs, what is the preferred method of iterating through all the keys? I have heard that using each may in some way have unintended side effects. So, is that true, and is one of the two following methods best, or is there a better way?

    # Method 1
    while (my ($key, $value) = each(%hash)) {
        # Something
    }
    
    # Method 2
    foreach my $key (keys(%hash)) {
        # Something
    }
    
  • ysth
    ysth over 15 years
    Re "not true for keys", rather: it's not applicable to keys and any delete is safe. The phrasing you use implies it's never safe to delete anything when using keys.
  • ysth
    ysth over 15 years
    Re: "nothing unsafe about any of the hash iterators", the other danger is assuming the iterator is at the beginning before starting an each loop, as others mention.
  • ysth
    ysth over 15 years
    Please add a void-context keys %h; before every each loop to show safe use of the iterator.
  • ko-dos
    ko-dos over 14 years
    don't use map unless you want the return value
  • Rawler
    Rawler about 10 years
    There is another caveat with each. The iterator is bound to the hash, not the context, which means it is not re-entrant. For example, if you loop over a hash and print the hash, Perl will internally reset the iterator, making this code loop endlessly:

    my %hash = ( a => 1, b => 2, c => 3, );
    while ( my ($k, $v) = each %hash ) {
        print %hash;
    }

    Read more at blogs.perl.org/users/rurban/2014/04/do-not-use-each.html
  • Adrian Günter
    Adrian Günter almost 7 years
    With keys memory usage increases by hash-size * avg-key-size. Given that key size is only limited by memory (as they're just array elements like "their" corresponding values under the hood), in some situations it can be prohibitively more expensive in both memory usage and time taken to make the copy.