How can I check if a key exists in a deep Perl hash?

perl hash autovivification

13,490

Solution 1

It's much better to use something like the autovivification module to turn off that feature, or to use Data::Diver. However, this is one of the simple tasks that I'd expect a programmer to know how to do on his own. Even if you don't use this technique here, you should know it for other problems. This is essentially what Data::Diver is doing once you strip away its interface.

This is easy once you get the trick of walking a data structure (if you don't want to use a module that does it for you). In my example, I create a check_hash subroutine that takes a hash reference and an array reference of keys to check. It checks one level at a time. If the key is not there, it returns nothing. If the key is there, it prunes the hash to just that part of the path and tries again with the next key. The trick is that $hash is always the next part of the tree to check. I put the exists in an eval in case the next level isn't a hash reference. The trick is not to fail if the hash value at the end of the path is some sort of false value. Here's the important part of the task:

sub check_hash {
   my( $hash, $keys ) = @_;

   return unless @$keys;

   foreach my $key ( @$keys ) {
       return unless eval { exists $hash->{$key} };
       $hash = $hash->{$key};
       }

   return 1;
   }

Don't be scared by all the code in the next bit. The important part is just the check_hash subroutine. Everything else is testing and demonstration:

#!perl
use strict;
use warnings;
use 5.010;

sub check_hash {
   my( $hash, $keys ) = @_;

   return unless @$keys;

   foreach my $key ( @$keys ) {
       return unless eval { exists $hash->{$key} };
       $hash = $hash->{$key};
       }

   return 1;
   }

my %hash = (
   a => {
       b => {
           c => {
               d => {
                   e => {
                       f => 'foo!',
                       },
                   f => 'foo!',
                   },
               },
           f => 'foo!',
           g => 'goo!',
           h => 0,
           },
       f => [ qw( foo goo moo ) ],
       g => undef,
       },
   f => sub { 'foo!' },
   );

my @paths = (
   [ qw( a b c d     ) ], # true
   [ qw( a b c d e f ) ], # true
   [ qw( b c d )       ], # false
   [ qw( f b c )       ], # false
   [ qw( a f )         ], # true
   [ qw( a f g )       ], # false
   [ qw( a g )         ], # true
   [ qw( a b h )       ], # true (h exists; its value is 0)
   [ qw( a )           ], # true
   [ qw( )             ], # false
   );

say Dumper( \%hash ); use Data::Dumper; # just to remember the structure    
foreach my $path ( @paths ) {
   printf "%-12s --> %s\n", 
       join( ".", @$path ),
       check_hash( \%hash, $path ) ? 'true' : 'false';
   }

Here's the output (minus the data dump):

a.b.c.d      --> true
a.b.c.d.e.f  --> true
b.c.d        --> false
f.b.c        --> false
a.f          --> true
a.f.g        --> false
a.g          --> true
a.b.h        --> true
a            --> true
             --> false
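
To see that walking the structure one level at a time really avoids autovivification, here is a minimal sketch that compares it with a plain chained exists. The %data hash and the x.y.z path are made up for illustration; check_hash is the same subroutine as above, repeated only so the snippet runs on its own.

```perl
use strict;
use warnings;

# Same check_hash as above, repeated so this sketch is self-contained.
sub check_hash {
    my( $hash, $keys ) = @_;

    return unless @$keys;

    foreach my $key ( @$keys ) {
        return unless eval { exists $hash->{$key} };
        $hash = $hash->{$key};
    }

    return 1;
}

my %data = ( a => { b => 1 } );   # hypothetical sample data

# Walking one level at a time stops at the first missing key
# and creates nothing in %data.
my $found = check_hash( \%data, [ qw( x y z ) ] );
my $keys_after_walk = join ',', sort keys %data;

# A plain chained exists has to dereference $data{x} and $data{x}{y},
# so Perl creates both of them on the way down.
my $plain = exists $data{x}{y}{z};
my $keys_after_plain = join ',', sort keys %data;

print "check_hash found it:  ", ( $found ? 'true' : 'false' ), "\n";
print "keys after check_hash: $keys_after_walk\n";
print "plain exists found it: ", ( $plain ? 'true' : 'false' ), "\n";
print "keys after plain:      $keys_after_plain\n";
```

Both checks report false, but only the plain exists leaves a new x key behind in %data.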

Now, you might want to have some other check instead of exists. Maybe you want to check that the value at the chosen path is true, or a string, or another hash reference, or whatever. That's just a matter of supplying the right check once you have verified that the path exists. In this example, I pass a subroutine reference that will check the value I left off with. I can check for anything I like:

#!perl
use strict;
use warnings;
use 5.010;

sub check_hash {
    my( $hash, $sub, $keys ) = @_;

    return unless @$keys;

    foreach my $key ( @$keys ) {
        return unless eval { exists $hash->{$key} };
        $hash = $hash->{$key};
        }

    return $sub->( $hash );
    }

my %hash = (
    a => {
        b => {
            c => {
                d => {
                    e => {
                        f => 'foo!',
                        },
                    f => 'foo!',
                    },
                },
            f => 'foo!',
            g => 'goo!',
            h => 0,
            },
        f => [ qw( foo goo moo ) ],
        g => undef,
        },
    f => sub { 'foo!' },
    );

my %subs = (
    hash_ref  => sub {   ref $_[0] eq   ref {}  },
    array_ref => sub {   ref $_[0] eq   ref []  },
    true      => sub { ! ref $_[0] &&   $_[0]   },
    false     => sub { ! ref $_[0] && ! $_[0]   },
    exist     => sub { 1 },
    foo       => sub { $_[0] eq 'foo!' },
    'undef'   => sub { ! defined $_[0] },
    );

my @paths = (
    [ exist     => qw( a b c d     ) ], # true
    [ hash_ref  => qw( a b c d     ) ], # true
    [ foo       => qw( a b c d     ) ], # false
    [ foo       => qw( a b c d e f ) ], # true
    [ exist     => qw( b c d )       ], # false
    [ exist     => qw( f b c )       ], # false
    [ array_ref => qw( a f )         ], # true
    [ exist     => qw( a f g )       ], # false
    [ 'undef'   => qw( a g )         ], # true
    [ exist     => qw( a b h )       ], # true (h exists; its value is 0)
    [ hash_ref  => qw( a )           ], # true
    [ exist     => qw( )             ], # false
    );

say Dumper( \%hash ); use Data::Dumper; # just to remember the structure    
foreach my $path ( @paths ) {
    my $sub_name = shift @$path;
    my $sub = $subs{$sub_name};
    printf "%10s --> %-12s --> %s\n", 
        $sub_name, 
        join( ".", @$path ),
        check_hash( \%hash, $sub, $path ) ? 'true' : 'false';
    }

And its output:

     exist --> a.b.c.d      --> true
  hash_ref --> a.b.c.d      --> true
       foo --> a.b.c.d      --> false
       foo --> a.b.c.d.e.f  --> true
     exist --> b.c.d        --> false
     exist --> f.b.c        --> false
 array_ref --> a.f          --> true
     exist --> a.f.g        --> false
     undef --> a.g          --> true
     exist --> a.b.h        --> true
  hash_ref --> a            --> true
     exist -->              --> false
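
The callback does not have to come from a prepared table like %subs. As a rough sketch, here is the same check_hash with an ad hoc check passed in directly. The %config data and the $is_integer check are invented for this example.

```perl
use strict;
use warnings;

# The callback-taking check_hash from above, repeated so this runs alone.
sub check_hash {
    my( $hash, $sub, $keys ) = @_;

    return unless @$keys;

    foreach my $key ( @$keys ) {
        return unless eval { exists $hash->{$key} };
        $hash = $hash->{$key};
    }

    return $sub->( $hash );
}

# Hypothetical sample data, not taken from the answer.
my %config = (
    server => { port => 8080, name => 'alpha' },
);

# An ad hoc check: the value must be a defined, non-reference string of digits.
my $is_integer = sub {
    defined $_[0] && ! ref $_[0] && $_[0] =~ /\A[0-9]+\z/;
};

my @paths = (
    [ qw( server port )    ],  # exists and is all digits
    [ qw( server name )    ],  # exists but is not all digits
    [ qw( server missing ) ],  # does not exist, so the check never runs
);

foreach my $path ( @paths ) {
    printf "%-14s --> %s\n",
        join( '.', @$path ),
        check_hash( \%config, $is_integer, $path ) ? 'true' : 'false';
}
```

Only server.port comes out true. For server.missing the walk returns before the callback is ever called, so the callback only has to worry about values that are really there.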

Solution 2

You could use the autovivification pragma to deactivate the automatic creation of references:

use strict;
use warnings;
no autovivification;

my %foo;
print "yes\n" if exists $foo{bar}{baz}{quux};

print join ', ', keys %foo;

It's also lexical, meaning it'll only deactivate it inside the scope you specify it in.

Solution 3

Check every level for existence before looking at the deepest level.

if (exists $ref->{A} and exists $ref->{A}{B} and exists $ref->{A}{B}{$key}) {
}

If you find that annoying you could always look on CPAN. For instance, there is Hash::NoVivify.
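
Here is a small sketch of why the level-by-level form helps, using only core Perl. The contents of $ref and the value of $key are assumptions for the demonstration: A exists, A.B does not.

```perl
use strict;
use warnings;

my $ref = { A => {} };   # hypothetical data: A exists, A.B does not
my $key = 'quux';        # hypothetical key name

# Level-by-level check: each exists only runs if the previous one was true,
# so $ref->{A}{B} is never dereferenced and never created.
my $safe = ( exists $ref->{A}
         and exists $ref->{A}{B}
         and exists $ref->{A}{B}{$key} );
my $b_after_safe = exists $ref->{A}{B} ? 'yes' : 'no';

# Direct check: looking up $key means dereferencing $ref->{A}{B},
# which springs into existence as an empty hash reference.
my $direct = exists $ref->{A}{B}{$key};
my $b_after_direct = exists $ref->{A}{B} ? 'yes' : 'no';

print "level-by-level result: ", ( $safe   ? 'true' : 'false' ), "\n";
print "B exists afterwards?   $b_after_safe\n";
print "direct result:         ", ( $direct ? 'true' : 'false' ), "\n";
print "B exists afterwards?   $b_after_direct\n";
```

Both forms answer false, but only the direct one leaves $ref->{A}{B} behind. The short-circuiting works the same way with && in place of and.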

Solution 4

Take a look at Data::Diver. E.g.:

use Data::Diver qw(Dive);

my $ref = { A => { foo => "bar" } };
my $value1 = Dive($ref, qw(A B), $key);
my $value2 = Dive($ref, qw(A foo));



Author by

David B

Updated on June 15, 2022

Comments

David B about 2 years

If I understand correctly, calling if (exists $ref->{A}->{B}->{$key}) { ... } will spring into existence $ref->{A} and $ref->{A}->{B} even if they did not exist prior to the if!

This seems highly unwanted. So how should I check if a "deep" hash key exists?
- brian d foy almost 14 years
  
  I'm amazed that this isn't in the perlfaq, considering it's more FA than most of the Qs already in there. Give me a couple of minutes and I'll fix that :)
- brian d foy almost 14 years
  
  Oh look, there it is in perlfaq4: How can I check if a key exists in a multilevel hash?. It's essentially a summary of this thread. Thanks StackOverflow :)
- Richlv over 2 years
  
  Section in the link either got trimmed or changed - the link now is perldoc.perl.org/…? .
David B almost 14 years

also, is there a difference between $ref->{A}{B}{C} and $ref->{A}->{B}->{C}?
hobbs almost 14 years

@David No, there's no difference. The only arrow that does anything is the first. Arrows between successive {} and [] are unnecessary and it usually looks better to leave them out.
David B almost 14 years

Can't locate autovivification.pm in @INC?!
phaylon almost 14 years

@David: Autovivification was always there. This module simply gives you control over it.
ysth almost 14 years

blech; use &&; and for flow control only
ysth almost 14 years

@David B: no difference, see perlmonks.org/?node=References+quick+reference
Chas. Owens almost 14 years

@ysth blech right back at you. I prefer the low precedence operators.
Chas. Owens almost 14 years

That is an abomination. I go cross-eyed just trying to look at it. You are also creating up to n - 1 (where n is the number of levels in the hash) anonymous hashrefs for the sole purpose of avoiding autovivification in the target hash (you autovivify in the anonymous hashref instead). I wonder what the performance is like compared to the multiple calls to exists of the sane code.
ysth almost 14 years

@Chas. Owens: the performance is probably worse, maybe many times worse, which matters not at all given that it takes a trivial amount of time.
ysth almost 14 years

@Chas. Owens: I'm sure you are very careful and never have precedence errors because of that, but IMO such examples for other people should use &&.
Chas. Owens almost 14 years

It is actually better in the case where all of the keys exist, by about three times. The sane version starts to win after that, but they all can execute over a million times a second, so there is no real benefit either way. Here is the benchmark I used.
CanSpice almost 14 years

If you're really concerned about precedence, wrap things in parentheses.
ysth almost 14 years

@Chas. Owens: that's what I said :) but your sane code doesn't protect $ref itself from being autovivified.
brian d foy almost 14 years

The problem with this approach is that you have to recode it for every depth.
brian d foy almost 14 years

The problem with this approach is that you have to recode the same thing for every set of keys and depth. There's no reusability here.
vol7ron almost 14 years

&& over and in logical comparison. and over && in flow control.
runrig almost 14 years

@David: You have autovivification. You don't have "no autovivification" until you install autovivification :)