Why would I return a hash or a hash reference in Perl?

16,021

Solution 1

I prefer returning a hash ref for two reasons. One, it uses a bit less memory since there's no copy. Two, it lets you do this if you just need one piece of the hash.

my $value = build_hash()->{$key};

Learn to love hash references; you're going to be seeing them a lot once you start using objects.
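
As a rough sketch of that one-key lookup, assuming a hypothetical build_hash that returns a reference (the key list is borrowed from the question, and build_hash_list in the comment is likewise made up for contrast):

```perl
use strict;
use warnings;

# Hypothetical build_hash(): returns a reference to a freshly built hash.
sub build_hash {
    my %hash = map { $_ => 1 } qw(hi bi no th xc ul 8e r);
    return \%hash;
}

# Index straight into the returned reference; no intermediate variable needed.
my $value = build_hash()->{hi};

# A list-returning version (call it build_hash_list) would force the caller
# to copy the pairs into a hash first:
#     my %copy = build_hash_list();
#     my $value = $copy{hi};
print "hi => $value\n";   # prints "hi => 1"
```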

Solution 2

Why not return both? Context is a powerful Perl feature that lets your functions "do what you mean". Often the better return value depends on how the calling code plans to use it, which is exactly why Perl has the builtin wantarray.

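my @keys = qw(hi bi no th xc ul 8e r);  # example keys, as in the question

sub build_hash {
    my %hash;
    @hash{@keys} = (1) x @keys;         # hash slice: every key gets the value 1
    return wantarray ? %hash : \%hash;  # list context: pairs; scalar context: reference
}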

my %hash = build_hash;  # list context, a list of (key => value) pairs
my $href = build_hash;  # scalar context, a hash reference

Solution 3

I would return the reference to save the processing time of flattening the hash into a list of scalars, building the new hash and (possibly) garbage collecting the local hash in the subroutine.
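
One way to see how much that costs on a given machine is a quick comparison with the core Benchmark module. This is only a sketch: the sub names, the key names and the 1,000-key size are arbitrary assumptions, and the figures will vary with Perl version and hash size.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Arbitrary test data: 1,000 made-up keys.
my @keys = map { "key$_" } 1 .. 1000;

# Both subs build the same hash; only the way it is returned differs.
sub as_list { my %h; @h{@keys} = (1) x @keys; return %h; }
sub as_ref  { my %h; @h{@keys} = (1) x @keys; return \%h; }

# A negative count means "run each case for at least that many CPU seconds".
cmpthese(-1, {
    list => sub { my %copy = as_list(); },   # flatten to a list, then rebuild a hash
    ref  => sub { my $ref  = as_ref();  },   # hand back a single scalar
});
```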

Solution 4

What you're looking for is a hash slice:

# assigns the value 1 to every element of the hash

my %hash;                                   # declare an empty hash
my @list = qw(hi bi no th xc ul 8e r);      # declare the keys as a list
@hash{@list} =                              # for every key listed in @list,
                (1) x @list;                # ...assign to it the corresponding value in this list
                                            # which is (1, 1, 1, 1, 1...)  (@list in scalar context
                                            #   gives the number of elements in the list)

The x (repetition) operator is described in perldoc perlop.

See perldoc perldsc and perldoc perlreftut for tutorials on data structures and references (both must-reads for beginners and experts alike). Hash slices themselves are mentioned in perldoc perldata.

Regarding returning a hash from a function, normally you should return the hash itself, not a reference. You could use a reference if the hash is huge and memory or time is a concern, but that shouldn't be your first worry -- getting the code working is.

In list context, a function's return value is always a list (returning a scalar is essentially returning a list of one element). A hash flattens to a list of key/value pairs in list context, so you can assign one to the other interchangeably (assuming the list has an even number of elements and there are no key collisions, which would cause some values to be lost during the conversion):

use strict; use warnings;
use Data::Dumper;

sub foo
{
    return qw(key1 value1 key2 value2);
}

my @list = foo();
my %hash = foo();

print Dumper(\@list);
print Dumper(\%hash);

gives:

$VAR1 = [
          'key1',
          'value1',
          'key2',
          'value2'
        ];

$VAR1 = {
          'key2' => 'value2',
          'key1' => 'value1'
        };

PS. I highly recommend writing up small sample programs like the one above to play around with data structures and to see what happens. You can learn a lot by experimenting!
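
In that spirit, here is a small experiment showing the key-collision caveat mentioned above. The function name pairs and its data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical function returning a flat list in which the key 'a' repeats.
sub pairs { return (a => 1, b => 2, a => 3); }

my @list = pairs();   # an array keeps all six elements
my %hash = pairs();   # a hash keeps one value per key; the later 'a' wins

print scalar(@list), " list elements\n";     # prints "6 list elements"
print scalar(keys %hash), " hash keys\n";    # prints "2 hash keys"
print "a => $hash{a}\n";                     # prints "a => 3"
```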

Solution 5

"Regarding returning a hash from a function, normally you should return the hash itself, not a reference. You could use a reference if the hash is huge and memory or time is a concern, but that shouldn't be your first worry -- getting the code working is."

I'm going to have to disagree with Ether here. There was a time when I took that position, but quickly found myself descending into a hell of having to remember which subs returned hashes and which returned hashrefs, which was a rather serious impediment to just getting the code working. It's important to standardize on either always returning a hash/array or always returning a hashref/arrayref unless you want to be constantly tripping over yourself.

As for which to standardize on, I see several advantages to going with references:

  • When you return a hash or array, what you're actually returning is a list containing a flattened copy of the original hash/array. Just like passing in hash/array parameters to a sub, this has the disadvantage that you can only send one list at a time. Granted, you don't often need to return multiple lists of values, but it does happen, so why choose to standardize on doing things in a way which precludes it?

  • The (usually negligible) performance/memory benefits of returning a single scalar rather than a potentially much larger chunk of data.

  • It maintains consistency with OO code, which frequently passes objects (i.e., blessed references) back and forth.

  • If, for whatever reason, it's important that you have a fresh copy of the hash/array rather than a reference to the original, the calling code can easily make one, as the OP demonstrated in c.pl. If you return a copy of the hash, though, there's no way for the caller to turn that into a reference to the original. (In cases where this is advantageous, the function can make a copy and return a reference to the copy, thus protecting the original while also avoiding the "this returns hashes, that returns hashrefs" hell I mentioned earlier.)

  • As Schwern mentioned, it's real nice to be able to do my $foo = $obj->some_data->{key}.
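
Two of the points above, returning more than one structure and handing out a reference to a copy, can be sketched roughly as follows. All names here (split_by_parity, config_copy, %config) are hypothetical.

```perl
use strict;
use warnings;

my %config = (mode => 'fast', level => 3);   # hypothetical "original" data

# Returns two hashes at once. Returned as flat lists they would merge into
# one list, and the caller could not tell where the first one ends.
sub split_by_parity {
    my (%odd, %even);
    for my $n (@_) {
        if ($n % 2) { $odd{$n} = 1 } else { $even{$n} = 1 }
    }
    return (\%odd, \%even);
}

# Returns a reference to a shallow copy, so callers cannot alter %config
# while the interface stays "everything returns a reference".
sub config_copy { return +{ %config }; }

my ($odd, $even) = split_by_parity(1 .. 6);
print join(',', sort keys %$odd), " | ", join(',', sort keys %$even), "\n";
# prints "1,3,5 | 2,4,6"

my $copy = config_copy();
$copy->{mode} = 'slow';
print "original mode is still $config{mode}\n";   # prints "original mode is still fast"
```

The copy is shallow: nested references inside %config would still be shared with the caller.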

The only advantage I can see to always returning hashes/arrays is that it is easier for those who don't understand references or aren't comfortable working with them. Given that comfort with references takes a matter of weeks or months to develop, followed by years or decades of working with them fluently, I don't consider this a meaningful benefit.

Author by

user105033

Updated on June 05, 2022

Comments

  • user105033
    user105033 almost 2 years

    What is the most effective way of accomplishing the below? (I know they accomplish the same thing, but how would most people do this between the three, and why?)

    File a.pl

    my %hash = build_hash();
    # Do stuff with hash using $hash{$key}
    sub build_hash
    {
        # Build some hash
        my %hash = ();
        my @k = qw(hi bi no th xc ul 8e r);
        for ( @k )
        {
            $hash{$_} = 1;
        }
    
        # Does this return a copy of the hash??
        return %hash;
    }
    

    File b.pl

    my $hashref = build_hash();
    # Do stuff with hash using $hashref->{$key}
    sub build_hash
    {
        # Build some hash
        my %hash = ();
        my @k = qw(hi bi no th xc ul 8e r);
        for ( @k )
        {
            $hash{$_} = 1;
        }
    
        # Just return a reference (smaller than making a copy?)
        return \%hash;
    }
    

    File c.pl

    my %hash = %{build_hash()};
    # Do stuff with hash using $hash{$key}
    # It is better, because now we don't have to dereference our hashref each time using ->?
    
    sub build_hash
    {
        # Build some hash
        my %hash = ();
        my @k = qw(hi bi no th xc ul 8e r);
        for ( @k )
        {
            $hash{$_} = 1;
        }
    
        return \%hash;
    }
    
  • Schwern
    Schwern over 14 years
    +1 to the use of map to build a hash from a list, -1 to the idea of avoiding references.
  • ysth
    ysth over 14 years
    would you like some eggs with that? (an implicit hash reference :)
  • Admin
    Admin over 14 years
    In the codebase I currently work with, that would be a simple function called hashof exported by the ProjectNamespace::DataManip function and approximately implemented like sub hashof { return map { $_ => 1 } @_;} (with some prototype sugar and the like). Our hash_slice_of($hashref, @list) on the other hand returns each key-value pair which exists in $hashref where the key is also in @list. As a hash-manipulation function, they all return hashes (even-sized lists) so that the return values are easier to work with and pass to each other.
  • Inshallah
    Inshallah over 14 years
    +1, maintaining consistency is a good point. However, since I use arrays more often as arrays than I use hashes as hashes, and since there would be a lot more dereferencing/safe-copying going on for array refs, I think they should be treated differently.
  • Schwern
    Schwern over 14 years
    my($href) = build_hash(); # whoops. A little too much help for not enough win.
  • Eric Strom
    Eric Strom over 14 years
    @Schwern => If someone working in Perl can't recognize that the lvalue is imposing list context on the assignment, they aren't going to get very far. Rather than hiding these features from people, those who understand them should help others use them properly.
  • Schwern
    Schwern over 14 years
    @Eric It's not about not understanding contexts. It's whether you expect build_hash() to return something different in list context. Don't trust that the user studies and forever remembers the documentation of every function. It's compounded in that my($foo) and my $foo are fairly easy to casually interchange. Also function( build_hash() ). Oops. Subtle bugs easily missed. Finally, returning a hash doesn't have the utility of returning a list. A hash must be stuck into a variable to be useful. Returning a list can be used implicitly, LISP style. So I question the value.
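
The pitfall discussed in these last comments can be reproduced with a small sketch. It assumes a context-sensitive build_hash in the style of Solution 2; the single key and the helper count_args are made up so the output is predictable.

```perl
use strict;
use warnings;

# Context-sensitive build_hash in the style of Solution 2 (one illustrative key).
sub build_hash {
    my %hash = (hi => 1);
    return wantarray ? %hash : \%hash;
}

my $href   = build_hash();   # scalar context: a hash reference
my ($oops) = build_hash();   # list context: first element of the flattened list

print ref($href) eq 'HASH' ? "hash ref\n" : "not a ref\n";   # prints "hash ref"
print "got '$oops'\n";                                       # prints "got 'hi'"

# Function arguments are evaluated in list context too.
sub count_args { return scalar @_ }
print count_args( build_hash() ), " args\n";          # prints "2 args"
print count_args( scalar build_hash() ), " args\n";   # prints "1 args"
```

Only the parentheses differ between the two assignments, which is why the mistake is easy to miss in review.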