Why would I return a hash or a hash reference in Perl?
Solution 1
I prefer returning a hash ref for two reasons. One, it uses a bit less memory since there's no copy. Two, it lets you do this if you just need one piece of the hash.
my $value = build_hash()->{$key};
Learn to love hash references, you're going to be seeing them a lot once you start using objects.
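A minimal runnable sketch of this approach. The key list and the `hi` lookup are example data borrowed from the question, not anything the answer prescribes:

```perl
use strict;
use warnings;

# Returns a reference to a freshly built hash instead of flattening it into a list.
sub build_hash {
    my %hash;
    my @keys = qw(hi bi no th xc ul 8e r);
    @hash{@keys} = (1) x @keys;
    return \%hash;
}

# Dereference the return value directly when only one entry is needed.
my $value = build_hash()->{hi};

# Or keep the reference and use the arrow syntax for each lookup.
my $href  = build_hash();
my $count = keys %$href;    # scalar assignment: number of keys

print "value=$value count=$count\n";    # prints "value=1 count=8"
```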
Solution 2
Why not return both? Context is a very powerful feature in Perl to allow your functions to "do what you mean". Often the decision of which is a better return value depends on how the calling code plans to use the value, which is exactly why Perl has the builtin wantarray.
sub build_hash {
    my %hash;
    my @keys = qw(hi bi no th xc ul 8e r);   # example keys, taken from the question
    @hash{@keys} = (1) x @keys;
    return wantarray ? %hash : \%hash;
}
my %hash = build_hash; # list context, a list of (key => value) pairs
my $href = build_hash; # scalar context, a hash reference
Solution 3
I would return the reference to save the processing time of flattening the hash into a list of scalars, building the new hash and (possibly) garbage collecting the local hash in the subroutine.
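To see what that flattening means in practice, here is a small sketch in which a hash outlives the subs that return it. `%config`, `config_copy` and `config_ref` are hypothetical names chosen for illustration. Returning the hash hands the caller an independent copy, while returning the reference hands back the original:

```perl
use strict;
use warnings;

# Hypothetical data that outlives the subs, so we can see whether callers
# get a copy of it or the original.
my %config = (host => 'localhost', port => 8080);

sub config_copy { return %config }    # flattened into a list; the caller builds a new hash
sub config_ref  { return \%config }   # a single scalar pointing at %config itself

my %copy = config_copy();
$copy{port} = 9090;                   # changes only the caller's copy

my $ref = config_ref();
$ref->{host} = 'example.org';         # changes %config itself

print "$config{host} $config{port}\n";    # prints "example.org 8080"
```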
Solution 4
What you're looking for is a hash slice:
# assigns the value 1 to every element of the hash
my %hash; # declare an empty hash
my @list = qw(hi bi no th xc ul 8e r); # declare the keys as a list
@hash{@list} = # for every key listed in @list,
(1) x @list; # ...assign to it the corresponding value in this list
# which is (1, 1, 1, 1, 1...) (@list in scalar context
# gives the number of elements in the list)
The x operator is described in perldoc perlop.
See perldoc perldsc and perldoc perlreftut for tutorials on data structures and references (both must-reads for beginners and experts alike). Hash slices themselves are mentioned in perldoc perldata.
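A quick sketch of the pieces above: list repetition with x, the hash slice, and, for comparison, the map idiom that builds the same hash (map is mentioned in the comments; the variable names here are only illustrative):

```perl
use strict;
use warnings;

my @list = qw(hi bi no th xc ul 8e r);

# List repetition: (1) x 3 yields (1, 1, 1).
my @ones = (1) x 3;

# Hash slice: one value assigned per key in a single statement.
my %by_slice;
@by_slice{@list} = (1) x @list;

# The same hash built with map, which turns each key into a (key => 1) pair.
my %by_map = map { $_ => 1 } @list;

print join(',', sort keys %by_slice), "\n";
```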
Regarding returning a hash from a function, normally you should return the hash itself, not a reference. You could use a reference if the hash is huge and memory or time is a concern, but that shouldn't be your first worry -- getting the code working is.
Return values from functions are always lists (where returning a scalar is essentially a list of one element). Hashes are lists in Perl: you can assign one to the other interchangeably (assuming the list has an even number of elements and there are no key collisions, which would result in some values being lost during the conversion):
use strict;
use warnings;
use Data::Dumper;

sub foo {
    return qw(key1 value1 key2 value2);
}

my @list = foo();
my %hash = foo();
print Dumper(\@list);
print Dumper(\%hash);
gives (the order of the hash keys may differ from run to run):
$VAR1 = [
'key1',
'value1',
'key2',
'value2'
];
$VAR1 = {
'key2' => 'value2',
'key1' => 'value1'
};
PS. I highly recommend writing up small sample programs like the one above to play around with data structures and to see what happens. You can learn a lot by experimenting!
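For example, a quick experiment along those lines can check the two caveats mentioned above: duplicate keys and odd-length lists. The keys and values are arbitrary; the warning is captured with a local `$SIG{__WARN__}` handler so it can be inspected:

```perl
use strict;
use warnings;

# Key collision: when a list is assigned to a hash, a later duplicate key
# overwrites the earlier value.
my %h = (key1 => 'a', key2 => 'b', key1 => 'c');
print "$h{key1} ", scalar(keys %h), "\n";    # prints "c 2"

# Odd-length list: the last key gets undef, and 'use warnings' complains.
my @warnings;
my %odd;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my @pairs = (a => 1, 'b');
    %odd = @pairs;
}
print "b exists, value undefined\n" if exists $odd{b} && !defined $odd{b};
print @warnings;
```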
Solution 5
"Regarding returning a hash from a function, normally you should return the hash itself, not a reference. You could use a reference if the hash is huge and memory or time is a concern, but that shouldn't be your first worry -- getting the code working is."
I'm going to have to disagree with Ether here. There was a time when I took that position, but I quickly found myself descending into a hell of having to remember which subs returned hashes and which returned hashrefs, which was a rather serious impediment to just getting the code working. It's important to standardize on either always returning a hash/array or always returning a hashref/arrayref unless you want to be constantly tripping over yourself.
As for which to standardize on, I see several advantages to going with references:
When you return a hash or array, what you're actually returning is a list containing a flattened copy of the original hash/array. Just like passing hash/array parameters to a sub, this has the disadvantage that you can only send one list at a time. Granted, you don't often need to return multiple lists of values, but it does happen, so why choose to standardize on doing things in a way which precludes it?
The (usually negligible) performance/memory benefit of returning a single scalar rather than a potentially much larger chunk of data.
It maintains consistency with OO code, which frequently passes objects (i.e., blessed references) back and forth.
If, for whatever reason, it's important that you have a fresh copy of the hash/array rather than a reference to the original, the calling code can easily make one, as the OP demonstrated in c.pl. If you return a copy of the hash, though, there's no way for the caller to turn that into a reference to the original. (In cases where this is advantageous, the function can make a copy and return a reference to the copy, thus protecting the original while also avoiding the "this returns hashes, that returns hashrefs" hell I mentioned earlier.)
As Schwern mentioned, it's really nice to be able to do my $foo = $obj->some_data->{key}.
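A sketch of the first and fourth points: a sub that hands back two separate hashes by returning two references, and a caller that takes its own copy when it wants one. `partition_words` and its length-based rule are hypothetical, made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: sorts words into two groups and returns both hashes.
# Returning (\%short, \%long) keeps them apart; returning (%short, %long)
# would flatten both into one list and the caller could not split them again.
sub partition_words {
    my (%short, %long);
    for my $word (@_) {
        if (length($word) <= 2) { $short{$word} = 1 }
        else                    { $long{$word}  = 1 }
    }
    return (\%short, \%long);
}

my ($short, $long) = partition_words(qw(hi bi cat dog no elephant));
print join(',', sort keys %$short), "\n";    # bi,hi,no
print join(',', sort keys %$long), "\n";     # cat,dog,elephant

# The caller can still take its own copy when it needs one (as in c.pl).
my %copy = %$long;
```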
The only advantage I can see to always returning hashes/arrays is that it is easier for those who don't understand references or aren't comfortable working with them. Given that comfort with references takes a matter of weeks or months to develop, followed by years or decades of working with them fluently, I don't consider this a meaningful benefit.
user105033
Updated on June 05, 2022
Comments
-
user105033 almost 2 years
What is the most effective way of accomplishing the below? (I know they accomplish the same thing, but which of the three would most people use, and why?)
File a.pl
my %hash = build_hash();
# Do stuff with hash using $hash{$key}

sub build_hash {
    # Build some hash
    my %hash = ();
    my @k = qw(hi bi no th xc ul 8e r);
    for ( @k ) {
        $hash{$_} = 1;
    }
    # Does this return a copy of the hash??
    return %hash;
}
File b.pl
my $hashref = build_hash();
# Do stuff with hash using $hashref->{$key}

sub build_hash {
    # Build some hash
    my %hash = ();
    my @k = qw(hi bi no th xc ul 8e r);
    for ( @k ) {
        $hash{$_} = 1;
    }
    # Just return a reference (smaller than making a copy?)
    return \%hash;
}
File c.pl
my %hash = %{build_hash()};
# Do stuff with hash using $hash{$key}
# It is better, because now we don't have to dereference our hashref each time using ->?

sub build_hash {
    # Build some hash
    my %hash = ();
    my @k = qw(hi bi no th xc ul 8e r);
    for ( @k ) {
        $hash{$_} = 1;
    }
    return \%hash;
}
-
Schwern over 14 years
+1 to the use of map to build a hash from a list, -1 to the idea of avoiding references.
-
ysth over 14 years
would you like some eggs with that? (an implicit hash reference :)
-
Admin over 14 years
In the codebase I currently work with, that would be a simple function called hashof, exported by the ProjectNamespace::DataManip module and approximately implemented like sub hashof { return map { $_ => 1 } @_; } (with some prototype sugar and the like). Our hash_slice_of($hashref, @list), on the other hand, returns each key-value pair which exists in $hashref where the key is also in @list. As hash-manipulation functions, they all return hashes (even-sized lists) so that the return values are easier to work with and pass to each other.
-
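The hashof and hash_slice_of helpers described in that comment can be sketched roughly as follows. The hashof body follows the comment's approximation (without the prototype sugar); the hash_slice_of body is an assumption based only on the description, since the real ProjectNamespace::DataManip code isn't shown:

```perl
use strict;
use warnings;

# hashof: body as approximated in the comment (prototype omitted).
sub hashof { return map { $_ => 1 } @_ }

# hash_slice_of: assumed implementation of the described behaviour. It returns
# each (key => value) pair of $hashref whose key exists there and is in @list.
sub hash_slice_of {
    my ($hashref, @list) = @_;
    return map { $_ => $hashref->{$_} } grep { exists $hashref->{$_} } @list;
}

# Both return even-sized lists, so their results can feed into each other.
my %data   = (a => 1, b => 2, z => 26);
my %wanted = hashof(qw(a z));
my %slice  = hash_slice_of(\%data, keys %wanted, 'missing');
print join(',', map { $_ . '=' . $slice{$_} } sort keys %slice), "\n";    # a=1,z=26
```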
Inshallah over 14 years
+1, maintaining consistency is a good point. However, since I use arrays more often as arrays than I use hashes as hashes, and since there would be a lot more dereferencing/safe-copying going on for array refs, I think they should be treated differently.
-
Schwern over 14 years
my($href) = build_hash(); # whoops
A little too much help for not enough win.
-
Eric Strom over 14 years
@Schwern => If someone working in Perl can't recognize that the lvalue is imposing list context on the assignment they aren't going to get very far. Rather than hiding these features from people, those who understand them should help others use them properly.
-
Schwern over 14 years
@Eric It's not about not understanding contexts. It's whether you expect build_hash() to return something different in list context. Don't trust that the user studies and forever remembers the documentation of every function. This is compounded by the fact that my($foo) and my $foo are fairly easy to casually interchange. Also function( build_hash() ). Oops. Subtle bugs easily missed. Finally, returning a hash doesn't have the utility of returning a list. A hash must be stuck into a variable to be useful. Returning a list can be used implicitly, LISP style. So I question the value.
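The two slips Schwern describes can be reproduced with the wantarray version of build_hash, trimmed here to a single key so the list-context results are predictable. `count_args` is a hypothetical stand-in for "function" in his example:

```perl
use strict;
use warnings;

# Context-sensitive build_hash from the wantarray answer, reduced to one key.
sub build_hash {
    my %hash = (hi => 1);
    return wantarray ? %hash : \%hash;
}

my $href   = build_hash();    # scalar context: a hash reference, as intended
my ($oops) = build_hash();    # list context: gets the first element, the key 'hi'

# Passing the call as a sub argument also imposes list context.
sub count_args { return scalar @_ }
my $n = count_args(build_hash());    # receives ('hi', 1), so returns 2
```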