Built-in Python hash() function
Solution 1
Use hashlib, because hash() was designed to quickly compare dictionary keys during a dictionary lookup, and therefore its result is not guaranteed to be the same across Python implementations.
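As a minimal sketch of the hashlib approach (Python 3 syntax; the helper name stable_hash and the choice of SHA-256 truncated to 8 bytes are assumptions for illustration, not part of the answer), a digest of the encoded text can be turned into an integer that depends only on the input bytes:

```python
import hashlib


def stable_hash(text, digest_bytes=8):
    """Return an integer built from the first bytes of the SHA-256 digest.

    The result depends only on the UTF-8 bytes of `text`, not on the
    platform, the interpreter build, or the process.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:digest_bytes], "big")


if __name__ == "__main__":
    print(stable_hash("http://stackoverflow.com"))
```

Truncating the digest keeps the value in the range of a 64-bit unsigned integer; use the full digest if collision resistance matters.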
Solution 2
As stated in the documentation, the built-in hash() function is not designed for storing the resulting hashes externally. It provides an object's hash value for use in dictionaries and similar structures. It is also implementation-specific (GAE uses a modified version of Python). Consider:
>>> class Foo:
...     pass
...
>>> a = Foo()
>>> b = Foo()
>>> hash(a), hash(b)
(-1210747828, -1210747892)
As you can see, they are different, because hash() calls the object's __hash__ method (which by default is derived from the object's identity, not its contents) instead of a 'normal' hashing algorithm such as SHA.
Given the above, the rational choice is to use the hashlib module.
Solution 3
The result is no surprise. In fact:
In [1]: -5768830964305142685L & 0xffffffff
Out[1]: 1934711907L
So if you want reliable results on ASCII strings, just take the lower 32 bits as an unsigned integer. The hash function for strings is 32-bit-safe and almost portable.
On the other hand, you cannot rely on hash() being invariant for any object whose class does not explicitly define the __hash__ method.
For ASCII strings it works only because the hash is computed from the individual characters of the string, like this:
class string:
    def __hash__(self):
        if not self:
            return 0  # empty
        value = ord(self[0]) << 7
        for char in self:
            value = c_mul(1000003, value) ^ ord(char)
        value = value ^ len(self)
        if value == -1:
            value = -2
        return value
where the c_mul function is the "cyclic" multiplication, which wraps around on overflow instead of growing, as in C.
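The pseudocode above can be made runnable as a sketch. It assumes the unsalted string hash of old Python 2 builds and ASCII input. The names legacy_str_hash and to_signed, and the explicit bits parameter standing in for the width of a C long, are hypothetical helpers added for illustration:

```python
def c_mul(a, b, bits):
    """Multiply a and b, keeping only the low `bits` bits (C-style wraparound)."""
    return (a * b) & ((1 << bits) - 1)


def to_signed(value, bits):
    """Reinterpret an unsigned `bits`-wide integer as a signed one."""
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def legacy_str_hash(text, bits=32):
    """Sketch of the unsalted string hash shown above, for ASCII text."""
    if not text:
        return 0  # empty string
    mask = (1 << bits) - 1
    value = (ord(text[0]) << 7) & mask
    for char in text:
        value = c_mul(1000003, value, bits) ^ ord(char)
    value ^= len(text)
    value = to_signed(value & mask, bits)
    if value == -1:
        value = -2  # -1 is reserved as an error code
    return value


if __name__ == "__main__":
    url = "http://stackoverflow.com"
    print(legacy_str_hash(url, bits=32))
    print(legacy_str_hash(url, bits=64) & 0xffffffff)
```

Because multiplication and XOR both commute with truncation, the low 32 bits of the 64-bit result equal the 32-bit result taken as an unsigned integer, which is why the masking trick works for these strings.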
Solution 4
Most answers suggest this is because of different platforms, but there is more to it. From the documentation of object.__hash__(self):
By default, the __hash__() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.
This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n²) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.
Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).
Even running on the same machine will yield varying results across invocations:
$ python -c "print(hash('http://stackoverflow.com'))"
-3455286212422042986
$ python -c "print(hash('http://stackoverflow.com'))"
-6940441840934557333
While hashes of tuples of integers are not salted and stay the same:
$ python -c "print(hash((1,2,3)))"
2528502973977326415
$ python -c "print(hash((1,2,3)))"
2528502973977326415
See also the environment variable PYTHONHASHSEED:
If this variable is not set or set to random, a random value is used to seed the hashes of str, bytes and datetime objects.
If PYTHONHASHSEED is set to an integer value, it is used as a fixed seed for generating the hash() of the types covered by the hash randomization.
Its purpose is to allow repeatable hashing, such as for selftests for the interpreter itself, or to allow a cluster of python processes to share hash values.
The integer must be a decimal number in the range [0, 4294967295]. Specifying the value 0 will disable hash randomization.
For example:
$ export PYTHONHASHSEED=0
$ python -c "print(hash('http://stackoverflow.com'))"
-5843046192888932305
$ python -c "print(hash('http://stackoverflow.com'))"
-5843046192888932305
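The same check can be scripted. This is a sketch for Python 3.7 or later. The helper name hash_in_fresh_process is hypothetical. Each call starts a new interpreter with PYTHONHASHSEED set and reports the hash it computes for a string:

```python
import os
import subprocess
import sys


def hash_in_fresh_process(text, seed):
    """Run hash(text) in a new interpreter with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    url = "http://stackoverflow.com"
    # With a fixed seed, separate processes agree on the hash value.
    print(hash_in_fresh_process(url, 0) == hash_in_fresh_process(url, 0))
```

A fixed seed makes results repeatable for one interpreter build only; it does not make hash values portable across Python versions or between 32-bit and 64-bit builds.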
Solution 5
Hash results vary between 32-bit and 64-bit platforms.
If a calculated hash must be the same on both platforms, consider using:
def hash32(value):
    return hash(value) & 0xffffffff
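Masking only helps where the underlying hash() is itself unsalted, as in the old Python 2 builds discussed here. On interpreters with hash randomization, the masked value still changes between processes. As an alternative sketch, and a different technique from the one in this answer, a CRC-32 checksum from the standard zlib module yields a 32-bit value that depends only on the input bytes. The name stable_hash32 is hypothetical:

```python
import zlib


def stable_hash32(text):
    """Return a 32-bit unsigned checksum of the UTF-8 bytes of `text`."""
    # The mask keeps the result unsigned on interpreters where crc32 may be signed.
    return zlib.crc32(text.encode("utf-8")) & 0xffffffff


if __name__ == "__main__":
    print(stable_hash32("http://stackoverflow.com"))
```

CRC-32 is a checksum rather than a cryptographic hash, so it suits bucketing and change detection, not security-sensitive uses.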
Deniss T.
Updated on October 09, 2020
Comments
-
Deniss T. over 3 years
Windows XP, Python 2.5:
hash('http://stackoverflow.com') Result: 1934711907
Google App Engine (http://shell.appspot.com/):
hash('http://stackoverflow.com') Result: -5768830964305142685
Why is that? How can I have a hash function that will give me same results across different platforms (Windows, Linux, Mac)?
-
Tzury Bar Yochay about 13 years
This is owing to the fact that your WinXP is a 32-bit platform while Google's is 64-bit.
-
amcnabb over 11 years
Can you share any context about what this hash function is used for and why?
-
Alex Huszagh over 8 years
This is only true for Python 3.x, but since Python 3 is the present and the future and this is the only answer that addresses this, +1.
-
jtlz2 almost 3 years
This is a sound way to test machine-independence - thank you!