Deterministic hashing in Python 3
Forcing Python's built-in hash to be deterministic is intrinsically hacky. If you want to avoid hackitude, use a different hashing function, e.g. one from hashlib. See https://docs.python.org/2/library/hashlib.html for Python 2 and https://docs.python.org/3/library/hashlib.html for Python 3.
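A minimal sketch of that approach, assuming the goal is a 32-bit seed for NumPy as in the question. The helper name `stable_seed` and the choice of SHA-256 are illustrative assumptions, not something the answer prescribes; any hashlib digest would work the same way.

```python
import hashlib


def stable_seed(context: str) -> int:
    """Map a string to a 32-bit seed that is the same in every run (hypothetical helper)."""
    # hashlib digests depend only on the input bytes, not on PYTHONHASHSEED.
    digest = hashlib.sha256(context.encode("utf-8")).digest()
    # Keep the first 4 bytes so the value fits the 0 .. 2**32 - 1 range
    # accepted by np.random.seed.
    return int.from_bytes(digest[:4], "big")


seed = stable_seed("string")
# np.random.seed(seed)  # assumes numpy is imported as np, as in the question
```

Because the seed no longer comes from `hash()`, hash randomization stays enabled for the rest of the program.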
Jimmy C
Student of computational linguistics at Uppsala University in Sweden.
Updated on June 15, 2022
Comments
-
Jimmy C about 2 years
I'm using hashing of strings for seeding random states in the following way:
    import numpy as np

    context = "string"
    # The modulo keeps the hash within the range of allowed seed values.
    seed = hash(context) % 4294967295
    np.random.seed(seed)
This is unfortunately (for my usage) non-deterministic between runs in Python 3.3 and up. I do know that I could set the PYTHONHASHSEED environment variable to an integer value to regain the determinism, but I would probably prefer something that feels a bit less hacky and won't entirely disregard the extra security added by random hashing. Suggestions?
nicolas over 4 years
Isn't a hash supposed to be deterministic?
-
Le Frite over 4 years
hash() is only deterministic within a single run; you have no guarantee it will return the same value in a different run. Hence it's bad for anything persisted to disk.
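The run-to-run behaviour can be checked with a small experiment. This is only a sketch: the helper name `hash_in_fresh_interpreter` and the seed values "123" and "456" are made up for illustration. It starts a new interpreter for each call and sets PYTHONHASHSEED explicitly, so the effect of the hash seed is visible.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text: str, hash_seed: str) -> int:
    """Return hash(text) as computed by a new Python process (hypothetical helper)."""
    # Copy the current environment and pin the hash seed for the child process.
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


same_a = hash_in_fresh_interpreter("string", "123")
same_b = hash_in_fresh_interpreter("string", "123")
other = hash_in_fresh_interpreter("string", "456")

print(same_a == same_b)  # identical seed, so the two runs agree
print(same_a == other)   # different seed, so a match would be a rare coincidence
```

When PYTHONHASHSEED is left unset, each run picks its own random seed, which is why a string hash stored in one run cannot be relied on in the next.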