Deterministic hashing in Python 3


Forcing Python's built-in hash to be deterministic is intrinsically hacky. If you want to avoid hackitude, use a different hashing function, such as one from hashlib. See https://docs.python.org/2/library/hashlib.html for Python 2 and https://docs.python.org/3/library/hashlib.html for Python 3.
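As a minimal sketch of that suggestion (the helper name stable_seed and the choice of SHA-256 are my own assumptions, not something the question prescribes), a string can be mapped to a 32-bit seed that stays the same across interpreter runs:

```python
import hashlib


def stable_seed(context: str) -> int:
    """Map a string to an integer in [0, 2**32 - 1] that is identical on every run.

    stable_seed is a hypothetical helper name; any hashlib digest would do.
    """
    # hashlib works on bytes, so pick an explicit encoding.
    digest = hashlib.sha256(context.encode("utf-8")).digest()
    # Keep the first 4 bytes so the value fits the 32-bit range NumPy's
    # legacy np.random.seed accepts.
    return int.from_bytes(digest[:4], "big")


seed = stable_seed("string")
```

The result can then be passed to np.random.seed(seed). Because this does not touch PYTHONHASHSEED, the randomization of the built-in hash() stays enabled for the rest of the program.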

Author by

Jimmy C

Student of computational linguistics at Uppsala University in Sweden.

Updated on June 15, 2022

Comments

  • Jimmy C
    Jimmy C about 2 years

    I'm using hashing of strings for seeding random states in the following way:

    import numpy as np

    context = "string"
    # The modulo keeps the value within the range of allowed seed values
    seed = hash(context) % 4294967295
    np.random.seed(seed)
    

    This is unfortunately (for my usage) non-deterministic between runs in Python 3.3 and up. I do know that I could set the PYTHONHASHSEED environment variable to an integer value to regain the determinism, but I would probably prefer something that feels a bit less hacky, and won't entirely disregard the extra security added by random hashing. Suggestions?

  • nicolas
    nicolas over 4 years
    Isn't a hash supposed to be deterministic?
  • Le Frite
    Le Frite over 4 years
    hash() is only deterministic within a single run; you have no guarantee that it will return the same value in a different run. Hence it's a bad choice for anything persisted to disk.
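The behaviour discussed in these comments can be checked with a small sketch that hashes a string in fresh interpreter processes. The helper name hash_in_fresh_process is an assumption made for illustration; PYTHONHASHSEED is the environment variable mentioned in the question.

```python
import os
import subprocess
import sys


def hash_in_fresh_process(text: str, hash_seed: str) -> int:
    """Return hash(text) as computed by a newly started interpreter.

    hash_in_fresh_process is a hypothetical helper; hash_seed is passed
    through as PYTHONHASHSEED ("random" or a decimal integer).
    """
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Same fixed seed: the two child processes agree.
fixed_a = hash_in_fresh_process("string", "0")
fixed_b = hash_in_fresh_process("string", "0")

# Randomized seed: the two child processes will almost certainly disagree.
random_a = hash_in_fresh_process("string", "random")
random_b = hash_in_fresh_process("string", "random")
```

With a fixed PYTHONHASHSEED the values match between processes, which is the workaround the question calls hacky. With "random" (the default behaviour for strings since Python 3.3) each process draws its own seed.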