Disable hash randomization from within python program


Solution 1

I suspect this isn't possible, unfortunately. In test_hash.py, the HashRandomizationTests class and its descendants were added in the commit that introduced this behavior. They test the hashing behavior by modifying the environment and starting a new process with PYTHONHASHSEED explicitly set. You could try to copy that pattern, perhaps.
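
The pattern those tests rely on can be sketched roughly as follows: start a child interpreter with PYTHONHASHSEED set in its environment and read back what it computes. This is a minimal sketch, not the code from test_hash.py, and the helper name hash_in_child is made up for illustration.

```python
import os
import subprocess
import sys


def hash_in_child(text, seed):
    """Return hash(text) as computed by a fresh interpreter run with the given seed.

    hash_in_child is a hypothetical helper, not part of any library.
    """
    # Copy the environment so the parent process is left untouched.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)
```

Two calls with the same seed should return the same value, whereas hash() of the same string in the parent process will usually differ unless the parent was started with that seed too.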

I also just noticed you said "Every time I run my script, dict contents are iterated in a different order." - I assume you're aware of collections.OrderedDict, right? That's the normal way to get a reliable iteration order from a dict.
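
A small illustration of the OrderedDict suggestion, with made-up keys: iteration follows insertion order, so it does not depend on the hash seed. (Since Python 3.7 a plain dict preserves insertion order as well; sets of strings still do not, so this only helps for dict-like data.)

```python
from collections import OrderedDict

# Keys come back in the order they were inserted, whatever the hash seed is.
counts = OrderedDict()
counts["pear"] = 3
counts["apple"] = 1
counts["fig"] = 7

order = list(counts)  # ['pear', 'apple', 'fig']
```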


If you're willing to set the value in your shell environment, you could also just wrap your python call in a bash script, e.g.

#! /bin/bash
export PYTHONHASHSEED=0

# call your python program here

That avoids needing to manipulate your whole environment, as long as you're ok with a wrapper script.

Or even just pass the value on the command line:

$ PYTHONHASHSEED=0 python YOURSCRIPT.py

Solution 2

Apart from dictionary order, hash randomisation may also break existing code that uses hash() directly. A workaround that solved the problem for me in this case was to replace

hash(mystring)

with

int(hashlib.sha512(mystring).hexdigest(), 16)

For Python 3, a conversion like mystring.encode('utf-8') will be needed for standard strings. (I was working with byte strings.)
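
For Python 3 with ordinary str input, the replacement might look like the sketch below. The function name stable_hash is made up for illustration, and UTF-8 is an assumed encoding; any fixed encoding works as long as it is used consistently.

```python
import hashlib


def stable_hash(text):
    """Seed-independent stand-in for hash(text); stable_hash is a made-up name."""
    data = text.encode("utf-8")  # hashlib needs bytes, not str, on Python 3
    return int(hashlib.sha512(data).hexdigest(), 16)
```

The result is the same in every process and on every run, which is the property hash() loses under randomization.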

Note that the two differ in range and in whether negative numbers occur: hash() returns a signed integer, while the hashlib version returns a non-negative integer from a much bigger range (512 bits), so hash collisions are extremely unlikely.

To reproduce the same 64-bit range as hash(), one could reduce the number of hexadecimal digits to 16 (4 bits per digit) and shift the result to start at the smallest negative 64-bit number:

int(hashlib.sha256(mystring).hexdigest()[:16], 16)-2**63

Alternatively, one can take 8 bytes and use int.from_bytes:

int.from_bytes(hashlib.sha256(mystring).digest()[:8], byteorder='big', signed=True)
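
Wrapped as functions, the two 64-bit variants could look like this. It is a sketch assuming bytes input, and both function names are made up. Note that the two variants cover the same signed 64-bit range but do not return the same number for the same input: one shifts the unsigned value down by 2**63, the other reinterprets it as two's complement.

```python
import hashlib


def stable_hash64(data):
    """Map bytes to a signed 64-bit integer via int.from_bytes (made-up name)."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def stable_hash64_shifted(data):
    """Same range, via the first 16 hex digits shifted down by 2**63 (made-up name)."""
    return int(hashlib.sha256(data).hexdigest()[:16], 16) - 2**63
```

Whichever variant is chosen, it should be used everywhere, since values from one are not comparable with values from the other.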

Solution 3

Maybe the only/cleanest way is to prepend this to the beginning of your program:

import os
import sys
hashseed = os.getenv('PYTHONHASHSEED')
if not hashseed:
    os.environ['PYTHONHASHSEED'] = '0'
    os.execv(sys.executable, [sys.executable] + sys.argv)

[the rest of your program]

If PYTHONHASHSEED is missing, it will set it to zero and replace the current program with a new one, supplying the same set of arguments. According to the os.execv documentation:

These functions all execute a new program, replacing the current process; they do not return. On Unix, the new executable is loaded into the current process, and will have the same process id as the caller. Errors will be reported as OSError exceptions.

The current process is replaced immediately. Open file objects and descriptors are not flushed, so if there may be data buffered on these open files, you should flush them using sys.stdout.flush() or os.fsync() before calling an exec* function.
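
Taking the flushing advice into account, the re-exec step could be wrapped in a function like the sketch below. The name reexec_with_hashseed and the default seed of "0" are assumptions for illustration. It also assumes the program was started as `python script.py args...`; with `python -m` or `python -c`, sys.argv does not reproduce the original command line.

```python
import os
import sys


def reexec_with_hashseed(seed="0"):
    """Restart the interpreter with PYTHONHASHSEED set, unless it already is.

    Returns False when no restart was needed; otherwise it does not return.
    reexec_with_hashseed is a made-up name.
    """
    if os.environ.get("PYTHONHASHSEED"):
        return False  # already set, so we are in the restarted process (or the user set it)
    os.environ["PYTHONHASHSEED"] = seed
    # execv does not flush buffered output, so do it by hand first.
    sys.stdout.flush()
    sys.stderr.flush()
    os.execv(sys.executable, [sys.executable] + sys.argv)
```

Calling reexec_with_hashseed() as the first statement of the script keeps the work done before the restart to a minimum.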

Author: alexis

Updated on July 09, 2022

Comments

  • alexis
    alexis almost 2 years

    Starting with Python 3.3, the hashing algorithm is non-deterministically salted to avoid a certain kind of attack. This is nice for webservers but it's a pain when trying to debug a program: Every time I run my script, dict contents are iterated in a different order.

    Some earlier versions of python had a -R flag for enabling hash randomization, but now that it's the default behavior, the flag has not been replaced by its opposite. Randomization can be disabled by setting the environment variable PYTHONHASHSEED:

    PYTHONHASHSEED

    If this variable is not set or set to random, a random value is used to seed the hashes of str, bytes and datetime objects.
    If PYTHONHASHSEED is set to an integer value, it is used as a fixed seed for generating the hash() of the types covered by the hash randomization.

    The catch is that this variable must be set before launching the python process. I've tried to set it with os.putenv(), or in os.environ, but these seem to have no effect on the hashing method. This is not too surprising: I wouldn't expect python to check the environment before every single set or dictionary lookup! So, the question remains:

    Is there a way for a python program to disable its own hash randomization?

    • user2357112
      user2357112 about 9 years
      It would have to happen before any actual Python code is executed; by that time, far too many strings have been hashed and placed into things like type and module __dict__s.
    • alexis
      alexis about 9 years
      I thought the same; but I hope someone who knows more can comment.
    • Ar5hv1r
      Ar5hv1r about 9 years
      These are the relevant commits: f4b7ecf8a5f8 and 6b7704fe1be1 - glancing through it I don't immediately see a solution, but it's a starting point.
    • Chris
      Chris almost 3 years
      This randomization of plain old hashing is really not a feature. There should be a completely dependable, non-secure, decent built-in hasher that can be used to create identical identifiers between processes and sessions. Python3's hasher more or less fails on all counts. Security mission creep will kill Python (Perl died, and Python can too) if we let it; Python3 is not a secure language any more than an office or a kitchen is a secure part of a building.
  • alexis
    alexis about 9 years
    Thanks, that's a pretty strong indication. And a good tip for respawning-- though beyond the ugliness, there are contexts where it's not practical (e.g., if running in an ipython notebook served by a remote "kernel"). I guess I can just set it on log-in for my environment... I'm not going to DoS myself.
  • Joachim Wagner
    Joachim Wagner almost 5 years
    For some applications it is worth noting that integers are passed through by hash().
  • Ar5hv1r
    Ar5hv1r almost 5 years
    @JoachimWagner I don't believe that will affect hash-randomization, which this question is asking about.
  • Ar5hv1r
    Ar5hv1r over 4 years
    @JoachimWagner consider posting an answer to elaborate, but I don't think this is correct. random.seed() doesn't affect dict's hashing behavior, which is what the question is asking about.
  • Joachim Wagner
    Joachim Wagner over 4 years
    I deleted my random.seed() comment as it doesn't use hash() since 3.2. Thanks @dimo414 for discussion and suggestion to post a detailed answer.
  • Jason Angel
    Jason Angel almost 4 years
    @dimo414, I'm using JupyterLab/notebook. Do you have any suggestion for changing PYTHONHASHSEED from notebook cells without manipulating the whole environment? Currently I'm doing os.environ['PYTHONHASHSEED'] = str(seed), which I think affects the full environment.
  • Ar5hv1r
    Ar5hv1r almost 4 years
    If that's working for you it's probably the best you can do, but consider posting a separate question. You have to manipulate the whole environment for some process, but maybe the notebook runs as a subprocess and you can configure its environment separately.