What's more random, hashlib or urandom?


Solution 1

This solution:

os.urandom(16).encode('hex')

is the best choice, since it uses the OS to generate randomness, which should be usable for cryptographic purposes (depending on the OS implementation).

random.random() generates pseudo-random values.

Hashing a random value does not add any new randomness.
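The snippet above targets Python 2; on Python 3, bytes objects have no .encode('hex'). A minimal sketch of an equivalent, assuming Python 3.6 or later (the helper name random_hex_token is made up for illustration):

```python
import os
import secrets


def random_hex_token(nbytes=16):
    """Return 2 * nbytes hex characters drawn from the OS random source."""
    # bytes.hex() is the Python 3 counterpart of Python 2's .encode('hex')
    return os.urandom(nbytes).hex()


token = random_hex_token()

# The standard library's secrets module wraps the same OS source
alt_token = secrets.token_hex(16)

print(token)
print(alt_token)
```

Both calls return 32 hex characters, since each of the 16 random bytes becomes two hex digits.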

Solution 2

random.random() is a pseudo-random generator, which means the numbers come from a deterministic sequence. If you call random.seed(some_number), the sequence generated after that will always be the same.
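The effect of seeding can be seen in a short sketch. It uses a private random.Random instance rather than the module-level random.seed(), purely so that the global generator is left alone; seeded_sequence is a hypothetical helper name:

```python
import random


def seeded_sequence(seed, count=5):
    """Return `count` floats from a generator initialised with `seed`."""
    rng = random.Random(seed)  # private generator, global state untouched
    return [rng.random() for _ in range(count)]


first = seeded_sequence(12345)
second = seeded_sequence(12345)

# Same seed, same sequence: anyone who knows the seed knows the output
print(first == second)  # True
```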

os.urandom() gets its random numbers from the OS's RNG, which uses an entropy pool to collect real random numbers, usually from random events in hardware devices. Dedicated hardware entropy generators even exist for systems that consume a lot of random numbers.

On Unix systems there are traditionally two random number devices: /dev/random and /dev/urandom. Reads from the first block if not enough entropy is available, whereas reads from /dev/urandom do not block: when entropy runs short, it falls back to a pseudo-RNG.

So the choice usually depends on what you need: if you need a few uniformly distributed random numbers, the built-in PRNG should be sufficient. For cryptographic use, it's always better to use real random numbers.

Solution 3

The second approach from the question (os.urandom) clearly has more entropy than the first (hashing random.random()). Assuming the quality of the source of the random bits were the same for os.urandom and random.random:

  • With os.urandom(16) you are fetching 16 bytes = 128 bits worth of randomness.
  • With random.random() you are fetching a floating point value, which has at most 53 bits of randomness (the precision of an IEEE 754 double). You then hash it, which, of course, doesn't add any randomness.
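The bit counts in this comparison can be checked with a rough sketch. It assumes Python 3 and a platform with IEEE 754 doubles; sha1_hex is a hypothetical helper, and the mantissa precision reported by sys.float_info counts one implicit bit in addition to the 52 stored ones:

```python
import hashlib
import os
import sys

# 16 bytes from the OS, 8 bits each
urandom_bits = len(os.urandom(16)) * 8

# Mantissa precision of a Python float (53 on IEEE 754 platforms)
float_bits = sys.float_info.mant_dig


def sha1_hex(text):
    """Hex SHA-1 digest of a text string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hashing is deterministic: the same input always gives the same digest,
# so a 160-bit digest cannot hold more randomness than its input did.
same = sha1_hex("0.5") == sha1_hex("0.5")
digest_bits = len(sha1_hex("0.5")) * 4  # 4 bits per hex character

print(urandom_bits, float_bits, digest_bits, same)
```

The digest is longer than the float it was computed from, but only as many distinct digests can occur as there are distinct inputs.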

More importantly, the quality of the randomness coming from os.urandom is expected and documented to be much better than the randomness coming from random.random. os.urandom's docstring says "suitable for cryptographic use".

Solution 4

Testing randomness is notoriously difficult. However, I would choose the second method, but ONLY (or, only as far as comes to mind) for this case, where the hash is seeded by a random number.

The whole point of hashes is to create a number that is vastly different based on slight differences in input. For your use case, the randomness of the input should do. If, however, you wanted to hash a file and detect one eensy byte's difference, that's when a hash algorithm shines.
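The sensitivity to small input changes can be illustrated with a short sketch. The two byte strings are arbitrary examples and bit_difference is a hypothetical helper; it counts how many of SHA-1's 160 output bits differ between two inputs:

```python
import hashlib


def bit_difference(a, b):
    """Count the differing bits between the SHA-1 digests of a and b."""
    da = int.from_bytes(hashlib.sha1(a).digest(), "big")
    db = int.from_bytes(hashlib.sha1(b).digest(), "big")
    return bin(da ^ db).count("1")  # XOR marks every bit that differs


original = b"The quick brown fox"
modified = b"The quick brown foy"  # only the last byte changed

changed = bit_difference(original, modified)
print(changed, "of 160 bits differ")
```

For a well-behaved hash, roughly half of the output bits are expected to flip for any change in the input, which is what makes it useful for detecting a one-byte difference in a file.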

I'm just curious, though: why use a hash algorithm at all? It seems that you're looking for a purely random number, and there are lots of libraries that generate UUIDs, which have far stronger guarantees of uniqueness than random number generators.

Solution 5

If you want a unique identifier (UUID), then you should use:

import uuid
uuid.uuid4().hex

https://docs.python.org/3/library/uuid.html

Author: Ben Keating

Updated on July 29, 2022

Comments

  • Ben Keating
    Ben Keating almost 2 years

    I'm working on a project with a friend where we need to generate a random hash. Before we had time to discuss, we both came up with different approaches and because they are using different modules, I wanted to ask you all what would be better--if there is such a thing.

    hashlib.sha1(str(random.random())).hexdigest()
    

    or

    os.urandom(16).encode('hex')
    

    Typing this question out has got me thinking that the second method is better. Simple is better than complex. If you agree, how reliable is this for 'randomly' generating hashes? How would I test this?

  • Ben Keating
    Ben Keating about 12 years
    These are all really great answers. Thank you.
  • ChristopheD
    ChristopheD about 11 years
    @greengit: chances are (very high) that the small snippet above is aimed at Python 2.x version (untested in 3.x)
  • Admin
    Admin over 10 years
    This is generally true except in a few situations including freshly booted system which lacks the randomness pool (entropy) to generate a quality random number, or when the pool is depleted by a large number of calls to it (don't ask how large since I'm no expert). The latter can also be exploited by the attacker to make the randomness more predictable (e.g., creating a large number of accounts that require random salt). So unless you need a cryptographically safe random number, you should avoid using os.urandom, and also make sure it cannot be abused if you use it.
  • lonetwin
    lonetwin about 10 years
    Note that Python's random.SystemRandom[1] class offers the same API as random.* while relying on urandom. [1] docs.python.org/2/library/random.html#random.SystemRandom
  • Zac Crites
    Zac Crites almost 9 years
    @CristopheD: try base64.b64encode(os.urandom(16))
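
The last two suggestions from the comments, random.SystemRandom and base64 encoding of os.urandom output, can be sketched as follows. This assumes Python 3, where b64encode returns bytes that need decoding to get a text token; the variable names are illustrative only:

```python
import base64
import os
import random

# SystemRandom exposes the familiar random.* methods but draws from os.urandom
sys_rng = random.SystemRandom()
value = sys_rng.random()         # float in [0.0, 1.0)
pick = sys_rng.choice("abcdef")  # one character chosen via the OS source

# 16 random bytes encoded as base64 text (24 characters including padding)
token = base64.b64encode(os.urandom(16)).decode("ascii")

print(value, pick, token)
```

Note that SystemRandom ignores seed(), so its output cannot be reproduced the way a seeded random.Random sequence can.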