Generate ID from string in Python
I would do something like this:
>>> import hashlib
>>> m = hashlib.md5()
>>> m.update(b"some string")  # hashlib needs bytes in Python 3
>>> str(int(m.hexdigest(), 16))[0:12]
'120665287271'
The idea:
- Calculate the hash of the string with MD5 (or SHA-1, etc.) in hexadecimal form (see the hashlib module)
- Convert the hexadecimal string into an integer and convert that back into a base-10 string (so the result contains only digits)
- Use the first 12 characters of the string.
If the characters a-f are also okay, I would use m.hexdigest()[0:12].
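Under Python 3, the steps above could be wrapped in a small function. This is only a sketch: the name string_to_id, the digits parameter, and the choice of UTF-8 encoding are assumptions made for illustration, not part of the original answer.

```python
import hashlib

def string_to_id(s, digits=12):
    """Return the leading `digits` decimal digits of the MD5 hash of `s` as an int."""
    # hashlib works on bytes, so encode the text first (UTF-8 is an assumption).
    hex_digest = hashlib.md5(s.encode("utf-8")).hexdigest()
    # Read the hex digest as a base-16 integer, then keep the leading decimal digits.
    return int(str(int(hex_digest, 16))[:digits])

print(string_to_id("some string"))
```

The same input always yields the same ID, across runs and machines, because MD5 has no per-process randomization.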
Comments
-
mlen108 almost 2 years
I'm struggling a bit to generate an ID of type integer for a given string in Python.
I thought the built-in hash function was perfect, but it appears that the IDs are sometimes too long. That's a problem since I'm limited to 64 bits as the maximum length.
My code so far: hash(s) % 10000000000. The input string(s) I can expect will be 12-512 chars long.
Requirements are:
- integers only
- generated from provided string
- ideally up to 10-12 chars long (I'll have ~5 million items only)
- low probability of collision..?
I would be glad if someone can provide any tips / solutions.
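To put a rough number on the collision requirement, the birthday approximation gives about n(n-1)/(2N) expected colliding pairs when n IDs are drawn uniformly from N possible values. The sketch below assumes the hash behaves like a uniform random function, which is an idealization, and the function name is made up for illustration.

```python
def expected_colliding_pairs(n_items, id_space):
    """Birthday approximation: expected number of colliding pairs."""
    return n_items * (n_items - 1) / (2 * id_space)

n = 5_000_000
print(expected_colliding_pairs(n, 10**10))  # 10-digit IDs
print(expected_colliding_pairs(n, 10**12))  # 12-digit IDs
print(expected_colliding_pairs(n, 2**64))   # full 64-bit range
```

With these inputs the estimate is roughly 1250 pairs for 10 digits and about 12.5 pairs for 12 digits, while the full 64-bit range stays far below one, so using as many of the 64 bits as possible looks like the safer choice.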
-
mlen108 about 10 years
Thanks, it looks great! It does not return an integer, but it's just a matter of casting it back to int. It would be nice if we could do away with the int/str/int coercion dance. Any idea? :)
-
Stephan Kulla about 10 years
m.hexdigest() provides a string with 32 characters. So the maximum value is 'f'*32, which has 39 digits (= len(str(int('f'*32, 16)))). So you can divide by 1E17 in the end. With this solution collisions might be more probable... But I did not think it through...
-
Stephan Kulla about 10 years
m.hexdigest() provides m.digest_size * 2 characters (this might change, depending on the hash function you want to use)
-
Stephan Kulla about 10 years
Note: you can also use the byte string from digest(), slice enough bytes from it, and convert that to an integer (or rather: interpret the byte string as an integer).
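The digest() idea from the last comment might look like the following in Python 3, which also avoids the int/str/int round trip. This is a hedged sketch: the function names, the 8-byte slice, the big-endian byte order, and the UTF-8 encoding are all assumptions chosen for illustration.

```python
import hashlib

def string_to_int64(s):
    """Interpret the first 8 bytes of the MD5 digest as an unsigned 64-bit integer."""
    digest = hashlib.md5(s.encode("utf-8")).digest()  # 16 raw bytes for MD5
    return int.from_bytes(digest[:8], byteorder="big")

def string_to_decimal_id(s, digits=12):
    """Reduce the 64-bit value to at most `digits` decimal digits with a modulo."""
    return string_to_int64(s) % 10**digits

value = string_to_int64("some string")
print(value, string_to_decimal_id("some string"))
```

The first function keeps all 64 bits, which gives the lowest collision risk within that limit; the modulo variant trades some of that for shorter IDs. If the target storage is a signed 64-bit type, passing signed=True to int.from_bytes keeps the value in range.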