How to hash a string into 8 digits?
Solution 1
Yes, you can use the built-in hashlib
module or the built-in hash
function. Then, chop-off the last eight digits using modulo operations or string slicing operations on the integer form of the hash:
>>> s = 'she sells sea shells by the sea shore'
>>> # Use hashlib
>>> import hashlib
>>> int(hashlib.sha1(s.encode("utf-8")).hexdigest(), 16) % (10 ** 8)
58097614L
>>> # Use hash()
>>> abs(hash(s)) % (10 ** 8)
82148974
Solution 2
Raymond's answer is great for python2 (though, you don't need the abs() nor the parens around 10 ** 8). However, for python3, there are important caveats. First, you'll need to make sure you are passing an encoded string. These days, in most circumstances, it's probably also better to shy away from sha-1 and use something like sha-256, instead. So, the hashlib approach would be:
>>> import hashlib
>>> s = 'your string'
>>> int(hashlib.sha256(s.encode('utf-8')).hexdigest(), 16) % 10**8
80262417
If you want to use the hash() function instead, the important caveat is that, unlike in Python 2.x, in Python 3.x, the result of hash() will only be consistent within a process, not across python invocations. See here:
$ python -V
Python 2.7.5
$ python -c 'print(hash("foo"))'
-4177197833195190597
$ python -c 'print(hash("foo"))'
-4177197833195190597
$ python3 -V
Python 3.4.2
$ python3 -c 'print(hash("foo"))'
5790391865899772265
$ python3 -c 'print(hash("foo"))'
-8152690834165248934
This means the hash()-based solution suggested, which can be shortened to just:
hash(s) % 10**8
will only return the same value within a given script run:
#Python 2:
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543
#Python 3:
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
12954124
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
32065451
So, depending on if this matters in your application (it did in mine), you'll probably want to stick to the hashlib-based approach.
Solution 3
Just to complete JJC answer, in python 3.5.3 the behavior is correct if you use hashlib this way:
$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded
$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded
$ python3 -V
Python 3.5.3
Comments
-
Bob Fang over 3 years
Is there anyway that I can hash a random string into a 8 digit number without implementing any algorithms myself?
-
Theran about 11 yearshash("your string") % 100000000
-
DhruvPathak about 11 years8 digit seems to small, and may result in collisions of hashes if you have large number of records. stackoverflow.com/questions/1303021/…
-
architectonic over 7 yearsUse hashlib since hash has another purpose!
-
Alex North-Keys about 7 yearsAny finite number of digits will result in collisions for sufficiently large numbers of hash items, that's why you shouldn't treat them as unique keys - it tends to turn into the birthday problem.
-
tryptofame almost 7 yearsI've chosen "CityHash" to hash strings to 19 digit long integers (64bit integers), hoping this will lead to less potential collisions than Raymond's suggestion below. en.wikipedia.org/wiki/List_of_hash_functions
-
-
twneale almost 9 yearspublic service announcement...this technique doesn't actually result in a unique hash value for the string; it computes a hash and then munges into a non-guaranteed-unique value
-
Raymond Hettinger almost 9 yearspublic service announcement...except for the special case of perfect hashes over limited set of input values, hash functions aren't supposed to generate guaranteed unique values.
-
twneale almost 9 yearsProbably true, but virtually all of their practical utility derives from their their good-enough tendency to produce unique values. The probability of a 'hash' collision using this trick is probably 10 or 11 orders of magnitude higher than md5
-
Raymond Hettinger almost 9 yearsDid you read the OP's question? He (or she) wanted (or needed) 8 decimal places. Also, the way hash tables work is to hash into a small search space (the sparse table). You seem to not know want hash functions are commonly used for and to not care about the actual question that was asked.
-
twneale almost 9 yearsI read the question. I'm simply observing that over the same input space as SHA-1, your answer is astronomically more likely to produce a collision than not. At least some degree of uniqueness is implicitly required by the question, but your answer is a hash function in the same spirit as one that simply returns 12345678 for every input. I was able to experimentally generate a collision with as few as 1000 inputs using this method. To preserve the same collision probability as SHA-1, you would have to map un-truncated SHA-1's to 8-digit integers. I think that's worthy of a PSA
-
Mr. Napik over 8 yearsCareful, hash(s) is not guarateed to give same results across platforms and runs.
-
Doug about 8 yearsIs
abs
needed? Modulo should return a positive int. -
Raymond Hettinger about 8 yearsIn Python2,
hash('agir')
is-2835743962885600615
. -
JJC over 7 yearsRight, but what I think Doug meant is that even a negative number mod something will always produce a positive number, so it seems you can drop the abs(). Also, I think the relative operator precedence of exponentiation means we don't even need the second parens. Thanks for the answer, though! >>> hash(s) % 10**8 produces 57227199
-
JJC over 7 yearsAn important caveat is that, unlike with Python 2.x, hash(x) returns a different value on each Python 3.x interpreter invocation (it is consistent within a single process). So, if the OP is depending on the hash to be the same for a given string across script runs, the latter will not work in Python 3.x. This just bit me. I will add an answer to reflect these two comments (not yet sure about etiquette of editing).
-
Wolph over 6 yearsIt should be noted that this answer has a very important caveat since Python 3.3, to protect against tar-pitting Python 3.3 and above use a random hash seed upon startup.
-
silgon over 5 yearsShould use
1e8
instead of10**8
you're performing an extra computation when there is absolutely no need. Also, nice answer, it's exactly what I was looking for. -
Raymond Hettinger over 5 years@silgon Python's peephole optimizer does constant folding, so the computation is only done once. That is easy to verify. Run
dis(compile('10 ** 8', '', 'eval'))
and look for the fragmentLOAD_CONST 0 (100000000)
. Alternatively, rundef f(): return 10**8
and observe thatf.__code__.co_consts
returns(None, 100000000)
. Notes that10E8
isn't a valid substitute because that is a float rather than an int. -
silgon over 5 yearsWow... I just checked what you said, you're right, and it's really interesting, I thought that the power operation
**
would always run an operation, however as you said, it'sLOAD_CONST
. Thanks for the interesting reply. Also, you're right, the scientific notation1e8
gives a float. -
lony over 5 yearsIf digits are not your main requirement you could also use
hashlib.sha256("hello world".encode('utf-8')).hexdigest()[:8]
witch still will have collisions -
user1783732 almost 5 yearsSome of the comments regarding 'unique value' are confusing. I am trying to do same thing, and tested in Python 3.7.4 and 3.5.3 on two different machines. For the same input string, the result are the same. Is it true that the same input string always results in the same output for
hashlib.sha1
? -
Tomasz over 4 yearsThey should put that on the box!
-
Harabeck over 4 yearsYou're sharing a nodejs solution in a question about python?
-
user 923227 over 4 yearsYes, when we were building the system - the backend processed this using python while the frontend used node.js. Needed to make sure both work seamlessly.
-
ingo about 4 yearsIf your extracted 8 digits start with a 0, you'll end up with a 7 digit number.
-
kaptan almost 3 yearsI think it is worth mentioning that if you want a stable hash you you should use the
hashlib
solution.