Hashing an array or object in Python 3
Solution 1
You can use the repr() function to get the (Unicode) string representation of the array (or of any object that implements conversion to a representation). Then encode the string to UTF-8 (the byte order is the same everywhere when using UTF-8). The resulting bytes can be hashed as you attempted in the question:
#!python3
import hashlib

def hashFor(data):
    # Prepare the project id hash
    hashId = hashlib.md5()
    hashId.update(repr(data).encode('utf-8'))
    return hashId.hexdigest()

if __name__ == '__main__':
    data1 = ['abc', 'de']
    data2 = ['a', 'bcde']
    print(hashFor(data1) + ':', data1)
    print(hashFor(data2) + ':', data2)
It prints on my console:
c:\tmp\___python\skerit\so17412304>py a.py
d26d27d8cbb7c6fe50637155c21d5af6: ['abc', 'de']
dbd5ab5df464b8bcee61fe8357f07b6e: ['a', 'bcde']
Solution 2
It depends on what you want: the hash of all the strings concatenated, or the hash of each string separately. You can get the first by following Thomas's solution, since m.update(a); m.update(b) is equivalent to m.update(a+b). For the latter, use the solution below:
import hashlib

def generateHash(data):
    # Return a list with the MD5 hex digest of each string in data
    return [hashlib.md5(i.encode('utf-8')).hexdigest() for i in data]
Note that it returns a list. Each element is the hash of the corresponding element in the given list of strings.
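The equivalence mentioned above (successive update() calls versus one call on the concatenated bytes) can be checked with a small sketch. The variable names are illustrative only and do not come from the answers:

```python
import hashlib

# Two byte strings to feed into the hash
a = 'abc'.encode('utf-8')
b = 'de'.encode('utf-8')

# Feed the pieces one after another
incremental = hashlib.md5()
incremental.update(a)
incremental.update(b)

# Feed the concatenation in a single call
combined = hashlib.md5(a + b)

# Both digests are the same
print(incremental.hexdigest() == combined.hexdigest())  # True
```

This is also why hashing the elements one after another cannot tell where one string ends and the next begins.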
Solution 3
If you'd like to hash a list of strings, a naive solution could be:
import hashlib

def hash_string_list(string_list):
    h = hashlib.md5()
    for s in string_list:  # Note that you could use ''.join(string_list) instead
        h.update(s.encode('utf-8'))  # update() requires bytes in Python 3
    return h.hexdigest()
However, be wary that ['abc', 'efg'] and ['a', 'bcefg'] would hash to the same value.
If you provide more context regarding your objective, other solutions might be more appropriate.
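One possible way to avoid the collision noted in Solution 3 is to feed the length of each string into the hash before the string itself, so that element boundaries change the digest. This is only a sketch: the function name hash_string_list_unambiguous and the 8-byte big-endian length prefix are assumptions made for illustration, not part of the original answer.

```python
import hashlib

def hash_string_list_unambiguous(string_list):
    """Hash a list of strings so that element boundaries affect the digest."""
    h = hashlib.md5()
    for s in string_list:
        encoded = s.encode('utf-8')
        # Assumption: prefix each element with its byte length,
        # written as a fixed-width 8-byte big-endian integer
        h.update(len(encoded).to_bytes(8, 'big'))
        h.update(encoded)
    return h.hexdigest()

if __name__ == '__main__':
    # The two lists that collide under plain concatenation
    print(hash_string_list_unambiguous(['abc', 'efg']))
    print(hash_string_list_unambiguous(['a', 'bcefg']))
```

With the length prefix, the two lists feed different byte sequences into the hash, so their digests should differ.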
Jelle De Loecker
Updated on June 05, 2022
Comments
-
Jelle De Loecker about 2 years
I want to hash a simple array of strings. The documentation says you can't simply feed a string into hashlib's update() function, so I tried a regular variable, but then I got the error
TypeError: object supporting the buffer API required
Here's what I had so far:
def generateHash(data):
    # Prepare the project id hash
    hashId = hashlib.md5()
    hashId.update(data)
    return hashId.hexdigest()
-
mata almost 11 years
...except that update takes bytes, so if you have strings you need to encode them first.
-
Thomas Orozco almost 11 years
@mata Oh, didn't realize that was a Python 3 question, sorry.
-
Eric Seppanen almost 8 years
There's no guarantee that an arbitrary object's __repr__ returns something that's a useful input for the hash function. hashlib objects themselves, for example, repr() to '<md5 HASH object @ 0x7fb503555a80>'. Even ignoring the nasty implications if this were used for some cryptographic operation, this isn't even deterministic! The same program run at different times won't return the same hash value.
-
pepr almost 8 years
@EricSeppanen: The answer is related to an array of strings. You are right. One should not use a hammer for every job.
-
Johann Bauer over 7 years
You should still try to use something like ",".join(data) as __repr__ isn't guaranteed to be consistent with future versions. Maybe Python 4 will return a slightly different string (e.g. s'abc' instead of 'abc').
-
pepr about 7 years
@JohannBauer: It is unlikely. Anyway, the question is rather old, and the situation may have changed. The ','.join(data) is buggy as ['a,', 'bb'] would produce the same result as ['a', ',bb']. But you are right. Any suitable function that captures the representation of the array of strings and returns it as bytes can be used for calculating the hash value.
-
pepr about 5 years
@EricJin: Yes, but the goal was to have the string representation of a list of strings.
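The comments point out that ','.join(data) is ambiguous and that any function which captures the list of strings as bytes could be used instead of repr(). One candidate, offered here only as a sketch, is json.dumps(), which quotes and escapes each element so the boundaries stay visible. The function name hash_via_json is hypothetical, and whether JSON output is stable enough for your purposes is an assumption you would need to check for your own data.

```python
import hashlib
import json

def hash_via_json(data):
    """Hash a list of strings through its JSON text (illustrative sketch)."""
    # The default ensure_ascii=True yields pure ASCII text, so encoding cannot fail
    serialized = json.dumps(data, separators=(',', ':'))
    return hashlib.md5(serialized.encode('utf-8')).hexdigest()

if __name__ == '__main__':
    first = ['a,', 'bb']
    second = ['a', ',bb']
    # A plain join cannot tell the two lists apart
    print(','.join(first) == ','.join(second))            # True
    # The JSON text differs, so the digests differ as well
    print(hash_via_json(first) == hash_via_json(second))  # False
```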