Hashing an array or object in python 3

15,990

Solution 1

You can use the repr() function to get the (Unicode) string representation of the array (or of whatever object that implements conversion to a representation). Then you encode the string to UTF-8 (the order of bytes is the same everywhere when using UTF-8). The resulting bytes can be hashed as you tried above:

#!python3
import hashlib

def hashFor(data):
    # Prepare the project id hash
    hashId = hashlib.md5()

    hashId.update(repr(data).encode('utf-8'))

    return hashId.hexdigest()


if __name__ == '__main__':
    data1 = ['abc', 'de']
    data2 = ['a', 'bcde']
    print(hashFor(data1) + ':', data1)
    print(hashFor(data2) + ':', data2)

It prints on my console:

c:\tmp\___python\skerit\so17412304>py a.py
d26d27d8cbb7c6fe50637155c21d5af6: ['abc', 'de']
dbd5ab5df464b8bcee61fe8357f07b6e: ['a', 'bcde']

Solution 2

Depending on what you want to do, getting the hash of all strings concatenated or hash of each string separately. you can get the fist following Thomas solution as m.update(a); m.update(b) is equivalent to m.update(a+b). Or the later following below solution

def generateHash(data):
    # Prepare the project id hash

    return [hashlib.md5(i.encode('utf-8')).hexdigest() for i in data]

Note that it returns a list. Each element is hash of a corresponding element in the given string list

Solution 3

If you'd like to hash a list of strings, a naive solution could be:

def hash_string_list(string_list):
    h = hashlib.md5()
    for s in string_list: # Note that you could use ''.join(string_list) instead
        h.update(s)       # s.encode('utf-8') if you're using Python 3
    return h.hexdigest()

However, be wary that ['abc', 'efg'] and ['a', 'bcefg'] would hash to the same value.

If you provide more context regarding your objective, other solutions might be more appropriate.

Share:
15,990

Related videos on Youtube

Jelle De Loecker
Author by

Jelle De Loecker

Updated on June 05, 2022

Comments

  • Jelle De Loecker
    Jelle De Loecker about 2 years

    I want to hash a simple array of strings The documentation says you can't simple feed a string into hashlib's update() function, so I tried a regular variable, but then I got the TypeError: object supporting the buffer API required error.

    Here's what I had so far

    def generateHash(data):
        # Prepare the project id hash
        hashId = hashlib.md5()
    
        hashId.update(data)
    
        return hashId.hexdigest()
    
  • mata
    mata almost 11 years
    ...except that update takes bytes, so if you have strings you need to encode them first.
  • Thomas Orozco
    Thomas Orozco almost 11 years
    @mata Oh, didn't realize that was a python3 question - sorry.
  • Eric Seppanen
    Eric Seppanen almost 8 years
    There's no guarantee that an arbitrary object's __repr__ returns something that's a useful input for the hash function. hashlib objects themselves, for example, repr() to '<md5 HASH object @ 0x7fb503555a80>'. Even ignoring the nasty implications if this were used for some cryptographic operation, this isn't even deterministic! The same program run at different times won't return the same hash value.
  • pepr
    pepr almost 8 years
    @EricSeppanen: The answer is related to array of strings. You are right. One should not use hammer for every work.
  • Johann Bauer
    Johann Bauer over 7 years
    You should still try to use something like ",".join(data) as __repr__ isn't guaranteed to be consistent with future versions. Maybe Python 4 will return a slightly different string (e.g. s'abc instead of 'abc').
  • pepr
    pepr about 7 years
    @JohannBauer: It is unlikely. Anyway, the question is rather old, and the situation may have changed. The ','.join(data) is buggy as ['a,', 'bb'] would produce the same result as ['a', ',bb']. But you are right. Any suitable function that captures the representation of the array of strings and returns it as bytes can be used for calculating the hash value.
  • pepr
    pepr about 5 years
    @EricJin: Yes, but the goal was to have the string representation of a list of strings.