Convert unique numbers to md5 hash using pandas
Solution 1
hashlib.md5 takes a single string as input (in Python 3, a bytes object, so a string has to be encoded first). You can't pass it an array of values as you can with some NumPy/Pandas functions. Instead, you could use a list comprehension to build a list of md5sums:
ob['md5'] = [hashlib.md5(str(val).encode('utf-8')).hexdigest() for val in ob['ssno']]
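As a rough, self-contained sketch of the same idea for Python 3: the helper name md5_hex is made up for illustration, and the sample values are just the first three ssno entries from the question. It also shows why hashing the literal text 'ssno' produces one identical digest for every row.

```python
import hashlib


def md5_hex(value):
    """Return the MD5 hex digest of one value (hypothetical helper)."""
    # str() copes with integer columns; encode() supplies the bytes
    # that hashlib requires in Python 3.
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()


# Sample values taken from the question's ssno column.
ssno = [123456789, 234567891, 345678912]
digests = [md5_hex(val) for val in ssno]

# Hashing the literal column name yields a single fixed digest, which is
# why every row came out the same in the question's second attempt.
constant = hashlib.md5(b"ssno").hexdigest()
```

With a DataFrame, the same helper could probably be applied as ob['md5'] = ob['ssno'].map(md5_hex) instead of the list comprehension; both visit one value at a time.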
Solution 2
If you are hashing with SHA-256, encode the string to bytes first (UTF-8 is the usual choice):
ob['sha256'] = [hashlib.sha256(val.encode('UTF-8')).hexdigest() for val in ob['ssno']]
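If you want to switch between MD5 and SHA-256 without rewriting the comprehension, one possible approach is to go through hashlib.new, which looks the algorithm up by name. This is only a sketch; hash_hex and its algorithm parameter are names invented here, not part of the original answer.

```python
import hashlib


def hash_hex(value, algorithm="sha256"):
    """Return the hex digest of one value using the named algorithm.

    Hypothetical helper: the name and the default are assumptions.
    """
    # Convert to text first so integer values work, then encode to bytes.
    data = str(value).encode("utf-8")
    return hashlib.new(algorithm, data).hexdigest()


sha_digest = hash_hex("123456789")           # SHA-256, 64 hex characters
md5_digest = hash_hex("123456789", "md5")    # MD5, 32 hex characters
```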
Dave
Updated on July 30, 2022
Comments
-
Dave almost 2 years
Good morning, All.
I want to convert my social security numbers to MD5 hex digests. The outcome should be a unique MD5 hex digest for each social security number.
My data format is as follows:
ob = onboard[['regions','lname','ssno']][:10]
ob
                regions     lname        ssno
0  Northern Region (R1)  Banderas   123456789
1  Northern Region (R1)  Garfield   234567891
2  Northern Region (R1)    Pacino   345678912
3  Northern Region (R1)   Baldwin   456789123
4  Northern Region (R1)     Brody   567891234
5  Northern Region (R1)   Johnson  6789123456
6  Northern Region (R1)  Guinness  7890123456
7  Northern Region (R1)   Hopkins   891234567
8  Northern Region (R1)      Paul   891234567
9  Northern Region (R1)     Arkin   987654321
I've tried the following code using hashlib:
import hashlib
ob['md5'] = hashlib.md5(['ssno'])
This gave me the error that it had to be a string not a list. So I tried the following:
ob['md5'] = hashlib.md5('ssno').hexdigest()
                regions     lname       ssno                               md5
0  Northern Region (R1)  Banderas  123456789  a1b3ec3d8a026d392ad551701ad7881e
1  Northern Region (R1)  Garfield  234567891  a1b3ec3d8a026d392ad551701ad7881e
2  Northern Region (R1)    Pacino  345678912  a1b3ec3d8a026d392ad551701ad7881e
3  Northern Region (R1)   Baldwin  456789123  a1b3ec3d8a026d392ad551701ad7881e
4  Northern Region (R1)     Brody  567891234  a1b3ec3d8a026d392ad551701ad7881e
5  Northern Region (R1)   Johnson  678912345  a1b3ec3d8a026d392ad551701ad7881e
6  Northern Region (R1)   Johnson  789123456  a1b3ec3d8a026d392ad551701ad7881e
7  Northern Region (R1)   Guiness  891234567  a1b3ec3d8a026d392ad551701ad7881e
8  Northern Region (R1)   Hopkins  912345678  a1b3ec3d8a026d392ad551701ad7881e
9  Northern Region (R1)      Paul  159753456  a1b3ec3d8a026d392ad551701ad7881e
This was very close to what I need, but every hex digest came out the same regardless of whether the social security numbers differed. I am trying to get a distinct hex digest for each social security number.
Any suggestions?
-
Dave about 9 years
Absolutely, Beautiful! Makes sense. Thanks for educating me and assisting with a solution! Exactly what I needed!
-
rocksteady over 5 years
For anyone hitting the 'object supporting the buffer API required' error on this: it can be caused by null (NaN) values in your Pandas series, which may need to be processed or removed before hashing.
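One way to deal with the missing values mentioned above is to skip them explicitly and leave the result empty for those rows. The sketch below assumes missing entries show up as None or as a float NaN; md5_hex_or_none is a hypothetical name and the sample values are illustrative.

```python
import hashlib
import math


def md5_hex_or_none(value):
    """Return the MD5 hex digest of a value, or None if it is missing.

    Hypothetical helper: treats None and float NaN as missing.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()


# Two valid values with a NaN and a None mixed in.
values = ["123456789", float("nan"), None, "234567891"]
digests = [md5_hex_or_none(v) for v in values]
```

One thing to watch for: an integer column that contains NaN is typically stored as floats in Pandas, so str() would produce text such as '123456789.0' and the digest would differ from that of '123456789'. Converting the column to strings before hashing avoids that.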