How to correct TypeError: Unicode-objects must be encoded before hashing?
Solution 1
It is probably looking for a character encoding from wordlistfile.
wordlistfile = open(wordlist,"r",encoding='utf-8')
Or, if you're working on a line-by-line basis:
line.encode('utf-8')
EDIT
Per the comment below and this answer.
My answer above assumes that the desired output is a str from the wordlist file. If you are comfortable working in bytes, then you're better off using open(wordlist, "rb"). But it is important to remember that your hashfile should NOT use rb if you are comparing it to the output of hexdigest. hashlib.md5(value).hexdigest() outputs a str, and that cannot be directly compared with a bytes object: 'abc' != b'abc'. (There's a lot more to this topic, but I don't have the time ATM.)
It should also be noted that this line:
line.replace("\n", "")
should probably be
line.strip()
That will work for both bytes and str objects. But if you decide to simply convert to bytes, then you can change the line to:
line.replace(b"\n", b"")
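For illustration, here is a minimal sketch of the binary-mode approach described above. The function name find_word and the temporary demo wordlist are assumptions made for this example, not part of the original script. The words stay bytes the whole way, while the target hash stays a str so it can be compared with the output of hexdigest().

```python
import hashlib
import os
import tempfile


def find_word(wordlist_path, target_hex):
    """Return the first word whose MD5 hex digest equals target_hex, else None."""
    # Binary mode yields bytes lines, so nothing has to be encoded before hashing.
    with open(wordlist_path, "rb") as wordlist_file:
        for line in wordlist_file:
            word = line.strip()  # removes the trailing b"\n" (and any other surrounding whitespace)
            if hashlib.md5(word).hexdigest() == target_hex:
                return word
    return None


# Demo with a throwaway wordlist (hypothetical contents).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"apple\nbanana\ncherry\n")
    demo_path = tmp.name

target = hashlib.md5(b"banana").hexdigest()  # a str, like a hash read from a text-mode file
found = find_word(demo_path, target)
os.remove(demo_path)
print(found)  # b'banana'
```

Note that the matching word comes back as bytes; call found.decode('utf-8') if you need a str for display, assuming the wordlist really is UTF-8.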
Solution 2
You have to specify an encoding, such as utf-8. Try this easy way. This example hashes a random number using the SHA256 algorithm:
>>> import hashlib
>>> import random
>>> hashlib.sha256(str(random.getrandbits(256)).encode('utf-8')).hexdigest()
'cd183a211ed2434eac4f31b317c573c50e6c24e3a28b82ddcb0bf8bedf387a9f'
Solution 3
import hashlib
string_to_hash = '123'
hash_object = hashlib.sha256(str(string_to_hash).encode('utf-8'))
print('Hash', hash_object.hexdigest())
Solution 4
The error already says what you have to do. MD5 operates on bytes, so you have to encode the Unicode string into bytes, e.g. with line.encode('utf-8').
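A short sketch of what that looks like in practice. The sample word is a made-up value, and the exact wording of the TypeError message varies between Python 3 versions, so the example only checks that the error is raised. It also shows why the choice of encoding matters: the same text encoded differently gives different bytes, and therefore a different hash.

```python
import hashlib

word = "p\u00e4ssword"  # hypothetical sample text containing a non-ASCII character

# Passing a str directly raises TypeError in Python 3.
error_message = None
try:
    hashlib.md5(word)
except TypeError as exc:
    error_message = str(exc)

# Encoding first works, but the digest depends on which encoding is used.
digest_utf8 = hashlib.md5(word.encode("utf-8")).hexdigest()
digest_latin1 = hashlib.md5(word.encode("latin-1")).hexdigest()

print(error_message is not None)     # True
print(digest_utf8 == digest_latin1)  # False
```

So whichever encoding you pick, it has to be the same one that was used when the original hash was produced.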
Solution 5
To store the password (PY3):
import hashlib, os
password_salt = os.urandom(32).hex()
password = '12345'
hash = hashlib.sha512()
hash.update(('%s%s' % (password_salt, password)).encode('utf-8'))
password_hash = hash.hexdigest()
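As commenters on this answer point out, a random salt is only useful if it is stored next to the hash so that it can be reused for comparison later. A minimal sketch of that idea, keeping the same salted SHA-512 scheme as above; the helper names hash_password and verify_password are assumptions for this example:

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Return (salt, hex digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(32).hex()
    digest = hashlib.sha512((salt + password).encode("utf-8")).hexdigest()
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-hash the candidate password with the stored salt and compare the digests."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Store both values (for example, in the same database row).
stored_salt, stored_hash = hash_password("12345")

print(verify_password("12345", stored_salt, stored_hash))  # True
print(verify_password("54321", stored_salt, stored_hash))  # False
```

hmac.compare_digest is used here instead of == only to keep the comparison time independent of where the strings differ.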
JohnnyFromBF
Updated on July 08, 2022
Comments
-
JohnnyFromBF almost 2 years
I have this error:
Traceback (most recent call last):
  File "python_md5_cracker.py", line 27, in <module>
    m.update(line)
TypeError: Unicode-objects must be encoded before hashing
when I try to execute this code in Python 3.2.2:
import hashlib, sys
m = hashlib.md5()
hash = ""
hash_file = input("What is the file name in which the hash resides? ")
wordlist = input("What is your wordlist? (Enter the file name) ")
try:
    hashdocument = open(hash_file, "r")
except IOError:
    print("Invalid file.")
    raw_input()
    sys.exit()
else:
    hash = hashdocument.readline()
    hash = hash.replace("\n", "")

try:
    wordlistfile = open(wordlist, "r")
except IOError:
    print("Invalid file.")
    raw_input()
    sys.exit()
else:
    pass
for line in wordlistfile:
    # Flush the buffer (this caused a massive problem when placed
    # at the beginning of the script, because the buffer kept getting
    # overwritten, thus comparing incorrect hashes)
    m = hashlib.md5()
    line = line.replace("\n", "")
    m.update(line)
    word_hash = m.hexdigest()
    if word_hash == hash:
        print("Collision! The word corresponding to the given hash is", line)
        input()
        sys.exit()

print("The hash given does not correspond to any supplied word in the wordlist.")
input()
sys.exit()
-
Tanky Woo over 8 years
Why use open(wordlist,"r",encoding='utf-8') with a specific encoding? The encoding only specifies the codec used for decoding; without this option, open uses a platform-dependent encoding.
-
Yash over 5 years
This line makes the password impossible to use: password_salt = os.urandom(32).hex(). It should be a fixed, known value, though it can be kept secret on the server. Please correct me or adapt it to your code.
-
Carson over 5 years
I agree with @Yash. You either have a single salt you use for every hash (not the best), or, if you generate a random salt for each hash, you must store it with the hash to use again later for comparison.
-
kundan over 3 years
hashlib.sha256 always expects bytes. In Python 2, str was a byte string, so just passing string_to_hash used to work fine. In Python 3, however, str (text, here string_to_hash) and bytes are two different types. So when we pass just string_to_hash (which is text), it raises an error stating that the Unicode object must be encoded first.
-
ShadowRanger over 3 years
The first half of this is flat wrong, and it's shocking it got up-voted as high as it did. Specifying an encoding explicitly just changes how it decodes the bytes on disk to get a str (a text type storing arbitrary Unicode), but it would decode to str without that, and the problem is using str in the first place. The line.encode('utf-8') undoes that mistaken decoding, but the OP should just be opening the file in 'rb' mode in the first place (with no encoding) so line is a bytes object in the first place (a few trivial changes needed to match, e.g. in .replace("\n", '')).
-
ShadowRanger over 3 years
I am absolutely amazed I had to scroll down this far to find the first sane answer. Unless there is some reason to think the wordlist file is in the wrong encoding (and must therefore be decoded from the wrong encoding, then encoded with the correct encoding for hashing) this is by far the best solution, avoiding pointless decoding and reencoding in favor of just processing bytes (the source of the error in the OP's code).
-
ShadowRanger over 3 years
Why decode only to reencode when you could just process the file in binary mode and deal with bytes the whole way?
-
Ishbir over 3 years
@ShadowRanger For this simple case (just reading lines and stripping the b'\n' at the end of each line) your suggestion is correct and adequate.
-
cwallenpoole over 3 years
@ShadowRanger And if the OP wants a str? I added a bit to the answer, but my original reply was short, sweet, and immediately available. It also happened to be the right answer for a project I was working on when I wrote the above reply, so ¯\_(ツ)_/¯
-
Nishad C M almost 2 years
In Python 3.*, the only working option I could figure out is opening the file in binary mode and then doing the hash operation. In Python 2, a normal open and hashing the text just works.