How to correct TypeError: Unicode-objects must be encoded before hashing?

python python-3.x unicode syntax-error hashlib

479,226

Solution 1

It is probably looking for a character encoding from wordlistfile.

wordlistfile = open(wordlist,"r",encoding='utf-8')

Or, if you're working on a line-by-line basis:

line.encode('utf-8')

EDIT

Per the comment below and this answer.

My answer above assumes that the desired output is a str from the wordlist file. If you are comfortable working in bytes, then you're better off using open(wordlist, "rb"). But it is important to remember that your hashfile should NOT use rb if you are comparing it to the output of hexdigest. hashlib.md5(value).hexdigest() outputs a str, and that cannot be directly compared with a bytes object: 'abc' != b'abc'. (There's a lot more to this topic, but I don't have the time ATM).
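A minimal sketch of the str-versus-bytes comparison pitfall described above. The variable names are made up for illustration, and hash_from_binary_file merely stands in for what readline() would return from a hash file opened with "rb":

```python
import hashlib

# hexdigest() always returns a str of hexadecimal characters.
digest = hashlib.md5(b"abc").hexdigest()

# Stand-in for a line read from a hash file opened in "rb" mode (bytes).
hash_from_binary_file = digest.encode("ascii") + b"\n"

# A str never compares equal to a bytes object, so this is False.
naive_match = digest == hash_from_binary_file.strip()

# Decoding the bytes first (or reading the hash file in text mode) fixes it.
fixed_match = digest == hash_from_binary_file.strip().decode("ascii")

print(naive_match, fixed_match)
```

So either read the hash file in text mode, or decode the bytes before comparing them with the hexdigest() result.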

It should also be noted that this line:

line.replace("\n", "")

Should probably be

line.strip()

That will work for both bytes and str's. But if you decide to simply convert to bytes, then you can change the line to:

line.replace(b"\n", b"")
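Putting the bytes-only approach together, here is a rough sketch rather than a drop-in replacement for the original script. The helper name find_word and the temporary demo wordlist are assumptions made for this example:

```python
import hashlib
import os
import tempfile


def find_word(wordlist_path, target_hex):
    """Return the first word (as bytes) whose MD5 hex digest equals target_hex, else None."""
    # Binary mode: every line is already a bytes object, so no encoding step is needed.
    with open(wordlist_path, "rb") as wordlistfile:
        for line in wordlistfile:
            word = line.rstrip(b"\r\n")  # drop trailing line-ending bytes only
            if hashlib.md5(word).hexdigest() == target_hex:
                return word
    return None


# Demo with a throwaway wordlist file (hypothetical contents).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"apple\nbanana\ncherry\n")
    demo_path = tmp.name

target = hashlib.md5(b"banana").hexdigest()
found = find_word(demo_path, target)
missing = find_word(demo_path, "0" * 32)
os.remove(demo_path)

print(found, missing)
```

Note that target stays a str here, because it is compared against hexdigest(), which also returns a str.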

Solution 2

You have to specify an encoding such as utf-8 and encode the string before hashing it. Try this:

This example hashes a random 256-bit number with SHA-256:

>>> import hashlib, random
>>> hashlib.sha256(str(random.getrandbits(256)).encode('utf-8')).hexdigest()
'cd183a211ed2434eac4f31b317c573c50e6c24e3a28b82ddcb0bf8bedf387a9f'

Solution 3

import hashlib
string_to_hash = '123'
hash_object = hashlib.sha256(str(string_to_hash).encode('utf-8'))
print('Hash', hash_object.hexdigest())

Solution 4

The error already says what you have to do. MD5 operates on bytes, so you have to encode the Unicode string into bytes, e.g. with line.encode('utf-8').
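As a small illustration (the sample word is arbitrary), passing a str raises the TypeError, while passing the encoded bytes works:

```python
import hashlib

line = "password"

# Passing a str directly fails: the hash functions only accept bytes-like objects.
try:
    hashlib.md5(line)
    raised = False
except TypeError:
    raised = True

# Encoding the str to bytes first is what the error message asks for.
digest = hashlib.md5(line.encode("utf-8")).hexdigest()

print(raised)
print(digest)
```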

Solution 5

To store the password (PY3):

import hashlib, os
password_salt = os.urandom(32).hex()
password = '12345'

hash = hashlib.sha512()
hash.update(('%s%s' % (password_salt, password)).encode('utf-8'))
password_hash = hash.hexdigest()
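To check a password later, the random salt has to be stored alongside the hash. A minimal sketch under that assumption follows. The function names hash_password and verify_password are hypothetical, and the use of hmac.compare_digest for the comparison is an addition that is not part of the answer above:

```python
import hashlib
import hmac
import os


def hash_password(password, salt_hex=None):
    """Return (salt_hex, digest_hex); a fresh random salt is generated if none is given."""
    if salt_hex is None:
        salt_hex = os.urandom(32).hex()
    # Same scheme as above: SHA-512 over salt + password, encoded to bytes first.
    digest = hashlib.sha512((salt_hex + password).encode("utf-8")).hexdigest()
    return salt_hex, digest


def verify_password(password, salt_hex, expected_digest):
    """Recompute the hash with the stored salt and compare it to the stored digest."""
    _, digest = hash_password(password, salt_hex)
    return hmac.compare_digest(digest, expected_digest)


# Store both values, e.g. in the same database row (storage itself is not shown).
salt, stored = hash_password("12345")

ok = verify_password("12345", salt, stored)
bad = verify_password("wrong", salt, stored)
print(ok, bad)
```

For real password storage, a dedicated key-derivation function such as hashlib.pbkdf2_hmac is usually preferred over a single SHA-512 pass. The sketch only shows why the salt has to be kept.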


Author by

JohnnyFromBF

Updated on July 08, 2022

Comments

JohnnyFromBF almost 2 years

I have this error:

Traceback (most recent call last):
  File "python_md5_cracker.py", line 27, in <module>
  m.update(line)
TypeError: Unicode-objects must be encoded before hashing

when I try to execute this code in Python 3.2.2:

import hashlib, sys
m = hashlib.md5()
hash = ""
hash_file = input("What is the file name in which the hash resides?  ")
wordlist = input("What is your wordlist?  (Enter the file name)  ")
try:
  hashdocument = open(hash_file, "r")
except IOError:
  print("Invalid file.")
  input()
  sys.exit()
else:
  hash = hashdocument.readline()
  hash = hash.replace("\n", "")

try:
  wordlistfile = open(wordlist, "r")
except IOError:
  print("Invalid file.")
  input()
  sys.exit()
else:
  pass
for line in wordlistfile:
  # Flush the buffer (this caused a massive problem when placed 
  # at the beginning of the script, because the buffer kept getting
  # overwritten, thus comparing incorrect hashes)
  m = hashlib.md5()
  line = line.replace("\n", "")
  m.update(line)
  word_hash = m.hexdigest()
  if word_hash == hash:
    print("Collision! The word corresponding to the given hash is", line)
    input()
    sys.exit()

print("The hash given does not correspond to any supplied word in the wordlist.")
input()
sys.exit()

Tanky Woo over 8 years

Why use open(wordlist, "r", encoding='utf-8') with a specific encoding? The encoding argument only specifies the codec used for decoding; without it, open uses a platform-dependent encoding.

Yash over 5 years

This line makes the password impossible to use: password_salt = os.urandom(32).hex(). The salt should be a fixed, known value, though it can be kept secret on the server. Please correct me or adapt it to your code.

Carson over 5 years

I agree with @Yash. You either have a single salt you use for every hash (not the best), or, if you generate a random salt for each hash, you must store it with the hash to use again later for comparison.

kundan over 3 years

The hashlib.sha256 method has always expected bytes. In Python 2, str was a byte string, so just passing string_to_hash used to work fine. In Python 3, however, str (text, here string_to_hash) and bytes are two different types. So when we pass just string_to_hash (which is text), it throws an error stating that the value must be encoded first.

ShadowRanger over 3 years

The first half of this is flat wrong, and it's shocking it got up-voted as high as it did. Specifying an encoding explicitly just changes how it decodes the bytes on disk to get a str (a text type storing arbitrary Unicode), but it would decode to str without that, and the problem is using str in the first place. The line.encode('utf-8') undoes that mistaken decoding, but the OP should just be opening the file in 'rb' mode in the first place (with no encoding) so line is a bytes object in the first place (a few trivial changes needed to match, e.g. in .replace("\n", '')).

ShadowRanger over 3 years

I am absolutely amazed I had to scroll down this far to find the first sane answer. Unless there is some reason to think the wordlist file is in the wrong encoding (and must therefore be decoded from the wrong encoding, then encoded with the correct encoding for hashing) this is by far the best solution, avoiding pointless decoding and reencoding in favor of just processing bytes (the source of the error in the OP's code).

ShadowRanger over 3 years

Why decode only to reencode when you could just process the file in binary mode and deal with bytes the whole way?

Ishbir over 3 years

@ShadowRanger for this simple case (just reading lines and stripping the b'\n' at the end of each line) your suggestion is correct and adequate.

cwallenpoole over 3 years

@ShadowRanger And if the OP wants a str? I added a bit to the answer, but my original reply was short, sweet, and immediately available. It also happened to be the right answer for a project I was working on when I wrote the above reply, so ¯\_(ツ)_/¯

Nishad C M almost 2 years

With Python 3.*, the only working option I could figure out is opening the file in binary mode and then doing the hash operation. In Python 2, a normal open and hashing the text just works.