How can I fix "TypeError: cannot serialize '_io.BufferedReader' object" error when trying to multiprocess
File handles don't serialize very well... but you could send the name of the zip file instead of the zip file handle (a string serializes fine between processes). And avoid zip as your variable name, since it shadows a built-in; I've chosen zip_filename:
p = Process(target=extract_zip, args=(zip_filename, password))
then:
def extract_zip(zip_filename, password):
    try:
        zip_file = zipfile.ZipFile(zip_filename)
        zip_file.extractall(pwd=password)
    except Exception:
        pass  # wrong password; move on to the next candidate
The other problem is that your code won't run in parallel because of this:
p.start()
p.join()
p.join() waits for the process to finish before the loop continues, so the passwords are tried one at a time... hardly useful. You have to store the process handles and join them all at the end.
This may cause other problems: creating too many processes in parallel may be an issue for your machine and won't help much after some point. Consider a multiprocessing.Pool
instead, to limit the number of workers.
A trivial example:
with multiprocessing.Pool(5) as p:
    print(p.map(f, [1, 2, 3, 4, 5, 6, 7]))
Adapted to your example:
with multiprocessing.Pool(5) as p:
    p.starmap(extract_zip, [(zip_filename, line.strip()) for line in txt_file])
(starmap unpacks each tuple into the two separate arguments your extract_zip function expects, as explained in Python multiprocessing pool.map for multiple arguments.)
Arszilla
Updated on February 04, 2020

Comments
-
Arszilla about 4 years
I'm trying to switch the threading in my code to multiprocessing to measure its performance and hopefully achieve better brute-forcing potential, as my program is meant to brute-force password-protected .zip files. But whenever I try to run the program I get this:
BruteZIP2.py -z "Generic ZIP.zip" -f Worm.txt

Traceback (most recent call last):
  File "C:\Users\User\Documents\Jetbrains\PyCharm\BruteZIP\BruteZIP2.py", line 40, in <module>
    main(args.zip, args.file)
  File "C:\Users\User\Documents\Jetbrains\PyCharm\BruteZIP\BruteZIP2.py", line 34, in main
    p.start()
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot serialize '_io.BufferedReader' object
I did find threads that had the same issue as I did, but they were both unanswered/unsolved. I also tried inserting Pool above p.start(), as I believe this was caused by the fact that I am on a Windows-based machine, but it was no help. My code is as follows:

    import argparse
    from multiprocessing import Process
    import zipfile

    parser = argparse.ArgumentParser(description="Unzips a password protected .zip by performing a brute-force attack using either a word list, password list or a dictionary.",
                                     usage="BruteZIP.py -z zip.zip -f file.txt")
    # Creates -z arg
    parser.add_argument("-z", "--zip", metavar="", required=True,
                        help="Location and the name of the .zip file.")
    # Creates -f arg
    parser.add_argument("-f", "--file", metavar="", required=True,
                        help="Location and the name of the word list/password list/dictionary.")
    args = parser.parse_args()


    def extract_zip(zip_file, password):
        try:
            zip_file.extractall(pwd=password)
            print(f"[+] Password for the .zip: {password.decode('utf-8')} \n")
        except:
            # If a password fails, it moves to the next password without notifying the user.
            # If all passwords fail, it will print nothing in the command prompt.
            print(f"Incorrect password: {password.decode('utf-8')}")
            # pass


    def main(zip, file):
        if (zip == None) | (file == None):
            # If the args are not used, it displays how to use them to the user.
            print(parser.usage)
            exit(0)
        zip_file = zipfile.ZipFile(zip)
        # Opens the word list/password list/dictionary in "read binary" mode.
        txt_file = open(file, "rb")
        for line in txt_file:
            password = line.strip()
            p = Process(target=extract_zip, args=(zip_file, password))
            p.start()
            p.join()


    if __name__ == '__main__':
        # BruteZIP.py -z zip.zip -f file.txt.
        main(args.zip, args.file)
As I said before, I believe this is happening mainly because I am on a Windows-based machine right now. I shared my code with a few others who were on Linux-based machines and they had no problem running the code above.
My main goal here is to get 8 processes/pools started to maximize the number of attempts compared to threading, but since I cannot get past the
TypeError: cannot serialize '_io.BufferedReader' object
message, I am unsure what to do here or how to fix it. Any assistance would be appreciated. -
Arszilla about 5 years
I tried pooling but I don't think I got the right idea. I did add
pool = Pool(8)
above p.start(), but that might not be the way to do it, right? If so, is there a good guide on it? -
Arszilla about 5 years
Also, how should I do the start and join then? I looked up guides and some documentation from various sources and many of them did it that way. I also looked at the Python 3 documentation (this part specifically) but I'm unsure how to implement it here, as my function (extract_zip) has 2 args. -
Arszilla about 5 years
I assume that the with part would go where for line in txt_file is? Removing the for and placing the with? -
Jean-François Fabre about 5 years
Yes, remove the loop; it's done in the argument generation.
-
Arszilla about 5 years
I did take your reply and comments and edited the parts you told me to. As a result I got this. When I try to run it with a ZIP named "Generic ZIP" and a .txt with 5 numbers like 00001, 00214321, 0987654321, etc., I still get the same TypeError. Not sure what is wrong or why the error persists, as I removed p.start() and p.join() and replaced the whole for line in txt_file with with multiprocessing.Pool(5) as pool. -
Jean-François Fabre about 5 years
Note that I've changed zip_file to zip_filename, which was zip for you, but I don't want to use zip as it's a built-in. Pass the string. Read my answer again. -
Arszilla about 5 years
As the site is telling me to move the chatter here to a chat (even though I am 1 rep short), I'll ask some brief questions. In the original code, zip was used as an arg for def main(), i.e. def main(zip, file). Is that what you mean by zip, as I don't see any other zip? Also, I am not entirely sure what you mean by "pass the string". Pass which string to where? Are you talking about def extract_zip? -
Arszilla about 5 years
Works! I guess that solves it! One last question before I mark this post as answered: are there any "dangers" to multiprocessing? Like the same entry being done twice by any instance? Or "writing to the disk" as some others put it (not sure what that means)? Also, will Pool prevent the code from working on macOS or Linux by any chance? -
Jean-François Fabre about 5 years
No, the same entry would not be done twice unless there's a bug in the input. And danger? Well, the only danger is: don't rely on multiprocessing until you've optimized your code very well. And Python isn't very good at intensive computation; a compiled language could do better (run from Python with multithreading).
-
Arszilla about 5 years
Let us continue this discussion in chat.