How can I fix "TypeError: cannot serialize '_io.BufferedReader' object" error when trying to multiprocess


File handles don't serialize very well... But you can send the name of the zip file instead of the zip file handle (a string pickles fine between processes). Also avoid zip as a variable name, since it shadows the built-in; I've chosen zip_filename:

p = Process(target=extract_zip, args=(zip_filename, password))

then:

def extract_zip(zip_filename, password):
    try:
        zip_file = zipfile.ZipFile(zip_filename)
        zip_file.extractall(pwd=password)
        print(f"[+] Password for the .zip: {password.decode('utf-8')}")
    except Exception:
        # Wrong password: move on to the next candidate.
        print(f"Incorrect password: {password.decode('utf-8')}")

The other problem is that your code won't run in parallel because of this:

      p.start()
      p.join()

p.join waits for the process to finish before the loop continues, so only one process ever runs at a time... hardly useful. You have to store the process objects and join them all at the end, as sketched below.
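
A minimal sketch of that pattern, reusing the names from above (assuming txt_file is the opened word list, as in your code):

processes = []
for line in txt_file:
    p = Process(target=extract_zip, args=(zip_filename, line.strip()))
    p.start()  # start every worker first...
    processes.append(p)

for p in processes:
    p.join()  # ...and only then wait for them all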

That can cause another problem: creating too many processes in parallel may strain your machine, and it stops helping beyond some point. Consider a multiprocessing.Pool instead, to limit the number of workers.

A trivial example:

def f(x): return x * x  # stand-in worker, just for illustration

with multiprocessing.Pool(5) as p:
    print(p.map(f, [1, 2, 3, 4, 5, 6, 7]))

Adapted to your example:

with multiprocessing.Pool(5) as p:
    p.starmap(extract_zip, [(zip_filename, line.strip()) for line in txt_file])

(starmap unpacks each tuple into the two separate arguments that extract_zip expects, as explained in Python multiprocessing pool.map for multiple arguments.)
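
Putting it together, a minimal sketch of a reworked main (assuming the argparse setup and extract_zip from your question stay as they are; the pool size of 5 is arbitrary, use 8 if you want 8 workers):

import multiprocessing

def main(zip_filename, file):
    # Build the argument list first: only plain strings/bytes are
    # sent to the workers, so pickling succeeds even on Windows.
    with open(file, "rb") as txt_file:
        candidates = [(zip_filename, line.strip()) for line in txt_file]
    with multiprocessing.Pool(5) as p:
        p.starmap(extract_zip, candidates)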

Author: Arszilla
Updated on February 04, 2020

Comments

  • Arszilla, about 4 years ago

    I'm trying to switch the threading in my code to multiprocessing to measure its performance and hopefully achieve better brute-forcing potential, as my program is meant to brute-force password-protected .zip files. But whenever I try to run the program I get this:

    BruteZIP2.py -z "Generic ZIP.zip" -f Worm.txt
    Traceback (most recent call last):
      File "C:\Users\User\Documents\Jetbrains\PyCharm\BruteZIP\BruteZIP2.py", line 40, in <module>
        main(args.zip, args.file)
      File "C:\Users\User\Documents\Jetbrains\PyCharm\BruteZIP\BruteZIP2.py", line 34, in main
        p.start()
      File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
        self._popen = self._Popen(self)
      File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
        return Popen(process_obj)
      File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
        reduction.dump(process_obj, to_child)
      File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    TypeError: cannot serialize '_io.BufferedReader' object
    

    I did find threads with the same issue, but they were both unanswered/unsolved. I also tried inserting Pool above p.start(), as I believe this is caused by the fact that I am on a Windows-based machine, but it was no help. My code is as follows:

      import argparse
      from multiprocessing import Process
      import zipfile
    
      parser = argparse.ArgumentParser(description="Unzips a password protected .zip by performing a brute-force attack using either a word list, password list or a dictionary.", usage="BruteZIP.py -z zip.zip -f file.txt")
      # Creates -z arg
      parser.add_argument("-z", "--zip", metavar="", required=True, help="Location and the name of the .zip file.")
      # Creates -f arg
      parser.add_argument("-f", "--file", metavar="", required=True, help="Location and the name of the word list/password list/dictionary.")
      args = parser.parse_args()
    
    
      def extract_zip(zip_file, password):
          try:
              zip_file.extractall(pwd=password)
              print(f"[+] Password for the .zip: {password.decode('utf-8')} \n")
          except:
              # If a password fails, it moves to the next password without notifying the user. If all passwords fail, it will print nothing in the command prompt.
              print(f"Incorrect password: {password.decode('utf-8')}")
              # pass
    
    
      def main(zip, file):
          if (zip == None) | (file == None):
              # If the args are not used, it displays how to use them to the user.
              print(parser.usage)
              exit(0)
          zip_file = zipfile.ZipFile(zip)
          # Opens the word list/password list/dictionary in "read binary" mode.
          txt_file = open(file, "rb")
          for line in txt_file:
              password = line.strip()
              p = Process(target=extract_zip, args=(zip_file, password))
              p.start()
              p.join()
    
    
      if __name__ == '__main__':
          # BruteZIP.py -z zip.zip -f file.txt.
          main(args.zip, args.file)
    

    As I said before, I believe this is happening mainly because I am on a Windows-based machine right now. I shared my code with a few others who were on Linux-based machines and they had no problem running the code above.

    My main goal here is to get 8 processes/pool workers started to maximize the number of attempts compared to threading, but since I cannot find a fix for the TypeError: cannot serialize '_io.BufferedReader' object message, I am unsure what to do here or how to fix it. Any assistance would be appreciated.

  • Arszilla, about 5 years ago
    I tried pooling, but I don't think I got the right idea. I added pool = Pool(8) above p.start(), but that might not be the way to do it, right? If so, is there a good guide on it?
  • Arszilla, about 5 years ago
    Also, how should I do the start and join then? I looked up guides and documentation from various sources, and many of them did it that way. I also looked at the Python 3 documentation (this part specifically), but I'm unsure how to implement it here as my function (extract_zip) takes 2 args.
  • Arszilla, about 5 years ago
    I assume the with part would go where for line in txt_file is now? Removing the for and placing the with?
  • Jean-François Fabre, about 5 years ago
    Yes, remove the loop; the iteration is done in the argument generation.
  • Arszilla, about 5 years ago
    I took your reply and comments and edited the parts you told me to. As a result I got this. When I try to run it with a ZIP named "Generic ZIP" and a .txt with 5 numbers like 00001, 00214321, 0987654321 etc., I still get the same TypeError. I'm not sure what is wrong or why the error persists, as I removed p.start() and p.join() and replaced the whole for line in txt_file loop with with multiprocessing.Pool(5) as pool.
  • Jean-François Fabre, about 5 years ago
    Note that I've changed zip_file to zip_filename, which was zip in your code, but I don't want to use zip as it's a built-in. Pass the string, not the ZipFile object. Read my answer again.
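
    In other words, the fix is about what gets pickled and sent to the workers (a minimal sketch; zip_filename is the string from args.zip, txt_file the opened word list):

    # Fails: a ZipFile wraps an open file handle, which cannot be pickled.
    pool.starmap(extract_zip, [(zip_file, line.strip()) for line in txt_file])

    # Works: a plain string pickles fine; each worker reopens the zip itself.
    pool.starmap(extract_zip, [(zip_filename, line.strip()) for line in txt_file])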
  • Arszilla, about 5 years ago
    As the site is telling me to move the chatter here to a chat (even though I am 1 rep short), I'll ask some brief questions. In the original code, zip was used as an arg for def main(), i.e. def main(zip, file). Is that the zip you mean, as I don't see any other? Also, I am not entirely sure what you mean by "pass the string". Pass which string to where? Are you talking about def extract_zip?
  • Arszilla, about 5 years ago
    Works! I guess that solves it! One last question before I mark this post as answered: are there any 'dangers' to multiprocessing? Like the same entry being tried twice by any instance, or 'writing to the disk' as some others put it (I'm still not sure what that means)? Also, will Pool prevent the code from working on macOS or Linux by any chance?
  • Jean-François Fabre, about 5 years ago
    No, the same entry would not be done twice unless there's a bug in the input. And danger? Well, the only danger is: don't rely on multiprocessing until you've optimized your code very well. Python isn't very good at intensive computations; a compiled language could do better (run from Python with multithreading).