Can I use multiprocessing.Pool in a method of a class?
The issue is that you've got an unpicklable instance variable (namelist) in the Book instance. Because you're calling pool.map on an instance method, and you're running on Windows, the entire instance needs to be picklable in order for it to be passed to the child process. Book.namelist holds open file objects (_io.BufferedReader), which can't be pickled. You can fix this a couple of ways. Based on the example code, it looks like you could just make format_char a top-level function:
from multiprocessing import Pool

# Top-level function: only the function reference and its argument are pickled.
def format_char(char):
    char = char + "a"
    return char

class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread
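The underlying failure can be reproduced without starting a pool at all. The following is a minimal sketch, not the original code: Holder and can_pickle are hypothetical names introduced for illustration, and a temporary file stands in for the real input files. It shows that pickling a bound method pulls in the instance it is bound to, so an open file object stored on the instance makes the pickling fail.

```python
import os
import pickle
import tempfile


class Holder(object):
    """Hypothetical stand-in for Book: it keeps open file objects."""

    def __init__(self, files):
        self.namelist = files

    def format_char(self, char):
        return char + "a"


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


# Create a throwaway file so the example does not depend on the current directory.
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "rb") as f:
    holder = Holder([f])
    # Pickling a bound method pickles the instance it is bound to,
    # including the open file object stored in namelist.
    bound_ok = can_pickle(holder.format_char)
    # The same method on an instance without file handles has nothing
    # unpicklable attached to it.
    empty_ok = can_pickle(Holder([]).format_char)

os.remove(path)
print("with open file:", bound_ok)
print("without open file:", empty_ok)
```

When run as a script, the first check should report a failure and the second should succeed, which is why moving format_char out of the class avoids the problem: pool.map then only has to send the function reference and the strings.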
However, if in reality you need format_char to be an instance method, you can use __getstate__/__setstate__ to make Book picklable, by removing the namelist attribute from the instance state before pickling it:
from multiprocessing import Pool

class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def __getstate__(self):
        """ This is called before pickling. """
        state = self.__dict__.copy()
        del state['namelist']  # drop the unpicklable file objects
        return state

    def __setstate__(self, state):
        """ This is called while unpickling. """
        self.__dict__.update(state)

    def format_char(self,char):
        char = char + "a"
        return char

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(self.format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread
This would be ok as long as you don't need to access namelist in the child process.
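To make that caveat concrete, here is a small sketch that round-trips a Book through pickle directly, which is roughly what happens when the instance is sent to a worker. It is an illustration under assumptions: the class is trimmed to the pickling hooks, and the temporary file and the names book and clone are hypothetical, not part of the original code.

```python
import os
import pickle
import tempfile


class Book(object):
    def __init__(self, arg):
        self.namelist = arg
        self.tempread = ""

    def __getstate__(self):
        # Copy the instance dict and leave out the open file objects.
        state = self.__dict__.copy()
        del state['namelist']
        return state

    def __setstate__(self, state):
        # Restore whatever survived pickling; namelist is not among it.
        self.__dict__.update(state)


# Throwaway file so the example does not depend on the current directory.
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "rb") as f:
    book = Book([f])
    # Serialize and deserialize the instance, as a stand-in for the
    # transfer to a child process.
    clone = pickle.loads(pickle.dumps(book))

os.remove(path)

# The original keeps its file list; the unpickled copy has no namelist.
print(hasattr(book, "namelist"))
print(hasattr(clone, "namelist"))
```

Any method that touches self.namelist on the unpickled copy would raise AttributeError, so this approach only fits when the worker-side method works purely from its arguments, as format_char does.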
PaleNeutron
Updated on June 07, 2022
Comments
-
PaleNeutron almost 2 years
I am trying to use multiprocessing in my code for better performance. However, I got the following error:
Traceback (most recent call last):
  File "D:\EpubBuilder\TinyEpub.py", line 49, in <module>
    e.epub2txt()
  File "D:\EpubBuilder\TinyEpub.py", line 43, in epub2txt
    tempread = self.get_text()
  File "D:\EpubBuilder\TinyEpub.py", line 29, in get_text
    txtlist = pool.map(self.char2text,charlist)
  File "C:\Python34\lib\multiprocessing\pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Python34\lib\multiprocessing\pool.py", line 599, in get
    raise self._value
  File "C:\Python34\lib\multiprocessing\pool.py", line 383, in _handle_tasks
    put(task)
  File "C:\Python34\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "C:\Python34\lib\multiprocessing\reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot serialize '_io.BufferedReader' object
I tried it another way and got this error:
TypeError: cannot serialize '_io.TextIOWrapper' object
My code looks like this:
from multiprocessing import Pool

class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def format_char(self,char):
        char = char + "a"
        return char

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(self.format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread

if __name__ == '__main__':
    import os
    b = Book([open(f) for f in os.listdir()])
    t = b.format_book()
    print(t)
I think the error is raised because I am not using the Pool in the main function. Is my conjecture right? And how can I modify my code to fix the error?
-
PaleNeutron over 9 years
Thanks! It works well now, and my conjecture is wrong.
-
Daniel almost 3 years
This is the real answer: self must be picklable. For goodness sake, I searched way too long to get the unmistakably right answer.