Can I use multiprocessing.Pool in a method of a class?

The issue is that the Book instance holds an unpicklable instance variable (namelist). Because you're calling pool.map on an instance method, the bound method has to be pickled to be sent to the child process, and pickling a bound method pickles the entire instance along with it. Book.namelist is a list of open file objects (_io.BufferedReader or _io.TextIOWrapper, depending on how the files were opened), which can't be pickled. You can fix this a couple of ways. Based on the example code, it looks like you could just make format_char a top-level function:

from multiprocessing import Pool


def format_char(char):
    # Top-level function: pickled by name, so no Book instance is sent along.
    char = char + "a"
    return char


class Book(object):
    def __init__(self, arg):
        self.namelist = arg  # list of open file objects

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist]  # contents of each file
        with Pool() as pool:
            txtlist = pool.map(format_char, charlist)
        self.tempread = "".join(txtlist)
        return self.tempread
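
To see why the instance method is the problem, here is a minimal sketch that needs no pool at all. Holder, shout and try_pickle are hypothetical names made up for the illustration, and os.devnull is used only so there is always a file to open; os.path.basename stands in for "some module-level function".

```python
import os
import pickle


class Holder(object):
    """Hypothetical stand-in for Book: keeps an open file on the instance."""

    def __init__(self, handle):
        self.handle = handle

    def shout(self, text):
        return text.upper()


def try_pickle(obj):
    """Return None if obj pickles cleanly, else the exception's type name."""
    try:
        pickle.dumps(obj)
        return None
    except Exception as exc:
        return type(exc).__name__


# os.devnull is used only so the sketch opens a file that exists everywhere.
with open(os.devnull) as handle:
    holder = Holder(handle)
    # Pickling a bound method pickles the instance it is bound to,
    # including the open file stored on it.
    bound_method_error = try_pickle(holder.shout)
    # A module-level function is pickled by reference (module + name),
    # so no instance state is involved.
    top_level_error = try_pickle(os.path.basename)

print(bound_method_error)
print(top_level_error)
```

The bound method should fail to pickle for the same reason as in the traceback from the question, while the module-level function pickles without complaint.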

However, if you really do need format_char to be an instance method, you can use __getstate__/__setstate__ to make Book picklable, by removing the namelist attribute from the instance's state before pickling it:

from multiprocessing import Pool


class Book(object):
    def __init__(self, arg):
        self.namelist = arg  # list of open file objects

    def __getstate__(self):
        """ This is called before pickling. """
        state = self.__dict__.copy()
        del state['namelist']  # open files can't be pickled, so drop them
        return state

    def __setstate__(self, state):
        """ This is called while unpickling. """
        self.__dict__.update(state)

    def format_char(self, char):
        char = char + "a"
        return char

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist]  # contents of each file
        with Pool() as pool:
            txtlist = pool.map(self.format_char, charlist)
        self.tempread = "".join(txtlist)
        return self.tempread

This works as long as you don't need to access namelist in the child process, because the unpickled copy of Book has no namelist attribute.
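
A quick way to check this without starting a pool is to round-trip an instance through pickle by hand. This is only a sketch: it uses a cut-down copy of the Book class (the formatting methods are left out, and tempread is set in __init__ purely so there is some state to carry over), and os.devnull is an arbitrary file chosen so the example runs anywhere.

```python
import os
import pickle


class Book(object):
    """Cut-down copy of the class above, for illustration only."""

    def __init__(self, arg):
        self.namelist = arg
        self.tempread = ""

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['namelist']  # leave the open files out of the pickle
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)


with open(os.devnull) as handle:
    original = Book([handle])
    # Roughly what the pool does to self when it ships a bound method.
    clone = pickle.loads(pickle.dumps(original))

has_namelist = hasattr(clone, 'namelist')
print(has_namelist)    # False: the attribute was dropped in __getstate__
print(clone.tempread)  # picklable attributes survive the round trip
```

If the worker did need the file contents, one option would be to read them in the parent and pass the resulting strings as arguments, which is what format_book already does with charlist.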

Author by

PaleNeutron

Updated on June 07, 2022

Comments

  • PaleNeutron
    PaleNeutron almost 2 years

    I am trying to use multiprocessing in my code for better performance.

    However, I got an error as follows:

    Traceback (most recent call last):
      File "D:\EpubBuilder\TinyEpub.py", line 49, in <module>
        e.epub2txt()
      File "D:\EpubBuilder\TinyEpub.py", line 43, in epub2txt
        tempread = self.get_text()
      File "D:\EpubBuilder\TinyEpub.py", line 29, in get_text
        txtlist = pool.map(self.char2text,charlist)
      File "C:\Python34\lib\multiprocessing\pool.py", line 260, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "C:\Python34\lib\multiprocessing\pool.py", line 599, in get
        raise self._value
      File "C:\Python34\lib\multiprocessing\pool.py", line 383, in _handle_tasks
        put(task)
      File "C:\Python34\lib\multiprocessing\connection.py", line 206, in send
        self._send_bytes(ForkingPickler.dumps(obj))
      File "C:\Python34\lib\multiprocessing\reduction.py", line 50, in dumps
        cls(buf, protocol).dump(obj)
    TypeError: cannot serialize '_io.BufferedReader' object
    

    I have tried it another way and got this error:

    TypeError: cannot serialize '_io.TextIOWrapper' object
    

    My code looks like this:

    from multiprocessing import Pool
    class Book(object):
        def __init__(self, arg):
            self.namelist = arg
        def format_char(self,char):
            char = char + "a"
            return char
        def format_book(self):
            self.tempread = ""
            charlist = [f.read() for f in self.namelist] #list of char
            with Pool() as pool:
                txtlist = pool.map(self.format_char,charlist)
            self.tempread = "".join(txtlist)
            return self.tempread
    
    if __name__ == '__main__':
        import os
        b = Book([open(f) for f in os.listdir()])
        t = b.format_book()
        print(t)
    

    I think the error is raised because the Pool is not used in the main function.

    Is my conjecture right? And how can I modify my code to fix the error?

  • PaleNeutron
    PaleNeutron over 9 years
    Thanks! It works well now, and my conjecture is wrong.
  • Daniel
    Daniel almost 3 years
    This is the real answer: self must be picklable. For goodness sake, I searched way too long to get the unmistakably right answer.