Python multiprocessing pool stuck


As you can read in the answer John pointed to in the comments, multiprocessing.Pool, in general, should not be expected to work well within an interactive interpreter. To understand why this is the case, consider how Pool does its job:

  • It starts new Python worker processes, passing them the name of the current Python file.
  • The workers then essentially do import <this file>, and listen for messages from the master.
  • The master sends function names along with function arguments to the workers via pickling. Note that the functions themselves (their code) cannot be sent, because pickle serializes a function only as a reference to its module and name.
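The last point can be checked directly. The following is a minimal sketch, assuming Python 3: it pickles a function and lists the strings stored in the payload, showing that only a module name and a function name travel, never the bytecode. A lambda, which has no importable name, cannot be pickled at all. The names payload, names and lambda_failed are illustrative only.

```python
import pickle
import pickletools

def square(x):
    return x * x

# pickle stores a reference to the function (module name + qualified name),
# not its bytecode.
payload = pickle.dumps(square, protocol=pickle.HIGHEST_PROTOCOL)
names = [arg for _op, arg, _pos in pickletools.genops(payload) if isinstance(arg, str)]
print(names)  # ['__main__', 'square'] when run as a script

# A function that cannot be looked up by name (here a lambda) cannot be pickled.
lambda_failed = False
try:
    pickle.dumps(lambda x: x * x)
except (pickle.PicklingError, AttributeError) as exc:
    lambda_failed = True
    print("cannot pickle a lambda:", exc)
```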

When you try to perform this procedure from an interactive prompt, there is no reasonable "current Python file" to pass to the children for importing. Moreover, the functions you defined in your interactive prompt are not part of any module (they are dynamically defined), and hence cannot be imported by the children from that nonexistent module. So your easiest bet is to simply avoid using multiprocessing within IPython. IPython parallel is so much better anyway :)
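To see why such a reference is useless to a worker that could not import the defining file, here is a minimal single-process simulation, assuming Python 3. It pickles a function the way the master would, then temporarily deletes the name from its module to mimic a worker whose main module never defined it. This is only an illustration: a real worker is a separate process, and the names home and worker_failed are hypothetical.

```python
import pickle
import sys

def square(x):
    return x * x

# What the master would send: only the reference "<module>.square".
payload = pickle.dumps(square)

# Simulate a worker whose main module never defined square (it had no file
# to import it from) by temporarily removing the name from that module.
home = sys.modules[square.__module__]
original = home.square
del home.square
try:
    pickle.loads(payload)
    worker_failed = False
except (AttributeError, pickle.UnpicklingError) as exc:
    worker_failed = True
    print("worker could not resolve the function:", exc)
finally:
    home.square = original  # restore the name for the rest of the session
```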


For completeness' sake I also checked what exactly happens in my particular case of an IPython 4 running under Python 2.7 on Windows 8 (where I can observe the interpreter getting stuck as well). Interestingly, the reason IPython gets stuck in the first place is not one of those mentioned above.

It turns out that multiprocessing checks whether __main__.__file__ is defined and, if not, sends sys.argv[0] as the "current filename" to the children. In the case of (my version of) IPython, sys.argv[0] is equal to C:\Dev\Anaconda\lib\site-packages\ipykernel\__main__.py.
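A simplified sketch of that fallback rule, as described here (not the actual forking.py code, which may apply extra conditions), together with how a module name follows from the path. Both helper functions are hypothetical. ntpath is used because it accepts both backslash and forward-slash separators, so the Windows path above can be tried on any OS.

```python
import ntpath
import sys

def main_path_sent_to_children():
    """Hypothetical helper: prefer __main__.__file__, else fall back to sys.argv[0]."""
    main_module = sys.modules["__main__"]
    return getattr(main_module, "__file__", None) or sys.argv[0]

def main_name_from_path(path):
    """Module name a worker would derive: the file name without its extension."""
    return ntpath.splitext(ntpath.basename(path))[0]

print(main_path_sent_to_children())
print(main_name_from_path(r"C:\Dev\Anaconda\lib\site-packages\ipykernel\__main__.py"))  # __main__
```

The second call shows why the IPython case is special: the file handed to the children is literally named __main__.py.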

Unfortunately, before starting up, the worker processes check whether the file they are going to import is already in their sys.modules. Line 488 of multiprocessing/forking.py says:

assert main_name not in sys.modules, main_name

When main_name is __main__ (as is the case with IPython's workers), this assertion fails and the workers fail to start. The same code, however, is "smart" enough to check whether the passed name is ipython, in which case it performs no such check and imports nothing.
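The worker-side decision can be summarized in a small hypothetical function. It is a sketch of the behavior described above, returning a boolean instead of raising the AssertionError the real code would; it is not the code of forking.py.

```python
import sys

def worker_can_start(main_name, loaded_modules=None):
    """Hypothetical sketch of the worker-side check described above.

    'ipython' is special-cased: no check, nothing imported. Any other name
    must not already be in sys.modules, otherwise the real code's
    `assert main_name not in sys.modules` would fail.
    """
    if loaded_modules is None:
        loaded_modules = sys.modules
    if main_name == "ipython":
        return True
    return main_name not in loaded_modules

print(worker_can_start("__main__"))  # False: '__main__' is always in sys.modules
print(worker_can_start("ipython"))   # True: the special case the hack below relies on
```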

Consequently, the problem of workers failing to start can be worked around with an ugly hack: setting __main__.__file__ to ipython. The following code does work fine from an IPython cell:

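import sys
# Ugly hack: make the workers believe the "main file" is called ipython,
# so they skip the sys.modules check and import nothing.
sys.modules['__main__'].__file__ = 'ipython'
from multiprocessing import Pool

pool = Pool(processes=4)
inputs = [0, 1, 2, 3, 4]
outputs = pool.map(abs, inputs)  # abs is a built-in, so workers can resolve it
pool.close()
pool.join()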

Note that this example asks the workers to compute abs, a built-in function. It would fail (gracefully, with an exception) if you asked the workers to compute a function you defined within the notebook.

It turns out you can, in principle, go further with the hacking and have your functions sent over to the workers using some manual pickling of their code. You can find a pretty cool example of such a hack here.
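The linked hack is not reproduced here. As a rough, independent sketch of the general idea, assuming Python 3, a plain function's code object can be serialized with the standard marshal module and rebuilt with types.FunctionType on the receiving side. The helpers dump_function and load_function are hypothetical; they ignore closures, default arguments and non-builtin globals, and marshal output is only guaranteed readable by the same Python version (which holds for a pool's workers).

```python
import builtins
import marshal
import types

def square(x):
    return x * x

def dump_function(func):
    """Serialize a plain function as (name, marshalled bytecode).

    Assumption: no closures, default arguments or non-builtin globals.
    """
    return func.__name__, marshal.dumps(func.__code__)

def load_function(payload):
    """Rebuild a function from dump_function's output."""
    name, code_bytes = payload
    code = marshal.loads(code_bytes)
    return types.FunctionType(code, {"__builtins__": builtins}, name)

payload = dump_function(square)  # a (str, bytes) tuple, which pickles fine
rebuilt = load_function(payload)
print(rebuilt(7))  # 49
```

Because the payload is only a string and bytes, it can be passed to pool.map as an ordinary argument; the receiving side still needs load_function under an importable name, so in practice that helper would live in a real module file.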

Author: Duccio Bertieri

Updated on June 04, 2022

Comments

  • Duccio Bertieri
    almost 2 years ago

    I'm trying to run some sample code for Python's multiprocessing.pool module that I found on the web. The code is:

    from multiprocessing import Pool

    def square(x):
        return x * x

    if __name__ == '__main__':
        pool = Pool(processes=4)
        inputs = [0, 1, 2, 3, 4]
        outputs = pool.map(square, inputs)

    But when I try to run it, it never finishes executing and I have to restart the kernel of my IPython Notebook. What's the problem?