Python Process Pool non-daemonic?
Solution 1
The multiprocessing.pool.Pool class creates the worker processes in its __init__ method, makes them daemonic, and starts them; it is not possible to re-set their daemon attribute to False before they are started (and afterwards it's not allowed anymore). But you can create your own sub-class of multiprocessing.pool.Pool (multiprocessing.Pool is just a wrapper function) and substitute your own multiprocessing.Process sub-class, which is always non-daemonic, for the worker processes.
Here's a full example of how to do this. The important parts are the two classes NoDaemonProcess and MyPool at the top, and the calls to pool.close() and pool.join() on your MyPool instance at the end.
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import multiprocessing
# We must import this explicitly, it is not imported by the top-level
# multiprocessing module.
import multiprocessing.pool
import time

from random import randint


class NoDaemonProcess(multiprocessing.Process):
    # make 'daemon' attribute always return False
    def _get_daemon(self):
        return False
    def _set_daemon(self, value):
        pass
    daemon = property(_get_daemon, _set_daemon)

# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
class MyPool(multiprocessing.pool.Pool):
    Process = NoDaemonProcess

def sleepawhile(t):
    print("Sleeping %i seconds..." % t)
    time.sleep(t)
    return t

def work(num_procs):
    print("Creating %i (daemon) workers and jobs in child." % num_procs)
    pool = multiprocessing.Pool(num_procs)

    result = pool.map(sleepawhile,
                      [randint(1, 5) for x in range(num_procs)])

    # The following is not really needed, since the (daemon) workers of the
    # child's pool are killed when the child is terminated, but it's good
    # practice to cleanup after ourselves anyway.
    pool.close()
    pool.join()
    return result

def test():
    print("Creating 5 (non-daemon) workers and jobs in main process.")
    pool = MyPool(5)

    result = pool.map(work, [randint(1, 5) for x in range(5)])

    pool.close()
    pool.join()
    print(result)

if __name__ == '__main__':
    test()
Solution 2
I needed to employ a non-daemonic pool in Python 3.7 and ended up adapting the code posted in the accepted answer. Below is the snippet that creates the non-daemonic pool:
import multiprocessing.pool

class NoDaemonProcess(multiprocessing.Process):
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass

class NoDaemonContext(type(multiprocessing.get_context())):
    Process = NoDaemonProcess

# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
class NestablePool(multiprocessing.pool.Pool):
    def __init__(self, *args, **kwargs):
        kwargs['context'] = NoDaemonContext()
        super(NestablePool, self).__init__(*args, **kwargs)
As the current implementation of multiprocessing has been extensively refactored to be based on contexts, we need to provide a NoDaemonContext class that has our NoDaemonProcess as an attribute. NestablePool will then use that context instead of the default one.
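To show how NestablePool might be used, here is a minimal, self-contained sketch (the inner/outer function names are illustrative, not part of the original answer): an ordinary daemonic Pool is created inside a NestablePool worker, which is exactly what the stock Pool forbids.

```python
import multiprocessing
import multiprocessing.pool

# NoDaemonProcess, NoDaemonContext and NestablePool repeated from the
# snippet above so that this example is self-contained.
class NoDaemonProcess(multiprocessing.Process):
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass

class NoDaemonContext(type(multiprocessing.get_context())):
    Process = NoDaemonProcess

class NestablePool(multiprocessing.pool.Pool):
    def __init__(self, *args, **kwargs):
        kwargs['context'] = NoDaemonContext()
        super().__init__(*args, **kwargs)

def inner(x):
    return x * x

def outer(n):
    # An ordinary (daemonic) pool created *inside* a NestablePool worker;
    # with the stock Pool this would raise "daemonic processes are not
    # allowed to have children".
    with multiprocessing.Pool(2) as pool:
        return sum(pool.map(inner, range(n)))

if __name__ == '__main__':
    with NestablePool(2) as pool:
        print(pool.map(outer, [3, 4]))  # [5, 14]
```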
That said, I should warn that there are at least two caveats to this approach:

- It still depends on implementation details of the multiprocessing package, and could therefore break at any time.
- There are valid reasons why multiprocessing made it so hard to use non-daemonic processes, many of which are explained here. The most compelling in my opinion is:

As for allowing children threads to spawn off children of its own using subprocess runs the risk of creating a little army of zombie 'grandchildren' if either the parent or child threads terminate before the subprocess completes and returns.
Solution 3
The multiprocessing module has a nice interface to use pools with processes or threads. Depending on your current use case, you might consider using multiprocessing.pool.ThreadPool for your outer pool, which will result in threads (which allow spawning processes from within) as opposed to processes.
It might be limited by the GIL, but in my particular case (I tested both), the startup time for the processes from the outer Pool as created here far outweighed the solution with ThreadPool.
It's really easy to swap Processes for Threads. Read more about how to use a ThreadPool solution here or here.
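A minimal sketch of this swap (the function names are illustrative): the outer pool is a ThreadPool, so its workers are threads, and threads are free to create process pools of their own.

```python
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def square(x):
    return x * x

def work(n):
    # Runs in a worker *thread* of the outer ThreadPool; threads are not
    # daemonic processes, so they may create process pools of their own.
    with Pool(2) as procs:
        return sum(procs.map(square, range(n)))

if __name__ == '__main__':
    # Outer pool of threads, inner pools of processes.
    with ThreadPool(3) as threads:
        print(threads.map(work, [3, 4, 5]))  # [5, 14, 30]
```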
Solution 4
As of Python 3.8, concurrent.futures.ProcessPoolExecutor doesn't have this limitation. It can have a nested process pool with no problem at all:
from concurrent.futures import ProcessPoolExecutor as Pool
from itertools import repeat
from multiprocessing import current_process
import time

def pid():
    return current_process().pid

def _square(i):  # Runs in inner_pool
    square = i ** 2
    time.sleep(i / 10)
    print(f'{pid()=} {i=} {square=}')
    return square

def _sum_squares(i, j):  # Runs in outer_pool
    with Pool(max_workers=2) as inner_pool:
        squares = inner_pool.map(_square, (i, j))
    sum_squares = sum(squares)
    time.sleep(sum_squares ** .5)
    print(f'{pid()=}, {i=}, {j=} {sum_squares=}')
    return sum_squares

def main():
    with Pool(max_workers=3) as outer_pool:
        for sum_squares in outer_pool.map(_sum_squares, range(5), repeat(3)):
            print(f'{pid()=} {sum_squares=}')

if __name__ == "__main__":
    main()
The above demonstration code was tested with Python 3.8.
A limitation of ProcessPoolExecutor, however, is that it doesn't have maxtasksperchild. If you need this, consider the answer by Massimiliano instead.
Credit: answer by jfs
Solution 5
On some Python versions, replacing the standard Pool with a custom one can raise the error AssertionError: group argument must be None for now.
Here I found a solution that can help:
import multiprocessing
import multiprocessing.pool

class NoDaemonProcess(multiprocessing.Process):
    # make 'daemon' attribute always return False
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, val):
        pass

class NoDaemonProcessPool(multiprocessing.pool.Pool):
    def Process(self, *args, **kwds):
        proc = super(NoDaemonProcessPool, self).Process(*args, **kwds)
        proc.__class__ = NoDaemonProcess
        return proc
Updated on September 04, 2021
Comments
- Max (almost 3 years): Would it be possible to create a python Pool that is non-daemonic? I want a pool to be able to call a function that has another pool inside. I want this because daemon processes cannot create processes. Specifically, it will cause the error: AssertionError: daemonic processes are not allowed to have children. For example, consider the scenario where function_a has a pool which runs function_b, which has a pool which runs function_c. This function chain will fail, because function_b is being run in a daemon process, and daemon processes cannot create processes.
- Admin (almost 12 years): The above code seems to be hanging for me. Specifically it appears to hang at pool.close() inside work(). Is there anything I am missing?
- Chris Arndt (almost 12 years): I just tested my code again with Python 2.7/3.2 (after fixing the "print" lines) on Linux and Python 2.6/2.7/3.2 on OS X. Linux and Python 2.7/3.2 on OS X work fine, but the code does indeed hang with Python 2.6 on OS X (Lion). This seems to be a bug in the multiprocessing module, which got fixed, but I haven't actually checked the bug tracker.
- Mike Vella (over 10 years): This should really be fixed in the multiprocessing module (an option for non-daemonic workers should be available). Does anyone know who maintains it?
- frmdstryr (about 10 years): Thanks! On Windows you also need to call multiprocessing.freeze_support().
- Chris Lucian (over 9 years): Nice work. If anyone is getting a memory leak with this, try using "with closing(MyPool(processes=num_cpu)) as pool:" to dispose of the pool properly.
- max (about 9 years): What are the disadvantages of using MyPool instead of the default Pool? In other words, in exchange for the flexibility of starting child processes, what costs do I pay? (If there were no costs, presumably the standard Pool would have used non-daemonic processes.)
- Philippe Ombredanne (over 7 years): @ChrisArndt what would be the license for your code besides the standard CC-BY-SA from SO? MIT?
- Chris Arndt (over 7 years): @PhilippeOmbredanne: Yes, MIT is fine. Most code I publish, which doesn't have to be GPL, is under the MIT license.
- Philippe Ombredanne (over 7 years): @ChrisArndt Thanks... I am going to put it to good use in github.com/nexB/scancode-toolkit!
- Chris Arndt (over 6 years): @machen Yes, unfortunately that's true. In Python 3.6 the Pool class has been extensively refactored, so Process isn't a simple attribute anymore, but a method, which returns the process instance it gets from a context. I tried overwriting this method to return a NoDaemonPool instance, but this results in the exception AssertionError: daemonic processes are not allowed to have children when the Pool is used.
- Chris Arndt (about 5 years): Just tested it again with Python 3.7.3, and, to my surprise, this still works. Rather accidentally, though. Overriding multiprocessing.pool.Pool.Process circumvents the whole new context stuff, and I'm not sure what side-effects this has.
- gbmhunter (about 5 years): Confirmed that it works for me in Python 3.6.5 (perhaps the answer was edited to support >=v3.6 after the comment "It doesn't work with python3.6"?)
- A_A (almost 5 years): Regarding the caveat: My use case is parallelising tasks, but the grand-children return information to their parents, which in turn return information to their parents after doing some required local processing. Consequently, every level / branch has an explicit wait for all its leafs. Does the caveat still apply if you explicitly have to wait for spawned processes to finish?
- kadee (almost 5 years): It doesn't work in Python 3.6.7, most probably due to this commit: github.com/python/cpython/commit/…. That commit was reverted already in Python 3.6.8 (github.com/python/cpython/pull/10969). So, I guess, it's only in Python 3.6.7 that this solution is not working?
- trance_dude (over 4 years): Thanks - this helped me a lot - great use of threading here (to spawn processes which actually perform well).
- Max Wong (about 4 years): I just used the answer, but I received this AttributeError: module 'multiprocessing' has no attribute 'pool' error. Why?
- Chris Arndt (about 4 years): @YanqiHuang Don't forget import multiprocessing.pool.
- Max Wong (about 4 years): @ChrisArndt I tried import multiprocessing and then used multiprocessing.pool.Pool as the parent class for the MyPool class. Will there be any difference between the two? Thanks.
- Chris Arndt (about 4 years): @YanqiHuang No, that won't work. You need to import multiprocessing.pool, exactly as I wrote.
- Max Wong (about 4 years): @ChrisArndt Oh... I see. Thanks a lot :)
- Max Wong (about 4 years): @ChrisArndt Could you also explain why we need to do it the way you suggested? Thanks a million. This is just out of my curiosity.
- DreamFlasher (about 4 years): This is now clearly the best solution, as it requires minimal changes.
- abanana (about 4 years): For people looking for a practical solution that probably applies to their situation, this is the one.
- raphael (about 4 years): Works perfectly! ... as a side-note, using a child multiprocessing.Pool inside a ProcessPoolExecutor.Pool is also possible!
- Radio Controlled (over 3 years): Would you bother adding how to use this instead of multiprocessing.pool?
- Radio Controlled (over 3 years): "You can now use multiprocessing.Pool and NestablePool interchangeably".
- Roy Shilkrot (over 3 years): Unfortunately this doesn't work for me, still getting daemonic processes are not allowed to have children.
- Asclepius (over 3 years): @RoyShilkrot Which version of Python are you using exactly?
- Roy Shilkrot (over 3 years): Python 3.7. The problem was this was run from Celery, and I had to use import billiard as multiprocessing and use their Pool.
- Asclepius (over 3 years): @RoyShilkrot Noted. The current latest version of Celery claims to support Python 3.8, which is the version of Python that the answer was tested with.
- wim (over 3 years): Users choosing a process pool are presumably CPU-bound and/or need cancellable tasks, so threads are not an option. This doesn't really answer the question.
- Roel (over 3 years): This won't work if your script is within celery though. Using celery==4.3.0 and python==3.7.
- Opps_0 (almost 3 years): @Acumenus would you mind checking my question (stackoverflow.com/questions/68305077/…)
- Asclepius (almost 3 years): @Opps_0 What's to check? Read the docs of concurrent.futures.ProcessPoolExecutor, get it to work with something very simple, and then with your actual task.