Why does Python's multiprocessing module import __main__ when starting a new process on Windows?
Windows doesn't have fork, so there's no way to make a new process that is a copy of the existing one. The child process therefore has to run your code again, and you need a way to tell the parent process from the child process: checking if __name__ == "__main__": is that way.
This is covered in the docs here: http://docs.python.org/2/library/multiprocessing.html#windows
I don't know of another way to structure the code to avoid the fork bomb effect.
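You can see this parent/child distinction without starting any processes. The sketch below assumes Python 3. It runs a hypothetical stand-in for main.py twice with the standard runpy.run_path: once as "__main__" (how you launch it) and once as "__mp_main__". That second name is what Python 3's spawn start method uses when a child re-runs the parent's script (Python 2.7's Windows code used "__parents_main__"). MAIN_SOURCE and run_script_as are illustrative names, not part of multiprocessing.

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for main.py. Module-level code always runs;
# the guarded part runs only when the file is executed as "__main__".
MAIN_SOURCE = '''
events.append("module level ran")
if __name__ == "__main__":
    events.append("guarded block ran")  # e.g. mylibrary.foo()
'''


def run_script_as(run_name):
    """Execute MAIN_SOURCE from a real file under the given __name__
    and return the list of events it recorded."""
    events = []
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(MAIN_SOURCE)
        # init_globals injects the shared `events` list into the script.
        runpy.run_path(path, init_globals={"events": events}, run_name=run_name)
    finally:
        os.remove(path)
    return events


parent = run_script_as("__main__")    # how you run main.py yourself
child = run_script_as("__mp_main__")  # how a spawned child re-runs it (Python 3)
print(parent)  # ['module level ran', 'guarded block ran']
print(child)   # ['module level ran']
```

Everything outside the guard is replayed in the child. That is why an unguarded mylibrary.foo() in main.py gets called again in every new process.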
Laura
Updated on June 20, 2022
Comments
-
Laura almost 2 years
I am playing around with a library for my beginner students, and I'm using the multiprocessing module in Python. I ran into this problem: how to import and use a module that uses multiprocessing without causing an infinite loop on Windows.
As an example, suppose I have a module mylibrary.py:

    # mylibrary.py
    from multiprocessing import Process

    class MyProcess(Process):
        def run(self):
            # print() with one argument behaves the same in Python 2.7 and 3
            print("Hello from the new process")

    def foo():
        p = MyProcess()
        p.start()
And a main program that calls this library:

    # main.py
    import mylibrary
    mylibrary.foo()
If I run main.py on Windows, it tries to import main.py into the new process, meaning the code is executed again, which results in an infinite loop of process generation. I can fix it like so:

    # main.py
    import mylibrary

    if __name__ == "__main__":
        mylibrary.foo()
But this is pretty confusing for beginners, and moreover it seems like it shouldn't be necessary. The new process is being created in mylibrary, so why doesn't the new process just import mylibrary? Is there a way to work around this issue without having to change main.py?

I am using Python 2.7, by the way.
-
Laura over 11 years
I'm sure I am missing something, but my question is why the child process has to run all of the code again. Why not just the module that started the new process?
-
yantrab over 11 years
@Laura: it has to run all your code again, because if it didn't, it wouldn't have your code. The child process starts completely fresh, and if you want it to have your functions, it needs your code.
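One concrete reason the child "wouldn't have your code" is that multiprocessing sends objects to a Windows child by pickling them. The pickle documentation says functions are pickled by fully qualified name, not by value. A minimal sketch, assuming Python 3, with the standard json.dumps standing in for any module-level function:

```python
import json
import pickle

# Pickle a function that lives in an importable module.
payload = pickle.dumps(json.dumps)

# The bytes hold only the module name ("json") and the function name
# ("dumps"), not the function's bytecode or source.
print(payload)

# Unpickling re-imports the module and looks the name up again,
# so the receiving side must be able to import that module itself.
restored = pickle.loads(payload)
print(restored is json.dumps)  # True
```

The exact bytes vary by pickle protocol and Python version. In every case, the receiver only gets a name that it must resolve by importing.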
-
BrenBarn over 11 years
@NedBatchelder: That still doesn't really answer the question. In the posted example, the new process only needs the run function. Why does it have to import other modules just to run that one function?
-
Laura over 11 years
@NedBatchelder: But my point is that the function I asked it to execute is in mylibrary.py, so that is all it needs to import.
-
Jeff Mercado over 11 years
@Laura: The point is, there is a context in which that module is supposed to be invoked: the context set up by the main script. Since Windows doesn't have fork, it can't just copy that context into the new process; it has to invoke the main script so it can set it up again. This is why you put the actual program code within the if __name__ == "__main__": block. Everything leading up to it is just setting up the environment (all the imports, function definitions, etc.).
-
Piotr Dobrogost almost 9 years
@Laura To answer why the child process has to run all of the code again: when you do import mylibrary, mylibrary becomes part of the global namespace (shared with main.py), so when you subsequently call mylibrary.foo(), foo runs in this namespace, which can influence its behavior. To make sure foo runs in the subprocess exactly the same way it would run in the main process, you have to run it in the exact same namespace (environment), and the only way to do that is to run everything from the beginning, starting with the execution of main.py.
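The namespace argument can be illustrated in a single process. This is a minimal simulation, not multiprocessing itself. It assumes a hypothetical GREETING setting in mylibrary that main.py overrides. The hypothetical helper import_fresh() builds a brand-new module object, standing in for the first import in a new interpreter.

```python
import types

# Hypothetical source for mylibrary.py with a setting main.py might change.
LIBRARY_SOURCE = '''
GREETING = "Hello from the new process"

def run():
    return GREETING
'''


def import_fresh(name="mylibrary"):
    """Build a new module object from LIBRARY_SOURCE, as a fresh
    interpreter would on its first import of the module."""
    module = types.ModuleType(name)
    exec(LIBRARY_SOURCE, module.__dict__)
    return module


# Parent: main.py imports mylibrary, then customizes it before calling it.
parent_lib = import_fresh()
parent_lib.GREETING = "Hello, class!"  # hypothetical line in main.py

# A child that imported only mylibrary would start from the defaults.
child_lib = import_fresh()

print(parent_lib.run())  # Hello, class!
print(child_lib.run())   # Hello from the new process
```

Re-running main.py in the child replays such customizations, provided they sit at module level outside the if __name__ == "__main__": block.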