Why does Python's multiprocessing module import __main__ when starting a new process on Windows?


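Windows doesn't have fork, so there's no way to create a new process that is a copy of the existing one. Instead, the child process starts a fresh interpreter and has to run your main module again to rebuild its state. That means you need a way to tell the parent's run apart from the child's re-import, and the if __name__ == "__main__": check is it: in the child, the main module is not loaded under the name __main__, so code inside the guard is skipped.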

This is covered in the docs here: http://docs.python.org/2/library/multiprocessing.html#windows

I don't know of another way to structure the code to avoid the fork bomb effect.
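To make the mechanism concrete, here is a minimal sketch that imitates the child's re-import without starting any processes. It runs a small stand-in for main.py twice with runpy.run_path: once under the name __main__ (the parent) and once under __mp_main__, which is the name current Python 3 releases give the re-imported main module (an implementation detail; Python 2.7 uses a different one). The script text, the events list, and the run_as helper are all hypothetical names made up for this illustration.

```python
import os
import runpy
import tempfile

# Hypothetical stand-in for main.py: one unguarded line, one guarded line.
SCRIPT = '''
events.append("top-level code ran as " + __name__)
if __name__ == "__main__":
    events.append("guarded code ran")
'''


def run_as(run_name):
    """Execute SCRIPT from a temporary file under run_name and report what ran."""
    events = []
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(SCRIPT)
        # The script sees `events` as a global and appends to this same list.
        runpy.run_path(path, init_globals={"events": events}, run_name=run_name)
    finally:
        os.remove(path)
    return events


parent_events = run_as("__main__")     # how the parent runs the script
child_events = run_as("__mp_main__")   # roughly how a spawned child re-imports it

print(parent_events)  # ['top-level code ran as __main__', 'guarded code ran']
print(child_events)   # ['top-level code ran as __mp_main__']
```

The unguarded line runs in both cases, which is why an unguarded mylibrary.foo() call in main.py starts a new process every time a child is created. Only the guarded line is skipped on the re-import.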


Author: Laura

Updated on June 20, 2022

Comments

  • Laura
    Laura almost 2 years

    I am playing around with a library for my beginner students, and I'm using the multiprocessing module in Python. I ran into this problem: importing and using a module that uses multiprocessing without causing infinite loop on Windows

    As an example, suppose I have a module mylibrary.py:

    # mylibrary.py
    
    from multiprocessing import Process
    
    class MyProcess(Process):
        def run(self):
            print("Hello from the new process")  # parenthesized form works in Python 2.7 and 3
    
    def foo():
        p = MyProcess()
        p.start()
    

    And a main program that calls this library:

    # main.py
    
    import mylibrary
    
    mylibrary.foo()
    

    If I run main.py on Windows, the new process imports main.py, so mylibrary.foo() is executed again. Each child then starts another child, which results in an infinite loop of process generation. I can fix it like so:

    import mylibrary
    
    if __name__ == "__main__":
        mylibrary.foo()
    

    But, this is pretty confusing for beginners, and moreover it seems like it shouldn't be necessary. The new process is being created in mylibrary, so why doesn't the new process just import mylibrary? Is there a way to work around this issue without having to change main.py?

    I am using Python 2.7, by the way.

  • Laura
    Laura over 11 years
    I'm sure I am missing something, but my question is why the child process has to run all of the code again. Why not just the module that started the new process?
  • yantrab
    yantrab over 11 years
    @Laura: it has to run all your code again, because if it didn't, it wouldn't have your code. The child process starts completely fresh, and if you want it to have your functions, it needs your code.
  • BrenBarn
    BrenBarn over 11 years
    @NedBatchelder: That still doesn't really answer the question. In the posted example, the new process only needs the run function. Why does it have to import other modules just to run that one function?
  • Laura
    Laura over 11 years
    @NedBatchelder: But my point is that the function I asked it to execute is in mylibrary.py, so that is all it needs to import.
  • Jeff Mercado
    Jeff Mercado over 11 years
    @Laura: The point is that the module is meant to be invoked in a particular context, namely the one set up by the main script. Since Windows doesn't have fork, it can't just copy that context into the new process; it has to invoke the main script to set it up again. This is why you put the actual program code within the if __name__ == "__main__": block. Everything leading up to it is just setting up the environment (all the imports, function definitions, etc.).
  • Piotr Dobrogost
    Piotr Dobrogost almost 9 years
    @Laura To answer why the child process has to run all of the code again: when you do import mylibrary, mylibrary becomes part of the global namespace (shared with main.py), so when you subsequently call mylibrary.foo(), foo runs in this namespace, which can influence its behavior. To make sure foo runs in the subprocess exactly as it would in the main process, you have to run it in the same namespace (environment), and the only way to do that is to run everything from the beginning, starting with the execution of main.py.
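The comments above say the child "starts completely fresh" and "needs your code". One way to see why is to look at what is actually sent to the child when no fork is available: objects are pickled, and pickle stores a function or class as a reference (module name plus attribute name), not as code. The sketch below uses json.dumps purely as a convenient importable function; it is an illustration of pickling by reference, not a trace of what multiprocessing does internally.

```python
import json
import pickle

# Pickle a function. json.dumps is just a stand-in for "some function
# the child is asked to run"; any importable function behaves the same way.
payload = pickle.dumps(json.dumps)

print(len(payload))  # a few dozen bytes, far too small to hold the function body
print(payload)       # the bytes contain the names "json" and "dumps"

# Unpickling imports the module by name and looks the function up again,
# so the receiving process must be able to import that module itself.
restored = pickle.loads(payload)
print(restored is json.dumps)  # True
```

By the same rule, anything defined in main.py is recorded as a reference into the __main__ module, so the child can only resolve it if it rebuilds that module by running the main script again.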