Keyboard Interrupts with Python's multiprocessing Pool


Solution 1

This is a Python bug. When waiting on a condition in threading.Condition.wait(), a KeyboardInterrupt is never delivered. Repro:

import threading
cond = threading.Condition(threading.Lock())
cond.acquire()
cond.wait(None)
print("done")

The KeyboardInterrupt exception won't be delivered until wait() returns, and it never returns, so the interrupt never happens. KeyboardInterrupt should almost certainly interrupt a condition wait.

Note that this doesn't happen if a timeout is specified; cond.wait(1) will receive the interrupt immediately. So, a workaround is to specify a timeout. To do that, replace

    results = pool.map(slowly_square, range(40))

with

    results = pool.map_async(slowly_square, range(40)).get(9999999)

or similar.
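The workaround can be seen end to end in a minimal, runnable Python 3 sketch. Assumptions beyond the answer itself: slowly_square mirrors the question's example (with a shorter sleep so it finishes quickly), and the 9999-second timeout is just an arbitrary "effectively forever" value.

```python
import multiprocessing
import time

def slowly_square(i):
    time.sleep(0.01)
    return i * i

def run():
    pool = multiprocessing.Pool(4)
    try:
        # get() with a timeout keeps the main thread in an interruptible
        # wait, so pressing Ctrl-C raises KeyboardInterrupt here.
        results = pool.map_async(slowly_square, range(10)).get(timeout=9999)
    except KeyboardInterrupt:
        pool.terminate()
        raise
    else:
        pool.close()
    finally:
        pool.join()
    return results

if __name__ == "__main__":
    print(run())
```

Under normal operation this behaves exactly like pool.map(); the timeout only changes how the main thread waits.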

Solution 2

From what I have recently found, the best solution is to set up the worker processes to ignore SIGINT altogether, and confine all the cleanup code to the parent process. This fixes the problem for both idle and busy worker processes, and requires no error handling code in your child processes.

import multiprocessing
import signal

...

def init_worker():
    signal.signal(signal.SIGINT, signal.SIG_IGN)

...

def main():
    pool = multiprocessing.Pool(size, init_worker)

    try:
        ...
    except KeyboardInterrupt:
        pool.terminate()
        pool.join()

Explanation and full example code can be found at http://noswap.com/blog/python-multiprocessing-keyboardinterrupt/ and http://github.com/jreese/multiprocessing-keyboardinterrupt respectively.
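As a self-contained illustration of this pattern (a sketch, not the linked example verbatim: the work function, pool size, and 60-second timeout are placeholders), the initializer makes every worker ignore SIGINT, so Ctrl-C is delivered only to the parent:

```python
import multiprocessing
import signal
import time

def init_worker():
    # Each worker runs this at startup and ignores SIGINT entirely.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def work(x):
    time.sleep(0.01)
    return x * 2

def run():
    pool = multiprocessing.Pool(4, initializer=init_worker)
    try:
        # map_async().get(timeout) keeps the parent interruptible.
        results = pool.map_async(work, range(10)).get(timeout=60)
    except KeyboardInterrupt:
        # Only the parent sees Ctrl-C, so all cleanup happens here.
        pool.terminate()
        pool.join()
        raise
    pool.close()
    pool.join()
    return results

if __name__ == "__main__":
    print(run())
```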

Solution 3

For some reason, only exceptions that inherit from the base Exception class are handled normally; KeyboardInterrupt derives from BaseException, not Exception. As a workaround, you may re-raise your KeyboardInterrupt as an Exception instance:

from multiprocessing import Pool
import time

class KeyboardInterruptError(Exception): pass

def f(x):
    try:
        time.sleep(x)
        return x
    except KeyboardInterrupt:
        raise KeyboardInterruptError()

def main():
    p = Pool(processes=4)
    try:
        print('starting the pool map')
        print(p.map(f, range(10)))
        p.close()
        print('pool map complete')
    except KeyboardInterrupt:
        print('got ^C while pool mapping, terminating the pool')
        p.terminate()
        print('pool is terminated')
    except Exception as e:
        print('got exception: %r, terminating the pool' % (e,))
        p.terminate()
        print('pool is terminated')
    finally:
        print('joining pool processes')
        p.join()
        print('join complete')
    print('the end')

if __name__ == '__main__':
    main()

Normally you would get the following output:

starting the pool map
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
pool map complete
joining pool processes
join complete
the end

So if you hit ^C, you will get:

starting the pool map
got ^C while pool mapping, terminating the pool
pool is terminated
joining pool processes
join complete
the end

Solution 4

The top-voted answer does not tackle the core issue, only a similar side effect.

Jesse Noller, the author of the multiprocessing library, explains how to correctly deal with CTRL+C when using multiprocessing.Pool in an old blog post.

import signal
from multiprocessing import Pool


def initializer():
    """Ignore CTRL+C in the worker process."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)


pool = Pool(initializer=initializer)

try:
    pool.map(perform_download, downloads)
except KeyboardInterrupt:
    pool.terminate()
    pool.join()

Solution 5

Usually, this simple structure works for Ctrl-C on Pool:

import signal

def signal_handle(_signal, frame):
    print("Stopping the Jobs.")

signal.signal(signal.SIGINT, signal_handle)
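To see the handler fire without an interactive Ctrl-C, the process can send SIGINT to itself (a POSIX-only sketch; the handler here records a message instead of printing, and note that a handler that does not raise suppresses the usual KeyboardInterrupt):

```python
import os
import signal

messages = []

def signal_handle(_signal, frame):
    # Called in the main thread when SIGINT (Ctrl-C) is delivered.
    messages.append("Stopping the Jobs.")

signal.signal(signal.SIGINT, signal_handle)

# Simulate Ctrl-C by sending SIGINT to our own process.
os.kill(os.getpid(), signal.SIGINT)
print(messages)
```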

As was stated in a few similar posts:

Capture keyboardinterrupt in Python without try-except

Updated on August 30, 2021

Comments

  • Fragsworth
    Fragsworth almost 3 years

    How can I handle KeyboardInterrupt events with Python's multiprocessing Pools? Here is a simple example:

    from multiprocessing import Pool
    from time import sleep
    from sys import exit

    def slowly_square(i):
        sleep(1)
        return i*i

    def go():
        pool = Pool(8)
        try:
            results = pool.map(slowly_square, range(40))
        except KeyboardInterrupt:
            # **** THIS PART NEVER EXECUTES. ****
            pool.terminate()
            print("You cancelled the program!")
            exit(1)
        print("\nFinally, here are the results: ", results)

    if __name__ == "__main__":
        go()
    

    When running the code above, the KeyboardInterrupt gets raised when I press ^C, but the process simply hangs at that point and I have to kill it externally.

    I want to be able to press ^C at any time and cause all of the processes to exit gracefully.

  • Fragsworth
    Fragsworth almost 15 years
    I tried this, and it doesn't actually terminate the entire set of jobs. It terminates the currently-running jobs, but the script still assigns the remaining jobs in the pool.map call as if everything is normal.
  • Joseph Garvin
    Joseph Garvin over 14 years
    Is this bug in the official python tracker anywhere? I'm having trouble finding it but I'm probably just not using the best search terms.
  • Andrey Vlasovskikh
    Andrey Vlasovskikh about 14 years
    It seems that this is not a complete solution. If a KeyboardInterrupt arrives while multiprocessing is performing its own IPC data exchange, then the try..except will not be activated (obviously).
  • Andrey Vlasovskikh
    Andrey Vlasovskikh about 14 years
    This bug has been filed as Issue 8296: bugs.python.org/issue8296
  • Alexander Ljungberg
    Alexander Ljungberg over 13 years
    Here's a hack which fixes pool.imap() in the same manner, making Ctrl-C possible when iterating over imap. Catch the exception and call pool.terminate() and your program will exit. gist.github.com/626518
  • Ryan C. Thompson
    Ryan C. Thompson over 12 years
    This doesn't quite fix things. Sometimes I get the expected behavior when I press Control+C, other times not. I'm not sure why, but it looks like maybe the KeyboardInterrupt is received by one of the processes at random, and I only get the correct behavior if the parent process is the one that catches it.
  • bboe
    bboe over 12 years
    Hi John. Your solution doesn't accomplish the same thing as my (yes, unfortunately complicated) solution. It hides behind the time.sleep(10) in the main process. If you were to remove that sleep, or if you wait until the process attempts to join on the pool, which you have to do in order to guarantee the jobs are complete, then you still suffer from the same problem: the main process doesn't receive the KeyboardInterrupt while it is waiting on the pool join operation.
  • jreese
    jreese over 12 years
    In the case of where I used this code in production, the time.sleep() was part of a loop that would check the status of each child process, and then restart certain processes on a delay if necessary. Rather than join() that would wait on all processes to complete, it would check on them individually, ensuring that the master process stayed responsive.
  • bboe
    bboe over 12 years
    So it was more a busy wait (maybe with small sleeps between checks) that polled for process completion via another method rather than join? If that's the case, perhaps it would be better to include this code in your blog post, since you can then guarantee that all the workers have completed before attempting to join.
  • MarioVilas
    MarioVilas about 11 years
    This would have to be done on each of the worker processes as well, and may still fail if the KeyboardInterrupt is raised while the multiprocessing library is initializing.
  • Walter
    Walter about 11 years
    The trick with .get(999999) slows everything down somehow. See below for the link to bryceboe.com with a solution that works.
  • Walter
    Walter about 11 years
    Works like a charm. It's a clean solution and not some kind of hack (/me thinks). BTW, the trick with .get(99999) as proposed by others hurts performance badly.
  • krethika
    krethika over 10 years
    This is OK, but you may lose track of errors that occur. Returning the error with a stack trace might work, so the parent process can tell that an error occurred, but it still doesn't exit immediately when the error occurs.
  • Paul Price
    Paul Price about 10 years
    I've not noticed any performance penalty from using a timeout, though I have been using 9999 instead of 999999. The exception is when an exception that doesn't inherit from the Exception class is raised: then you have to wait until the timeout is hit. The solution to that is to catch all exceptions (see my solution).
  • Cerin
    Cerin about 10 years
    This doesn't work. Only the children are sent the signal. The parent never receives it, so pool.terminate() never gets executed. Having the children ignore the signal accomplishes nothing. @Glenn's answer solves the problem.
  • Andy MacKinlay
    Andy MacKinlay almost 10 years
    My version of this is at gist.github.com/admackin/003dd646e5fadee8b8d6 ; it doesn't call .join() except on interrupt - it simply manually checks the result of .apply_async() using AsyncResult.ready() to see if it is ready, meaning we've cleanly finished.
  • Paul Price
    Paul Price over 9 years
    I've not noticed any performance penalty, but in my case the function is fairly long-lived (hundreds of seconds).
  • Ant6n
    Ant6n about 9 years
    I've tried to use this work around - and the keyboard interrupt does give me back control of the REPL. But the other spawned processes in the background are not properly terminated; they seem to randomly re-run somehow.
  • trcarden
    trcarden over 8 years
    @Cerin I was trying to confirm that this solution breaks down somewhere and I found this win.tue.nl/~aeb/linux/lk/lk-10.html#ss10.2. I believe that if the signal is sent to the process group then the signal will be sent to the leader as well as the children. If so then ignoring the signal in all but the leader would make for a pretty nice solution.
  • Bernhard
    Bernhard over 8 years
    You could replace raise KeyboardInterruptError with a return. You just have to make sure that the child process ends as soon as KeyboardInterrupt is received. The return value seems to be ignored; in main, the KeyboardInterrupt is still received.
  • gaborous
    gaborous over 7 years
    It works for me, you just have to make sure to put the signal ignoring code only in children's initialization...
  • szx
    szx almost 7 years
    This doesn't work for me with Python 3.6.1 on Windows. I get tons of stack traces and other garbage when I do Ctrl-C, i.e. same as without such workaround. In fact none of the solutions I've tried from this thread seem to work...
  • benathon
    benathon almost 7 years
    I've found that ProcessPoolExecutor also has the same issue. The only fix I was able to find was to call os.setpgrp() from inside the future
  • noxdafox
    noxdafox almost 7 years
    Sure, the only difference is that ProcessPoolExecutor does not support initializer functions. On Unix, you could leverage the fork strategy by disabling the sighandler on the main process before creating the Pool and re-enabling it afterwards. In pebble, I silence SIGINT on the child processes by default. I am not aware of the reason they don't do the same with the Python Pools. At the end, the user could re-set the SIGINT handler in case he/she wants to hurt himself/herself.
  • Paul Price
    Paul Price over 6 years
    This solution seems to prevent Ctrl-C from interrupting the main process as well.
  • noxdafox
    noxdafox over 6 years
    I just tested on Python 3.5 and it works, what version of Python are you using? What OS?
  • Code Doggo
    Code Doggo about 6 years
    I just figured this out as well! I honestly think this is the best solution for a problem like this. The accepted solution forces map_async onto the user, which I don't particularly like. In many situations, like mine, the main thread needs to wait for the individual processes to finish. This is one of the reasons why map exists!
  • Code Doggo
    Code Doggo about 6 years
    This actually isn't the case anymore, at least from my eyes and experience. If you catch the keyboard exception in the individual child processes and catch it once more in the main process, then you can continue using map and all is good. @Linux Cli Aik provided a solution below that produces this behavior. Using map_async is not always desired if the main thread depends on the results from the child processes.
  • Akos Lukacs
    Akos Lukacs almost 5 years
    Heh, it's still not fixed in 2019. Like doing IO in parallel is a novel idea :/
  • eMTy
    eMTy almost 4 years
    Glorious and complete example
  • Thomas
    Thomas almost 4 years
    It's tracked here now: bugs.python.org/issue22393
  • michaelvdnest
    michaelvdnest almost 4 years
    Excellent example.
  • Raf
    Raf almost 4 years
    Hi from 2020 ... this works nicely for imap_unordered as well.
  • amball
    amball over 3 years
    Thank you. I'm trying to figure out how this generalizes to multiple arguments. In particular, why do you pass [value] rather than value in jobs[value] = pool.apply_async(input_function, [value])?
  • Bruce Lamond
    Bruce Lamond almost 3 years
    Confirmed this works as expected on Python 3.7.7 on Windows. Thanks for posting!
  • 2080
    2080 over 2 years
    Would it be possible to have interrupted processes return an intermediate result instead?