Python Paramiko timeout with long execution, need full output


Solution 1

Here's something that might help, though I'm still in the midst of testing. After struggling with timeouts of various kinds, including a catch-all timeout in Python, and realizing that the real problem is that the server can't be trusted to terminate the process, I did this:

chan = ssh.get_transport().open_session()
cmd = "timeout {0} {1}\n".format(timeouttime, cmd)  # wrap the command in coreutils `timeout`
chan.exec_command(cmd)

The server times out after timeouttime if cmd doesn't exit sooner, exactly as I'd wish, and the terminated command kills the channel. The only catch is that GNU coreutils must exist on the server; failing that, there are alternatives.
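For context, here's an untested sketch of how the pieces can fit together. It assumes ssh is an already-connected paramiko.SSHClient, that cmd and timeouttime come from the caller, and the wrapper function name is purely illustrative:

def run_with_server_side_timeout(ssh, cmd, timeouttime):
    chan = ssh.get_transport().open_session()
    # GNU coreutils `timeout` terminates the command on the server side
    # if it runs longer than timeouttime seconds
    chan.exec_command("timeout {0} {1}".format(timeouttime, cmd))

    output = b""
    # recv() returns an empty string once the remote process exits and the
    # channel closes, so this loop ends even when `timeout` kills the command
    while True:
        data = chan.recv(4096)
        if not data:
            break
        output += data
    status = chan.recv_exit_status()  # GNU timeout exits with 124 on a timeout
    return status, output

Checking the exit status is optional, but it's a convenient way to tell a normal exit from a server-side timeout.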

Solution 2

I'm having the same kind of issue. I think we can handle it with signals: http://docs.python.org/2/library/signal.html

Here is a plain dumb example to show how it works.

import signal, time

def handler(signum, frame):
    pass

# Set the signal handler and a 2-second alarm
signal.signal(signal.SIGALRM, handler)
signal.alarm(2)

# This is where your operation that might hang goes
time.sleep(10)

# Disable the alarm
signal.alarm(0)

So here, the alarm is set to 2 seconds and time.sleep is called with 10 seconds, so of course the alarm is triggered before the sleep finishes. If you put some output after the time.sleep, you'll see that program execution resumes there.

If you want the control to continue somewhere else, wrap your hanging call in a try/except and have your handler function raise an exception.

Although I'm pretty sure it would work, I haven't tested it yet over paramiko calls.
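For instance, here's a minimal, untested sketch of that pattern. time.sleep stands in for whichever paramiko call hangs, and the Timeout exception class and handler names are just illustrative; note that SIGALRM only works on Unix and only in the main thread, as the comments below point out:

import signal, time

class Timeout(Exception):
    pass

def handler(signum, frame):
    # Raising here aborts whatever the main thread is currently blocked on
    raise Timeout()

signal.signal(signal.SIGALRM, handler)
signal.alarm(2)                   # give the call 2 seconds

try:
    time.sleep(10)                # stand-in for the paramiko call that hangs
except Timeout:
    print("Call timed out; recover or retry here.")
finally:
    signal.alarm(0)               # always cancel the alarm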


Comments

  • user1772459, almost 2 years ago

    There are lots of topics touching on parts of the title, but nothing that quite satisfies the whole thing. I'm pushing a command to a remote server and need the full output after a long execution time, say 5 minutes or so. Using a channel I was able to set a timeout, but when I read back stdout I got only a small portion of the output. The solution seemed to be to wait for channel.exit_status_ready(). That worked on a successful call, but a failed call would never trigger the channel timeout. Having reviewed the docs, I theorize that's because the timeout only applies to read operations, and waiting for the exit status doesn't qualify. Here's that attempt:

    channel = ssh.get_transport().open_session()
    channel.settimeout(timeout)
    channel.exec_command(cmd)  # return on this is not reliable
    while True:
        try:
            if channel.exit_status_ready():
                if channel.recv_ready():  # so use recv instead...
                    output = channel.recv(1048576)
                    break
            if channel.recv_stderr_ready():  # then check error
                error = channel.recv_stderr(1048576)
                break
        except socket.timeout:
            print("SSH channel timeout exceeded.")
            break
        except Exception:
            traceback.print_exc()
            break
    

    Pretty, ain't it? Wish it worked.

    My first attempt at a solution was to use time.time() to record a start time, then check time.time() - start > timeout. That seems straightforward, but in my present version I print time.time() - start against a fixed timeout that should trigger a break...and I see differences of double and triple the timeout with no break occurring. To save space, I'll mention my third attempt, which I've rolled up with this one: I read on here about using select.select to wait for output, and noted in the documentation that it takes a timeout as well. As you'll see from the code below, I've mixed all three methods -- channel timeout, time.time timeout, and select timeout -- yet I still have to kill the process. Here's the frankencode:

    channel = ssh.get_transport().open_session()
    channel.settimeout(timeout)
    channel.exec_command(cmd)  # return on this is not reliable
    print("{0}".format(cmd))
    start = time.time()
    while True:
        try:
            rlist, wlist, elist = select([channel], [], [],
                float(timeout))
            print("{0}, {1}, {2}".format(rlist, wlist, elist))
            if rlist is not None and len(rlist) > 0:
                if channel.exit_status_ready():
                    if channel.recv_ready():  # so use recv instead...
                        output = channel.recv(1048576)
                        break
            elif elist is not None and len(elist) > 0:
                if channel.recv_stderr_ready():  # then check error
                    error = channel.recv_stderr(1048576)
                    break
            print("{0} - {1} = {2}".format(
                time.time(), start, time.time() - start))
            if time.time() - start > timeout:
                break
        except socket.timeout:
            print("SSH channel timeout exceeded.")
            break
        except Exception:
            traceback.print_exc()
            break
    

    Here's some typical output:

    [<paramiko.Channel 3 (open) window=515488 -> <paramiko.Transport at 0x888414cL (cipher aes128-ctr, 128 bits) (active; 1 open channel(s))>>], [], []
    1352494558.42 - 1352494554.69 = 3.73274183273
    

    The top line is the [rlist, wlist, elist] returned by select; the bottom line is time.time(), start, and their difference (time.time() - start). I got this run to break by counting the iterations and breaking at the bottom of the try after looping 1000 times; timeout was set to 3 on the sample run. That proves we get through the try block, but obviously none of the three ways that should be timing out actually works.

    Feel free to rip into the code if I've fundamentally misunderstood something. I'd like for this to be uber-Pythonic and am still learning.

  • user1772459, over 11 years ago
    My research ran in the same direction, but I am getting "ValueError: signal only works in main thread", though I'm not knowingly using threads in my code. Either some module is spawning threads behind the scenes or this is a bug. Thoughts?
  • Finch_Powers, over 11 years ago
    Yeah, I realized too that Python only supports signals in the main thread. If you get that message, then I guess something spawns threads at some point.
  • Lidia, over 8 years ago
    What worked for me was a variation on the above: 'timeout -s SIGKILL <timeout value> <cmd>'; otherwise the program was not killed.
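    For illustration, that variation applied to the snippet from Solution 1 (same assumed chan, cmd, and timeouttime names) would be a one-line change:

    # send SIGKILL instead of the default SIGTERM when the time limit is hit
    chan.exec_command("timeout -s SIGKILL {0} {1}".format(timeouttime, cmd))

    SIGKILL can't be caught or ignored, which is why it works when the default SIGTERM sent by timeout does not.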