Python Paramiko timeout with long execution, need full output
Solution 1
Here's something that might help, though I'm still in the midst of testing. After struggling with timeouts of various types including a catch-all timeout for Python, and realizing that the real problem is that the server can't be trusted to terminate the process, I did this:
chan = ssh.get_transport().open_session()
cmd = "timeout {0} {1}\n".format(timeouttime, cmd)
chan.exec_command(cmd)
The server times out after timeouttime if cmd doesn't exit sooner, exactly as I'd wish, and the terminated command kills the channel. The only catch is that GNU coreutils must exist on the server. Failing that there are alternatives.
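A minimal sketch of building that wrapped command, in case it helps. The helper name with_server_timeout and its kill flag are made up for illustration; the only real interface assumed is the GNU coreutils timeout command and its -s option for choosing the signal (the variant mentioned in the comments further down).

```python
def with_server_timeout(cmd, seconds, kill=False):
    """Prefix a shell command with GNU coreutils `timeout`.

    with_server_timeout is a hypothetical helper, not part of paramiko.
    cmd is passed through unchanged, so it must already be a valid
    shell command line for the remote host.
    """
    prefix = ["timeout"]
    if kill:
        # Send SIGKILL instead of the default SIGTERM, for programs
        # that ignore or survive SIGTERM.
        prefix += ["-s", "KILL"]
    prefix.append(str(int(seconds)))
    return " ".join(prefix) + " " + cmd


# Intended use with an already-open channel (not executed here):
#   chan = ssh.get_transport().open_session()
#   chan.exec_command(with_server_timeout(cmd, 300))
```

With GNU timeout, a command that hits the limit normally reports exit status 124 (137 when SIGKILL is used), so the exit status read from the channel can tell a timeout apart from an ordinary failure.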
Solution 2
I'm having the same kind of issue. I think we can handle it with signalling. http://docs.python.org/2/library/signal.html
Here is a plain dumb example to show how it works.
import signal, time
def handler(signum, frame):
    pass
# Set the signal handler and a 2-second alarm
signal.signal(signal.SIGALRM, handler)
signal.alarm(2)
# This is where your operation that might hang goes
time.sleep(10)
# Disable the alarm
signal.alarm(0)
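So here, the alarm is set to 2 seconds and time.sleep is called with 10 seconds, so the alarm fires before the sleep finishes. If you put some output after the time.sleep, you'll see that program execution resumes there. (On Python 3.5 and later, an interrupted time.sleep is resumed for the remaining time unless the handler raises an exception, so with this do-nothing handler the sleep still runs its full 10 seconds.)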
If you want the control to continue somewhere else, wrap your hanging call in a try/except and have your handler function raise an exception.
Although I'm pretty sure it would work, I haven't tested it yet over paramiko calls.
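Here is a rough sketch of the try/except variant described above, where the handler raises instead of passing. CallTimeout and run_with_alarm are names invented for this example. It assumes a Unix system and that it is called from the main thread; it uses signal.setitimer rather than signal.alarm only so that fractional seconds work. Like the example above, it has not been tried against paramiko calls.

```python
import signal
import time


class CallTimeout(Exception):
    """Raised by the SIGALRM handler when the time limit is reached."""


def _raise_timeout(signum, frame):
    raise CallTimeout()


def run_with_alarm(func, seconds):
    """Call func() and raise CallTimeout if it runs longer than seconds.

    Unix only, and must be called from the main thread.
    """
    old_handler = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func()
    finally:
        # Cancel the timer and restore whatever handler was installed before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)


if __name__ == "__main__":
    try:
        run_with_alarm(lambda: time.sleep(10), 2)
    except CallTimeout:
        print("operation timed out")
```

Because the handler raises, control jumps to the except block instead of resuming after the hanging call.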
user1772459
Updated on June 15, 2022
Comments
-
user1772459 almost 2 years
There are lots of topics touching on part of the title, but nothing that quite covers the whole thing. I'm running a command on a remote server and need the full output after a long execution time, say 5 minutes or so. Using a channel I was able to set a timeout, but when I read back stdout I got only a small portion of the output. The solution seemed to be to wait for channel.exit_status_ready(). This worked on a successful call, but a failed call would never trigger the channel timeout. Having reviewed the docs, I theorize that's because the timeout only applies to read operations, and waiting for exit status doesn't qualify. Here's that attempt:
channel = ssh.get_transport().open_session()
channel.settimeout(timeout)
channel.exec_command(cmd)  # return on this is not reliable
while True:
    try:
        if channel.exit_status_ready():
            if channel.recv_ready():  # so use recv instead...
                output = channel.recv(1048576)
                break
            if channel.recv_stderr_ready():  # then check error
                error = channel.recv_stderr(1048576)
                break
    except socket.timeout:
        print("SSH channel timeout exceeded.")
        break
    except Exception:
        traceback.print_exc()
        break
Pretty, ain't it? Wish it worked.
My first attempt at a solution was to use time.time() to get a start, then check time.time() - start > timeout. This seems straightforward, but in my present version, I output time.time() - start with a fixed timeout that should trigger a break... and see differences that double and triple the timeout with no break occurring. To save space, I'll mention my third attempt, which I've rolled up with this one. I read on here about using select.select to wait for output, and noted in the documentation that there's a timeout there as well. As you'll see from the code below, I've mixed all three methods -- channel timeout, time.time timeout, and select timeout -- yet still have to kill the process. Here's the frankencode:
channel = ssh.get_transport().open_session()
channel.settimeout(timeout)
channel.exec_command(cmd)  # return on this is not reliable
print("{0}".format(cmd))
start = time.time()
while True:
    try:
        rlist, wlist, elist = select([channel], [], [], float(timeout))
        print("{0}, {1}, {2}".format(rlist, wlist, elist))
        if rlist is not None and len(rlist) > 0:
            if channel.exit_status_ready():
                if channel.recv_ready():  # so use recv instead...
                    output = channel.recv(1048576)
                    break
        elif elist is not None and len(elist) > 0:
            if channel.recv_stderr_ready():  # then check error
                error = channel.recv_stderr(1048576)
                break
        print("{0} - {1} = {2}".format(
            time.time(), start, time.time() - start))
        if time.time() - start > timeout:
            break
    except socket.timeout:
        print("SSH channel timeout exceeded.")
        break
    except Exception:
        traceback.print_exc()
        break
Here's some typical output:
[<paramiko.Channel 3 (open) window=515488 -> <paramiko.Transport at 0x888414cL (cipher aes128-ctr, 128 bits) (active; 1 open channel(s))>>], [], []
1352494558.42 - 1352494554.69 = 3.73274183273
The top line is [rlist, wlist, elist] from select; the bottom line is time.time(), start, and their difference. I got this run to break by counting the iterations and breaking at the bottom of the try after looping 1000 times. timeout was set to 3 on the sample run. This shows that we get through the try, but evidently none of the three mechanisms that should be timing out works.
Feel free to rip into the code if I've fundamentally misunderstood something. I'd like for this to be uber-Pythonic and am still learning.
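For comparison, here is one possible shape for the loop, offered as a sketch rather than a tested fix. It reads stdout and stderr as they arrive instead of only once the exit status is ready, and it checks a wall-clock deadline on every pass. The function name drain_channel and the poll_interval and bufsize parameters are invented for this example; the channel methods it calls (recv_ready, recv, recv_stderr_ready, recv_stderr, exit_status_ready, recv_exit_status, close) are the ones paramiko's Channel provides.

```python
import time


def drain_channel(channel, timeout, poll_interval=0.1, bufsize=65536):
    """Collect stdout/stderr until the command exits or timeout elapses.

    Returns (stdout_bytes, stderr_bytes, exit_status); exit_status is
    None if the deadline passed first, in which case the channel is closed.
    """
    deadline = time.time() + timeout
    out, err = [], []

    def read_once():
        # Read at most one chunk from each stream; report whether any arrived.
        got = False
        if channel.recv_ready():
            out.append(channel.recv(bufsize))
            got = True
        if channel.recv_stderr_ready():
            err.append(channel.recv_stderr(bufsize))
            got = True
        return got

    status = None
    while True:
        # Check for exit before reading, so that anything buffered by the
        # time the exit status showed up is still picked up below.
        finished = channel.exit_status_ready()
        got = read_once()
        if finished:
            while read_once():
                pass
            status = channel.recv_exit_status()
            break
        if time.time() > deadline:
            channel.close()  # give up on the remote command
            break
        if not got:
            time.sleep(poll_interval)
    return b"".join(out), b"".join(err), status
```

Intended use would be something like output, error, status = drain_channel(channel, timeout) right after channel.exec_command(cmd). Closing the channel on timeout does not guarantee that the server terminates the remote process, which is the problem the server-side timeout in Solution 1 addresses.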
-
user1772459 over 11 years
My research ran in the same direction, but I am getting "ValueError: signal only works in main thread", though I'm not knowingly using threads in my code. Either some module is forking the process or this is a bug. Thoughts?
-
Finch_Powers over 11 years
Yeah, I realized that too: Python only supports signal handlers in the main thread. If you get that message, then I guess something spawns threads at some point.
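A small check along these lines can confirm whether the calling code is in fact running off the main thread before it tries to install a handler. The function name can_install_signal_handler is made up for this sketch, and threading.main_thread() needs Python 3.4 or later.

```python
import threading


def can_install_signal_handler():
    """Return True only when called from the interpreter's main thread.

    signal.signal() raises ValueError when called from any other thread.
    """
    return threading.current_thread() is threading.main_thread()
```

If this returns False at the point where signal.signal would be called, some framework or library is running the code in a worker thread, and a signal-based timeout is not an option there.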
-
Lidia over 8 years
What worked for me was a variation on the above: 'timeout -s SIGKILL <timeout value> <cmd>', otherwise the program was not killed.