Why does my Python background process end when SSH session is terminated?
Solution 1
I would disconnect the command from its standard input, output, and error streams:
nohup python3 -u <script> </dev/null >/dev/null 2>&1 &
ssh needs an indication that the command won't produce any more output and doesn't require any more input. Having something else be the input and redirecting the output means ssh can safely exit, as input/output is not coming from or going to the terminal. This means the input has to come from somewhere else, and the output (both STDOUT and STDERR) should go somewhere else.
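As a rough sketch of how the key line in startup.sh could look with all three streams detached, here it is wrapped in a hypothetical helper, start_detached (not part of the original answer). sleep stands in for python3 -u <script> so the example runs anywhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): start a command with
# stdin, stdout and stderr all detached from the ssh session, and print its PID.
start_detached() {
  nohup "$@" </dev/null >/dev/null 2>&1 &
  echo "$!"
}

# sleep stands in for: python3 -u <script>
pid=$(start_detached sleep 30)

# kill -0 only checks that the process exists; it sends no signal.
if kill -0 "$pid" 2>/dev/null; then
  echo "started detached process $pid"
fi

kill "$pid"   # clean up the demo process
```

Because the background command's stdout is /dev/null, it does not hold the command-substitution pipe open, so $(...) returns as soon as the PID is echoed.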
The </dev/null part specifies /dev/null as the input for <script>. Why that is useful here:
Redirecting /dev/null to stdin will give an immediate EOF to any read call from that process. This is typically useful to detach a process from a tty (such a process is called a daemon). For example, when starting a background process remotely over ssh, you must redirect stdin to prevent the process waiting for local input. https://stackoverflow.com/questions/19955260/what-is-dev-null-in-bash/19955475#19955475
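A minimal illustration of the immediate EOF described in the quote: a read whose stdin is /dev/null returns at once with a non-zero status instead of blocking. The function name try_read is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a single read on stdin got data or hit EOF.
try_read() {
  if IFS= read -r line; then
    echo "read: $line"
  else
    echo "EOF"
  fi
}

# With stdin redirected from /dev/null, read returns immediately with EOF
# instead of waiting for input from a terminal.
try_read </dev/null

# For comparison, a pipe that supplies one line.
printf 'hello\n' | try_read
```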
Alternatively, redirecting from another input source should be relatively safe as long as the current ssh session doesn't need to be kept open.
With the >/dev/null part, the shell redirects the standard output into /dev/null, essentially discarding it. >/path/to/file will also work.
The last part, 2>&1, redirects STDERR to STDOUT.
There are three standard sources of input and output for a program. Standard input usually comes from the keyboard if it’s an interactive program, or from another program if it’s processing the other program’s output. The program usually prints to standard output, and sometimes prints to standard error. These three file descriptors (you can think of them as “data pipes”) are often called STDIN, STDOUT, and STDERR.
Sometimes they’re not named, they’re numbered! The built-in numberings for them are 0, 1, and 2, in that order. By default, if you don’t name or number one explicitly, you’re talking about STDOUT.
Given that context, you can see the command above is redirecting standard output into /dev/null, which is a place you can dump anything you don’t want (often called the bit-bucket), then redirecting standard error into standard output (you have to put an & in front of the destination when you do this).
The short explanation, therefore, is “all output from this command should be shoved into a black hole.” That’s one good way to make a program be really quiet!
What does > /dev/null 2>&1 mean? | Xaprb
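The order of the two redirections matters, because 2>&1 copies wherever stdout points at that moment. This sketch redirects to a temporary file instead of /dev/null (the >/path/to/file variant mentioned above) so the result can be inspected; noisy is a hypothetical stand-in command that writes one line to each stream.

```shell
#!/usr/bin/env bash
# Hypothetical command that writes one line to each stream.
noisy() {
  echo "to stdout"
  echo "to stderr" >&2
}

log=$(mktemp)

# stdout is pointed at the file first, then stderr is pointed at wherever
# stdout now goes (the file), so both lines land in the log.
noisy >"$log" 2>&1
echo "lines in log: $(wc -l <"$log" | tr -d ' ')"

# Reversed order: stderr is duplicated from the *old* stdout (the terminal
# or pipe) before stdout is moved, so the stderr line escapes the file.
noisy 2>&1 >"$log"
echo "lines in log: $(wc -l <"$log" | tr -d ' ')"

rm -f "$log"
```

With the reversed order, anything the program writes to stderr still goes to the ssh session, which is exactly what the command in this answer is trying to avoid.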
Solution 2
Look at man ssh:
ssh [-1246AaCfgKkMNnqsTtVvXxYy] [-b bind_address] [-c cipher_spec] [-D [bind_address:]port] [-e escape_char] [-F configfile] [-I pkcs11] [-i identity_file] [-L [bind_address:]port:host:hostport] [-l login_name] [-m mac_spec] [-O ctl_cmd] [-o option] [-p port] [-R [bind_address:]port:host:hostport] [-S ctl_path] [-W host:port] [-w local_tun[:remote_tun]] [user@]hostname [command]
When you run ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> "./startup.sh", you are running the shell script startup.sh as an ssh command.
From the description:
If command is specified, it is executed on the remote host instead of a login shell.
Based on this, it should be running the script remotely.
The difference between that and running nohup python3 -u <script> & in your local terminal is that the latter runs as a local background process, while the ssh command attempts to run it as a remote background process.
If you intend to run the script locally, then don't run startup.sh as part of the ssh command. You might try something like ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> && "./startup.sh".
If your intention is to run the script remotely and you want this process to continue after your ssh session is terminated, you would have to first start a screen session on the remote host. Then run the python script within screen, and it will continue to run after you end your ssh session.
While I think screen is your best option, if you must use nohup, consider unsetting huponexit with shopt -u huponexit on the remote host before running the nohup command. Alternatively, you can use disown -h [jobID] to mark the process so SIGHUP will not be sent to it.[1]
How do I keep running job after I exit from a shell prompt in background?
The SIGHUP (Hangup) signal is used by your system on controlling terminal or death of controlling process. You can use SIGHUP to reload configuration files and open/close log files too. In other words, if you log out from your terminal, all running jobs will be terminated. To avoid this you can pass the -h option to the disown command. This option marks each jobID so that SIGHUP is not sent to the job if the shell receives a SIGHUP.
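A small sketch of the difference between disown -h and plain disown in bash, assuming that a non-interactive bash still keeps background jobs in its job table (which is what makes the common cmd & disown idiom work in scripts). sleep stands in for the real job and job_count is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Count the entries in this shell's job table.
job_count() { jobs -p | wc -l | tr -d ' '; }

sleep 30 >/dev/null 2>&1 &
pid=$!

# disown -h keeps the job in the table, but marks it so bash will not
# forward SIGHUP to it when the shell itself receives SIGHUP.
disown -h
echo "after disown -h: $(job_count) job(s) in the table"

# Plain disown removes the job from the table altogether.
disown
echo "after disown: $(job_count) job(s) in the table"

kill "$pid"   # clean up the demo process
```

Either form only affects the SIGHUP that bash itself forwards to its jobs; a SIGHUP delivered to the process by other means is still handled normally.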
Also, see this summary of how huponexit works when a shell is exited, killed, or dropped. I'm guessing your current issue is related to how the shell session ends.[2]
1. All child processes of a shell opened over an ssh connection, backgrounded or not, are killed with SIGHUP when the ssh connection is closed only if the huponexit option is set: run shopt huponexit to see if this is true.
2. If huponexit is true, then you can use nohup or disown to dissociate the process from the shell so it does not get killed when you exit. Or, run things with screen.
3. If huponexit is false, which is the default on at least some Linux distributions these days, then backgrounded jobs will not be killed on normal logout.
   - But even if huponexit is false, if the ssh connection gets killed or drops (different from a normal logout), backgrounded processes will still get killed. This can be avoided with disown or nohup as in (2).
Finally, here are some examples of how to use shopt huponexit.[3]
$ shopt -s huponexit; shopt | grep huponexit
huponexit on
# Background jobs will be terminated with SIGHUP when shell exits
$ shopt -u huponexit; shopt | grep huponexit
huponexit off
# Background jobs will NOT be terminated with SIGHUP when shell exits
Solution 3
I suspect you have a race condition. It would go something like this:
- SSH connection starts
- SSH starts startup.sh
- startup.sh starts a background process (nohup)
- startup.sh finishes
- ssh finishes, and this kills the child processes (i.e. nohup)
If ssh hadn't cut things short, the following would have happened (not sure about the order of these two):
- nohup starts your python script
- nohup disconnects from the parent process and terminal.
So the final two critical steps don't happen, because startup.sh and ssh finish before nohup has time to do its thing.
I expect your problem will go away if you put a few seconds of sleep at the end of startup.sh. I'm not sure exactly how much time you need. If it's important to keep it to a minimum, then maybe you can look at something in /proc to see when it's safe.
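Instead of a fixed sleep, one option (following the suggestion in the comments to check /proc/$!/comm or ps -o comm= $!) is to poll until the background PID is running the target program, which means nohup has already exec'd it. This is a sketch: wait_for_exec is a hypothetical helper, /proc is Linux-specific with ps as a fallback, and for the real script the expected name would be python3 rather than sleep.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until process $1 is running the program named $2,
# i.e. nohup has finished its setup and exec'd the real command.
# Gives up after $3 polls (default 50) spaced 0.1 s apart.
wait_for_exec() {
  local pid=$1 want=$2 tries=${3:-50} comm
  while (( tries-- > 0 )); do
    if [[ -r /proc/$pid/comm ]]; then
      comm=$(< "/proc/$pid/comm")                 # Linux-specific
    else
      comm=$(ps -o comm= -p "$pid" 2>/dev/null)   # more portable fallback
    fi
    [[ $comm == "$want" ]] && return 0
    sleep 0.1   # fractional sleep works with GNU coreutils
  done
  return 1
}

# sleep stands in for python3 -u <script> in startup.sh.
nohup sleep 30 </dev/null >/dev/null 2>&1 &
pid=$!

if wait_for_exec "$pid" sleep; then
  echo "process $pid has exec'd sleep; startup.sh can exit now"
fi

kill "$pid"   # clean up the demo process
```

Waiting for the exact target name, rather than for "anything other than nohup", avoids a false positive in the brief window where the forked child is still a copy of bash.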
Solution 4
Maybe worth trying the -n option when starting ssh? It will prevent the remote process from depending on a local stdin, which of course closes as soon as the ssh session ends. And this will cause the remote process to terminate whenever it tries to access its stdin.
Solution 5
This sounds more like an issue with what the python script or python itself is doing. All that nohup really does (bar simplifying redirects) is set the handler for the HUP signal to SIG_IGN (ignore) before running the program. There is nothing to stop the program setting it back to SIG_DFL or installing its own handler once it starts running.
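A quick way to see the SIG_IGN behaviour described above: run a small sh command that sends SIGHUP to itself, once directly and once under nohup. The probe string is purely illustrative, and the plain case assumes this script was not itself started with SIGHUP already ignored.

```shell
#!/usr/bin/env bash
# Inner command: send SIGHUP to itself, then report that it is still alive.
probe='kill -HUP $$; echo survived'

# Without nohup, the default action for SIGHUP terminates the inner shell,
# so nothing is printed and the exit status is 128 + 1 = 129.
out=$(sh -c "$probe")
echo "plain: status=$? output='$out'"

# nohup starts the same command with SIGHUP set to SIG_IGN, so the kill has
# no effect. Output goes to the command-substitution pipe rather than a
# terminal, so nohup does not create nohup.out.
out=$(nohup sh -c "$probe" 2>/dev/null)
echo "nohup: status=$? output='$out'"
```

A POSIX shell cannot re-enable a signal that was ignored on entry, but a program such as a Python interpreter can install its own handler, which is the loophole this answer points at.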
One thing that you might want to try is enclosing your command in parentheses so that you get a double-fork effect and your python script is no longer a child of the shell process. E.g.:
( nohup python3 -u <script> & )
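To see what the parentheses change, this sketch compares a job started inside ( ... & ) with one started by a plain &, using sleep in place of the python script. job_count is a hypothetical helper that counts entries in the current shell's job table.

```shell
#!/usr/bin/env bash
# Count the entries in this shell's job table.
job_count() { jobs -p | wc -l | tr -d ' '; }

# Started inside ( ... & ): the subshell forks sleep and exits immediately,
# so sleep is orphaned and never becomes a job of this shell.
( sleep 3 >/dev/null 2>&1 & )
echo "after ( ... & ): $(job_count) job(s)"

# Started with a plain &: sleep is a direct child and is tracked as a job.
sleep 3 >/dev/null 2>&1 &
echo "after plain &: $(job_count) job(s)"

kill "$!"   # clean up the tracked job; the orphan exits on its own
```

Since the orphaned process is not in the job table, the shell has nothing to forward SIGHUP to when it exits, which is why the comments below note that the subshell form has much the same effect as disown.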
Another thing that may also be worth a try (if you are using bash and not another shell) is to use the disown builtin instead of nohup. If everything is working as documented, this shouldn't actually make any difference, but in an interactive shell this would stop the HUP signal from propagating to your python script. You can add the disown on the next line or on the same one as below (note that adding a ; after a & is an error in bash):
python3 -u <script> </dev/null &>/dev/null & disown
If the above or some combination of it doesn't work, then surely the only place to address the issue is in the python script itself.
neverendingqs
Updated on September 18, 2022
Comments
-
neverendingqs over 1 year
I have a bash script (let's call it startup.sh) that starts up a python3 script, with the key line:
nohup python3 -u <script> &
When I ssh in directly and call this script, the python script continues to run in the background after I exit. However, when I run this:
ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> "./startup.sh"
The process ends as soon as ssh has finished running it and closes the session.
What is the difference between the two?
EDIT: The python script is running a web service via Bottle.
EDIT2: I also tried creating an init script that calls startup.sh and ran ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> "sudo service start <servicename>", but got the same behavior.
EDIT3: Maybe it's something else in the script. Here's the bulk of the script:
chmod 700 ${key_loc}
echo "INFO: Syncing files."
rsync -azP -e "ssh -i ${key_loc} -o StrictHostKeyChecking=no" ${source_client_loc} ${remote_user}@${remote_hostname}:${destination_client_loc}
echo "INFO: Running startup script."
ssh -i ${key_loc} -o StrictHostKeyChecking=no ${remote_user}@${remote_hostname} "cd ${destination_client_loc}; chmod u+x ${ctl_script}; ./${ctl_script} restart"
EDIT4: When I run the last line with a sleep at the end:
ssh -i ${key_loc} -o StrictHostKeyChecking=no ${remote_user}@${remote_hostname} "cd ${destination_client_loc}; chmod u+x ${ctl_script}; ./${ctl_script} restart; sleep 1"
echo "Finished"
It never reaches echo "Finished", and I see the Bottle server message, which I never saw before:
Bottle vx.x.x server starting up (using WSGIRefServer())...
Listening on <URL>
Hit Ctrl-C to quit.
I see "Finished" if I manually SSH in and kill the process myself.
EDIT5: Using EDIT4, if I make a request to any endpoint, I get a page back, but Bottle errors out:
Bottle vx.x.x server starting up (using WSGIRefServer())...
Listening on <URL>
Hit Ctrl-C to quit.
----------------------------------------
Exception happened during processing of request from ('<IP>', 55104)
-
Bratchley over 9 years
Is there any way we can get more of a description of what the python script does? You'd probably still just get guesses without the full source code, but knowing more about what the python script does might help us make better educated guesses.
-
neverendingqs over 9 years
Yep - added to the question.
-
Celada over 9 years
The script might be doing something early on that somehow depends on the attached terminal or something like that, and it could be a timing issue: if the session lasts past the first few seconds it works, otherwise it doesn't. Your best option might be to run it under strace if you are using Linux, or truss if you are running Solaris, and see how/why it terminates. For example:
ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> strace -fo /tmp/debug ./startup.sh
-
Jacob Bryan over 9 years
Did you try using the & at the end of the startup script? Adding the & takes away the dependency of your ssh session from being the parent id (when parent ids die so do their children). Also I think this is a duplicate question based on this previous post. The post I submitted to you in the previous sentence is a duplicate of this post which might provide better detail.
-
neverendingqs over 9 years
I have tried nohup ./startup.sh & before, but it had the same behaviour. startup.sh contains a fork already (nohup python3 -u <script> &), so I'm pretty sure I don't need to fork again.
-
neverendingqs over 9 years
@Celada When I use strace (ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> "strace -fo /tmp/debug ./startup.sh"), it seems to start up without forking. It hangs while waiting for a CTRL^C, but it continues running even after I send the signal (which is what I want, without the CTRL^C).
-
Celada over 9 years
@neverendingqs ahhh... the strace stays in the foreground and prevents the problem from presenting itself, unfortunately. Looks like you'll have to insert the strace at a different point: somewhere after the job forks itself into the background. Details would depend on how exactly startup.sh is written.
-
iyrin over 9 years
It seems somewhat implied by your use of nohup, but to confirm, is your python script intended to be executed on the remote host as a background process (not locally)? I just went ahead and answered for both scenarios.
-
neverendingqs over 9 years
@RyanLoremIpsum it is intended to be executed as a background process on the remote host (not locally).
-
dbailey about 8 years
The fundamental problem is the difference in how the remote shell behaves. When logged in, you are using it in interactive mode, which for bash enables job control by default. When you use SSH, it's in non-interactive mode, which does not have job control enabled, so any child processes started will be in the same process group and all get terminated when you exit.
-
neverendingqs over 9 years
But why is that different from getting a terminal, typing and running the command, and exiting? Both sessions are closed once I close them.
-
neverendingqs over 9 years
I use ssh to run the script manually, so I'm assuming python3 is in the path.
-
BillThor over 9 years
@neverendingqs Does the logfile contain anything?
-
neverendingqs over 9 years
Nothing out of the ordinary - the startup looks normal.
-
neverendingqs over 9 years
Tried it with no success =[.
-
Avindra Goolcharan over 9 years
Agree, I would like to understand why this is no different from closing your own terminal manually.
-
Graeme over 9 years
Good point, don't think the window for this will be very long though - probably only a few milliseconds. You could check that /proc/$!/comm is not nohup, or more portably use the output of ps -o comm= $!.
-
iyrin over 9 years
That should work for normal logout, but what about when the session is dropped or killed? Wouldn't you still need to disown the job so it's entirely ignored by SIGHUP?
-
mc0e over 9 years
@RyanLoremIpsum: The startup script only needs to wait long enough that the child process is fully detached. After that, it doesn't matter what happens to the ssh session. If something else kills your ssh session in the brief window while that happens, there's not much you can do about it.
-
mc0e over 9 years
@Graeme yeah, I presume it's very quick, but I just don't know enough about exactly what nohup does to be sure. A pointer to an authoritative (or at least knowledgeable and detailed) source on this would be useful.
-
Graeme over 9 years
How about this one - lingrok.org/xref/coreutils/src/nohup.c
-
Graeme over 9 years
All it does is mess around with redirects for a bit, and then the meat is just signal (SIGHUP, SIG_IGN); and an execvp (basically what I described in my answer). The code should be very quick to execute, although there are some calls it could block on, so a delay is conceivable.
-
neverendingqs over 9 years
Would ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> "./startup.sh; sleep 5" work? Or would it have to be inside ./startup.sh?
-
neverendingqs over 9 years
Would the double fork effect be enough (based on @RyanLoremIpsum's answer)?
-
mc0e over 9 years
Yes, that should also work, but given that you have a startup script, why not put it in there? You could test for the presence of the $SSH_CONNECTION environment variable if you don't want to slow down other uses.
-
neverendingqs over 9 years
Just wanted to take the simpler approach in testing. Added EDIT4 based on the results. Looks like there is a race condition happening(?), but now it looks like the forking isn't working as intended (but closing the terminal manually works...)
-
neverendingqs over 9 years
Neither resolved the issue =[. If it's a Python issue, do you have an idea of where to start investigating (can't post too much of the Python script here)?
-
Graeme over 9 years
@neverendingqs, if you mean the huponexit stuff, running in a subshell should have the same effect as disown, as the process won't be added to the jobs list.
-
Graeme over 9 years
@neverendingqs, updated my answer. Forgot that you should use redirects with disown. Don't expect that it will make much difference though. I think your best bet is to alter the python script so that it tells you why it is exiting.
-
neverendingqs over 9 years
Redirecting the output worked (unix.stackexchange.com/a/176610/52894), but I'm not sure what the difference is between explicitly doing it and getting nohup to do it.
-
Graeme over 9 years
@neverendingqs, what version of nohup do you have on your remote host? A POSIX nohup isn't required to redirect stdin, which I missed, but it should still redirect stdout and stderr.
-
neverendingqs over 9 years
Looks like I'm working with nohup (GNU coreutils) 8.21.
-
Graeme over 9 years
@neverendingqs, does nohup print any messages, like nohup: ignoring input and appending output to ‘nohup.out’?
-
neverendingqs over 9 years
Yes - that is the exact message.
-
neverendingqs over 9 years
nohup --help includes this: NOTE: your shell may have its own version of nohup, which usually supersedes the version described here. Please refer to your shell's documentation for details about the options it supports.
Could I be using another version of nohup somewhere?
-
neverendingqs over 9 years
whereis nohup points to the same version of nohup.
-
neverendingqs over 9 years
(Should figuring out the difference between an explicit redirect and nohup's redirect be a separate question?)
-
jlliagre over 9 years
Don't use whereis or which to identify what command is executed for a given name; use the type command instead.
-
neverendingqs over 9 years
I added EDIT5. I'm not sure how to reproduce it in a general case, but I suspect a general case exists if we can observe a difference.
-
neverendingqs over 9 years
I found a general case and asked the question here: unix.stackexchange.com/q/176674/52894
-
neverendingqs over 9 years
This is the best solution, so I'll mark it as the correct solution. The alternative is to use -t with ssh (as per unix.stackexchange.com/q/176674/52894) and add a sleep at the end of the command to prevent nohup from terminating prematurely (as per unix.stackexchange.com/a/176416/52894). However, this is a bit more finicky as it uses sleep.
-
iyrin over 9 years
Added some explanation to this answer. Is that last redirection of STDERR to STDOUT so that errors from the script are streamed to the terminal?
-
Anthon over 9 years
@RyanLoremIpsum redirects are handled by the shell and the script never sees them as arguments.
-
neverendingqs over 9 years
Added to the answer to explain why the answer works.
-
DaedalusUsedPerl about 4 years
Why is the -u option necessary?
-
jlliagre about 4 years
@DaedalusUsedPerl I don't think it matters a lot. The OP used that option so there was no reason to drop it in my answer.
-
DaedalusUsedPerl about 4 years
@jlliagre Thanks for the explanation, it's just that I had a similar issue and using -u turned out to be necessary for me; simply redirecting input and output wasn't enough.
-
jlliagre about 4 years
@DaedalusUsedPerl Interesting. To be honest, I answered this question more than 5 years ago, so maybe I thought it was needed but forgot why. In any case, the -u option disables buffering, so it has an impact on timing.