Why does my Python background process end when SSH session is terminated?



Solution 1

I would disconnect the command from its standard input, output, and error streams:

nohup python3 -u <script> </dev/null >/dev/null 2>&1 &

ssh needs an indication that the command has no more output and does not require any more input. Taking the input from somewhere else and redirecting the output means ssh can safely exit, because no input or output is coming from or going to the terminal. In other words, the input has to come from somewhere else, and the output (both STDOUT and STDERR) has to go somewhere else.

The </dev/null part specifies /dev/null as the input for <script>. Why that is useful here:

Redirecting /dev/null to stdin will give an immediate EOF to any read call from that process. This is typically useful to detach a process from a tty (such a process is called a daemon). For example, when starting a background process remotely over ssh, you must redirect stdin to prevent the process waiting for local input. https://stackoverflow.com/questions/19955260/what-is-dev-null-in-bash/19955475#19955475
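A quick way to see that immediate EOF, as a minimal sketch in plain POSIX shell (the variable names are illustrative and not part of the original answer):

```shell
# read returns non-zero at end of file; with /dev/null as stdin that
# happens on the very first call, so nothing ever waits on a terminal.
if read -r line </dev/null; then
    stdin_state="got input"
else
    stdin_state="eof"
fi
echo "stdin: $stdin_state"

# cat likewise sees an empty stream and exits at once.
bytes=$(cat </dev/null | wc -c | tr -d ' ')
echo "bytes read: $bytes"
```

A process started this way cannot block on input from the ssh channel, which is one of the two things ssh is waiting on.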

Alternatively, redirecting from another input source should be relatively safe as long as the current ssh session doesn't need to be kept open.

With the >/dev/null part, the shell redirects standard output into /dev/null, essentially discarding it. Redirecting to a file with >/path/to/file will also work.

The last part 2>&1 is redirecting STDERR to STDOUT.

There are three standard sources of input and output for a program. Standard input usually comes from the keyboard if it’s an interactive program, or from another program if it’s processing the other program’s output. The program usually prints to standard output, and sometimes prints to standard error. These three file descriptors (you can think of them as “data pipes”) are often called STDIN, STDOUT, and STDERR.

Sometimes they’re not named, they’re numbered! The built-in numberings for them are 0, 1, and 2, in that order. By default, if you don’t name or number one explicitly, you’re talking about STDOUT.

Given that context, you can see the command above is redirecting standard output into /dev/null, which is a place you can dump anything you don’t want (often called the bit-bucket), then redirecting standard error into standard output (you have to put an & in front of the destination when you do this).

The short explanation, therefore, is “all output from this command should be shoved into a black hole.” That’s one good way to make a program be really quiet!
What does > /dev/null 2>&1 mean? | Xaprb
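The order of the two output redirections matters, which a small experiment can show. This is a sketch only; noisy is a hypothetical helper that writes one line to each stream:

```shell
# Hypothetical helper: one line on stdout, one on stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stdout goes to /dev/null first, then stderr is pointed at the same place:
# nothing is left for the command substitution to capture.
quiet=$(noisy >/dev/null 2>&1)

# Reversed: stderr is pointed at the current stdout (the capture pipe) first,
# and only afterwards is stdout sent to /dev/null, so stderr still leaks out.
leaky=$(noisy 2>&1 >/dev/null)

echo "quiet captured: '$quiet'"
echo "leaky captured: '$leaky'"
```

With the reversed order, a remote command would still hold the ssh channel open through its stderr.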

Solution 2

Look at man ssh:

 ssh [-1246AaCfgKkMNnqsTtVvXxYy] [-b bind_address] [-c cipher_spec] [-D [bind_address:]port]
     [-e escape_char] [-F configfile] [-I pkcs11] [-i identity_file] [-L [bind_address:]port:host:hostport]
     [-l login_name] [-m mac_spec] [-O ctl_cmd] [-o option] [-p port]
     [-R [bind_address:]port:host:hostport] [-S ctl_path] [-W host:port] [-w local_tun[:remote_tun]]
     [user@]hostname [command]

When you run ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> "./startup.sh" you are running the shell script startup.sh as an ssh command.

From the description:

If command is specified, it is executed on the remote host instead of a login shell.

Based on this, it should be running the script remotely.

The difference between that and running nohup python3 -u <script> & in your local terminal is that the latter runs as a local background process, while the ssh command attempts to run it as a remote background process.

If you intend to run the script locally then don't run startup.sh as part of the ssh command. You might try something like ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> && "./startup.sh"

If your intention is to run the script remotely and you want this process to continue after your ssh session is terminated, you would have to first start a screen session on the remote host. Then you have to run the python script within screen and it will continue to run after you end your ssh session.

See Screen User's Manual

While I think screen is your best option, if you must use nohup, consider disabling huponexit with shopt -u huponexit on the remote host before running the nohup command. Alternatively, you can use disown -h [jobID] to mark the process so SIGHUP will not be sent to it.

How do I keep running job after I exit from a shell prompt in background?

The SIGHUP (Hangup) signal is used by your system on controlling terminal or death of controlling process. You can use SIGHUP to reload configuration files and open/close log files too. In other words if you logout from your terminal all running jobs will be terminated. To avoid this you can pass the -h option to disown command. This option mark each jobID so that SIGHUP is not sent to the job if the shell receives a SIGHUP.

Also, see this summary of how huponexit works when a shell is exited, killed or dropped. I'm guessing your current issue is related to how the shell session ends.

1. All child processes, backgrounded or not, of a shell opened over an ssh connection are killed with SIGHUP when the ssh connection is closed only if the huponexit option is set: run shopt huponexit to see if this is true.

2. If huponexit is true, then you can use nohup or disown to dissociate the process from the shell so it does not get killed when you exit. Or, run things with screen.

3. If huponexit is false, which is the default on at least some linuxes these days, then backgrounded jobs will not be killed on normal logout.

4. But even if huponexit is false, then if the ssh connection gets killed, or drops (different than normal logout), then backgrounded processes will still get killed. This can be avoided by disown or nohup as in (2).

Finally, here are some examples of how to use shopt huponexit.

$ shopt -s huponexit; shopt | grep huponexit
huponexit       on
# Background jobs will be terminated with SIGHUP when shell exits

$ shopt -u huponexit; shopt | grep huponexit
huponexit       off
# Background jobs will NOT be terminated with SIGHUP when shell exits

Solution 3

I suspect you have a race condition. It would go something like this:

SSH connection starts
SSH starts startup.sh
startup.sh starts a background process (nohup)
startup.sh finishes
ssh finishes, and this kills the child processes (i.e. nohup)

If ssh hadn't cut things short, the following would have happened (not sure about the order of these two):

nohup starts your python script
nohup disconnects from the parent process and terminal.

So the final two critical steps don't happen, because startup.sh and ssh finish before nohup has time to do its thing.

I expect your problem will go away if you put a few seconds of sleep in the end of startup.sh. I'm not sure exactly how much time you need. If it's important to keep it to a minimum, then maybe you can look at something in proc to see when it's safe.
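Instead of a fixed sleep, the "look at something in proc" idea could be sketched as a short polling loop at the end of startup.sh. This assumes a Linux procps-style ps whose comm column holds the bare program name, uses sleep 3 as a hypothetical stand-in for the Python script, and relies on sleep accepting fractional seconds (GNU and BSD do):

```shell
# Hypothetical stand-in for "nohup python3 -u <script> &" in startup.sh.
nohup sleep 3 </dev/null >/dev/null 2>&1 &
pid=$!

# Keep polling while the child still reports itself as the nohup wrapper,
# i.e. before it has exec'd the real program. Give up after about 5 seconds.
tries=0
while [ "$(ps -o comm= -p "$pid" 2>/dev/null)" = "nohup" ] && [ "$tries" -lt 50 ]; do
    sleep 0.1
    tries=$((tries + 1))
done

echo "child $pid now runs as: $(ps -o comm= -p "$pid" 2>/dev/null)"
```

The loop normally ends after zero or one iterations, so it costs far less than a multi-second sleep while covering the same window.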

Solution 4

Maybe it is worth trying the -n option when starting ssh? It prevents the remote process from depending on the local stdin, which of course closes as soon as the ssh session ends. Without it, the remote process is terminated whenever it tries to access its stdin.
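As a rough sketch of what such an invocation could look like, combined with explicit redirects on the remote side: the host user@example.com and the script name app.py are placeholders, and the command is only printed here so the sketch runs without a remote host.

```shell
# Placeholder values; substitute your own host and script.
remote="user@example.com"
script="app.py"

# Remote command: stdin, stdout and stderr are all detached from the ssh channel.
remote_cmd="nohup python3 -u $script </dev/null >/dev/null 2>&1 &"

# Print the invocation instead of executing it. -n points ssh's own stdin
# at /dev/null on the local side.
echo "ssh -n $remote \"$remote_cmd\""
```

Dropping the echo and the surrounding quotes runs the command for real.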

Solution 5

This sounds more like an issue with what the python script or python itself is doing. All that nohup really does (bar simplifying redirects) is just set the handler for the HUP signal to SIG_IGN (ignore) before running the program. There is nothing to stop the program setting it back to SIG_DFL or installing its own handler once it starts running.
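The effect of that ignored-signal setting can be observed directly. The sketch below uses sleep 5 as a hypothetical stand-in for the Python script, sends it SIGHUP by hand, and checks whether it is still alive; the half-second pause is an assumed margin for nohup to finish its exec.

```shell
# Start the stand-in under nohup, fully detached from the terminal.
nohup sleep 5 </dev/null >/dev/null 2>&1 &
pid=$!

# Assumed margin: let nohup set SIGHUP to "ignore" and exec the program.
sleep 0.5

# Deliver the hangup by hand, then give the kernel a moment to act on it.
kill -HUP "$pid"
sleep 0.2

# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$pid" 2>/dev/null; then survived=yes; else survived=no; fi
echo "survived SIGHUP: $survived"

# Clean up the stand-in process.
kill "$pid" 2>/dev/null || true
```

If the same test is run against the real script and it does not survive, that would point at the program resetting or replacing the handler.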

One thing that you might want to try is enclosing your command in parentheses so that you get a double fork effect and your python script is no longer a child of the shell process. E.g.:

( nohup python3 -u <script> & )
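A small experiment along these lines, with sleep 3 as a hypothetical stand-in for the script: start it from a subshell, then ask the outer shell to wait for it. POSIX specifies that wait reports status 127 for a process ID the shell does not know, so that status suggests the process is not tracked as a child of the outer shell.

```shell
# Start the stand-in from a subshell and capture its PID. The redirects matter:
# without them the command substitution would block until sleep exits.
pid=$( (nohup sleep 3 </dev/null >/dev/null 2>&1 & echo "$!") )

# The outer shell never forked this process itself, so wait does not know it.
wait_status=0
wait "$pid" 2>/dev/null || wait_status=$?

# kill -0 only tests for existence; the process should still be running.
if kill -0 "$pid" 2>/dev/null; then alive=yes; else alive=no; fi
echo "wait status: $wait_status, still running: $alive"
```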

Another thing that may also be worth a try (if you are using bash and not another shell) is to use the disown builtin instead of nohup. If everything is working as documented this shouldn't actually make any difference, but in an interactive shell this would stop the HUP signal from propagating to your python script. You can add the disown on the next line or on the same line as below (note that adding a ; after a & is an error in bash):

python3 -u <script> </dev/null &>/dev/null & disown

If the above or some combination of it doesn't work then surely the only place to address the issue is in the python script itself.



neverendingqs


Updated on September 18, 2022

Comments

neverendingqs over 1 year
I have a bash script that starts up a python3 script (let's call it startup.sh), with the key line:
```
nohup python3 -u <script> &
```
When I ssh in directly and call this script, the python script continues to run in the background after I exit. However, when I run this:
```
ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> "./startup.sh"
```
The process ends as soon as ssh has finished running it and closes the session.

What is the difference between the two?

EDIT: The python script is running a web service via Bottle.

EDIT2: I also tried creating an init script that calls startup.sh and ran ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> "sudo service start <servicename>", but got the same behavior.

EDIT3: Maybe it's something else in the script. Here's the bulk of the script:
```
chmod 700 ${key_loc}

echo "INFO: Syncing files."
rsync -azP -e "ssh -i ${key_loc} -o StrictHostKeyChecking=no" ${source_client_loc} ${remote_user}@${remote_hostname}:${destination_client_loc}

echo "INFO: Running startup script."
ssh -i ${key_loc} -o StrictHostKeyChecking=no ${remote_user}@${remote_hostname} "cd ${destination_client_loc}; chmod u+x ${ctl_script}; ./${ctl_script} restart"
```
EDIT4: When I run the last line with a sleep at the end:
```
ssh -i ${key_loc} -o StrictHostKeyChecking=no ${remote_user}@${remote_hostname} "cd ${destination_client_loc}; chmod u+x ${ctl_script}; ./${ctl_script} restart; sleep 1"

echo "Finished"
```
It never reaches echo "Finished", and I see the Bottle server message, which I never saw before:
```
Bottle vx.x.x server starting up (using WSGIRefServer())...
Listening on <URL>
Hit Ctrl-C to quit.
```
I see "Finished" if I manually SSH in and kill the process myself.

EDIT5: Using EDIT4, if I make a request to any endpoint, I get a page back, but Bottle errors out:
```
Bottle vx.x.x server starting up (using WSGIRefServer())...
Listening on <URL>
Hit Ctrl-C to quit.


----------------------------------------
Exception happened during processing of request from ('<IP>', 55104)
```
- Bratchley over 9 years
  
  Is there any way we can get more of a description of what the python script does? You'd probably still just get guesses without the full source code, but knowing more about what the python script does might help us make better educated guesses.
- neverendingqs over 9 years
  
  Yep - added to the question.
- Celada over 9 years
  
  The script might be doing something early on that somehow depends on the attached terminal or something like that and it could be a timing issue: if the session lasts past the first few seconds it works, otherwise it doesn't. Your best option might be to run it under strace if you are using Linux or truss if you are running Solaris and see how/why it terminates. Like for example ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> strace -fo /tmp/debug ./startup.sh.
- Jacob Bryan over 9 years
  
  Did you try using the & at the end of the start up script? Adding the & takes away the dependency of your ssh session from being the parent id (when parent ids die so do their children). Also I think this is a duplicate question based on this previous post. The post I submitted to you in the previous sentence is a duplicate of this post which might provide better detail.
- neverendingqs over 9 years
  
  I have tried nohup ./startup.sh & before, but it had the same behaviour. startup.sh contains a fork already (nohup python3 -u <script> &), so I'm pretty sure I don't need to fork again.
- neverendingqs over 9 years
  
  @Celada When I use strace (ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> "strace -fo /tmp/debug ./startup.sh") it seems to start up without forking. It hangs while waiting for a CTRL^C, but it continues running even after I send the signal (which is what I want, without the CTRL^C).
- Celada over 9 years
  
  @neverendingqs ahhh,... the strace stays in the foreground and prevents the problem from presenting itself, unfortunately. Looks like you'll have to insert the strace at a different point: somewhere after the job forks itself into the background. Details would depend on how exactly startup.sh is written.
- iyrin over 9 years
  
  It seems somewhat implied by your use of nohup, but to confirm, is your python script intended to be executed on the remote host as a background process (not locally)? I just went ahead and answered for both scenarios.
- neverendingqs over 9 years
  
  @RyanLoremIpsum it is intended to be executed as a background process on the remote host (not locally).
- dbailey about 8 years
  
  The fundamental problem is a difference in how the remote shell behaves. When logged in, you are using it in interactive mode, which for bash enables job control by default. When you run a command through SSH, the shell is non-interactive and does not have job control enabled, so any child processes started will be in the same job group and all get terminated when you exit.
neverendingqs over 9 years

But why is that different than getting a terminal, typing and running the command, and exiting? Both sessions are closed once I close it.
neverendingqs over 9 years

I use ssh to run the script manually, so I'm assuming python3 is in the path.
BillThor over 9 years

@neverendingqs Does the logfile contain anything?
neverendingqs over 9 years

Nothing out of the ordinary - the start up looks normal.
neverendingqs over 9 years

Tried it with no success =[.
Avindra Goolcharan over 9 years

Agree, I would like to understand why this is no different from closing your own terminal manually.
Graeme over 9 years

Good point, I don't think the window for this will be very long though - probably only a few milliseconds. You could check that /proc/$!/comm is not nohup or, more portably, use the output of ps -o comm= $!.
iyrin over 9 years

That should work for normal logout, but what about when session is dropped or killed? Wouldn't you still need to disown the job so it's entirely ignored by sighup?
mc0e over 9 years

@RyanLoremIpsum: The startup script only needs to wait long enough that the child process is fully detached. After that, it doesn't matter what happens to the ssh session. If something else kills your ssh session in the brief window while that happens, there's not much you can do about it.
mc0e over 9 years

@Graeme yeah, I presume it's very quick, but I just don't know enough about exactly what nohup does to be sure. A pointer to an authoritative (or at least knowledgeable and detailed) source on this would be useful.
Graeme over 9 years

How about this one - lingrok.org/xref/coreutils/src/nohup.c
Graeme over 9 years

All it does is mess around with redirects for a bit and then the meat is just signal (SIGHUP, SIG_IGN); and an execvp (basically what I described in my answer). The code should be very quick to execute, although there are some calls it could block on so a delay is conceivable.
neverendingqs over 9 years

Would ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> "./startup.sh; sleep 5" work? Or would it have to be inside ./startup.sh?
neverendingqs over 9 years

Would the double fork effect be enough (based on @RyanLoremIpsum's answer)?
mc0e over 9 years

Yes, that should also work, but given that you have a startup script, why not put it in there? You could test for the presence of the $SSH_CONNECTION environment variable if you don't want to slow down other uses.
neverendingqs over 9 years

Just wanted to take the simpler approach in testing. Added EDIT4 based on the results. Looks like there is a race condition happening(?), but now it looks like the forking isn't working as intended (but closing the terminal manually works...)
neverendingqs over 9 years

Both did not resolve the issue =[. If it's a Python issue, do you have an idea on where to start investigating (can't post too much of the Python script here)?
Graeme over 9 years

@neverendingqs, if you mean the huponexit stuff, running in a subshell should have the same effect as disown as the process won't be added to the jobs list.
Graeme over 9 years

@neverendingqs, updated my answer. Forgot that you should use redirects with disown. Don't expect that it will make much difference though. I think your best bet is to alter the python script so that it tells you why it is exiting.
neverendingqs over 9 years

Redirecting the output worked (unix.stackexchange.com/a/176610/52894), but I'm not sure what the difference is between explicitly doing it and getting nohup to do it.
Graeme over 9 years

@neverendingqs, what version of nohup do you have on your remote host? A POSIX nohup isn't required to redirect stdin, which I missed, but it should still redirect stdout and stderr.
neverendingqs over 9 years

Looks like I'm working with nohup (GNU coreutils) 8.21.
Graeme over 9 years

@neverendingqs, does nohup print any messages, like nohup: ignoring input and appending output to ‘nohup.out’?
neverendingqs over 9 years

Yes - that is the exact message.
neverendingqs over 9 years

nohup --help includes this: NOTE: your shell may have its own version of nohup, which usually supersedes the version described here. Please refer to your shell's documentation for details about the options it supports. Could I be using another version of nohup somewhere?
neverendingqs over 9 years

whereis nohup points to the same version of nohup.
neverendingqs over 9 years

(Should figuring out the difference between an explicit redirect and nohup's redirect be a separate question?)
jlliagre over 9 years

Don't use whereis or which to identify what command is executed for a given name, use the type command instead.
neverendingqs over 9 years

I added EDIT5. I'm not sure how to reproduce it in a general case, but I suspect a general case exists if we can observe a difference.
neverendingqs over 9 years

I found a general case and asked the question here: unix.stackexchange.com/q/176674/52894
neverendingqs over 9 years

This is the best solution, so I'll mark it as the correct solution. The alternative is to use -t with ssh (as per unix.stackexchange.com/q/176674/52894) and add a sleep at the end of the command to prevent nohup from terminating prematurely (as per unix.stackexchange.com/a/176416/52894). However, this is a bit more finicky as it uses sleep.
iyrin over 9 years

Added some explanation to this answer. Is that last redirection of STDERR to STDOUT so that errors from the script are streamed to the terminal?
Anthon over 9 years

@RyanLoremIpsum redirects are handled by the shell and the script never sees them as arguments.
neverendingqs over 9 years

Added to the answer to explain why the answer works.
DaedalusUsedPerl about 4 years

Why is the -u option necessary?
jlliagre about 4 years

@DaedalusUsedPerl I don't think it matters a lot. The OP used that option so there was no reason to drop it in my answer.
DaedalusUsedPerl about 4 years

@jlliagre Thanks for the explanation, it's just that I had a similar issue and using -u turned out to be necessary for me, simply redirecting input and output wasn't enough
jlliagre about 4 years

@DaedalusUsedPerl Interesting. To be honest, I answered this question more than 5 years ago, so maybe I thought it was needed but forgot why. In any case, the "-u" option disables buffering, so it has an impact on timing.