Automatically restart a Unix job if it goes down?


Solution 1

This approach is fast and cheap, but not bulletproof:

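#!/usr/bin/perl
use strict;
use warnings;

# List this user's processes; if none mentions mzscheme, relaunch it in the
# background. Any command line containing "mzscheme" counts as a match.
my $procs = `ps x`;
unless ($procs =~ /mzscheme/) {
        system('~/utils/src/plt/bin/mzscheme &');
}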

I run that script periodically from cron.
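A crontab entry along these lines (edited with crontab -e) would run the check every 10 minutes and once when the machine boots. The script path $HOME/bin/keepalive.pl is a hypothetical name, and the script must be executable (chmod +x). Note that both the */10 step syntax and @reboot are extensions supported by common implementations such as Vixie cron and cronie rather than part of POSIX crontab. Avoid putting "mzscheme" in the script's own file name, because the /mzscheme/ pattern would then match the checker itself.

```
# min  hour  dom  mon  dow  command
*/10   *     *    *    *    $HOME/bin/keepalive.pl
@reboot                     $HOME/bin/keepalive.pl
```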

Solution 2

If your program runs in the foreground, use Gerrit Pape's runit. Advantages:

  • It's pretty well bulletproof (based on Dan Bernstein's daemontools).
  • It runs on a wide variety of platforms (portable).
  • It is packaged for Ubuntu and Debian (as is daemontools).
  • It is relatively easy to configure (run script, log script, some symlinks).
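As a rough sketch of the "run script" part, the shell snippet below creates a service directory for a hypothetical job called myjob. The directory ~/service and the program path /path/to/myjob are placeholders, not runit defaults. It assumes an unprivileged setup in which you run your own runsvdir on a directory you own (for example from an @reboot crontab entry) instead of the system-wide service directory.

```shell
#!/bin/sh
# Sketch: create a runit service directory for a hypothetical job "myjob".
# SERVICE_DIR and /path/to/myjob are placeholders, not runit defaults.
SERVICE_DIR="${SERVICE_DIR:-$HOME/service}"
JOB_DIR="$SERVICE_DIR/myjob"
mkdir -p "$JOB_DIR"

# runsv executes ./run and starts it again whenever it exits, so the
# program must stay in the foreground instead of forking into the background.
# "exec 2>&1" sends stderr to stdout so a log service would capture both.
cat > "$JOB_DIR/run" <<'EOF'
#!/bin/sh
exec 2>&1
exec /path/to/myjob
EOF
chmod +x "$JOB_DIR/run"

# Supervise every service under $SERVICE_DIR, e.g. from an @reboot cron job:
#   runsvdir "$SERVICE_DIR" &
```

Once runsvdir is supervising that directory, the job can be controlled with sv, e.g. sv restart ~/service/myjob.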

Solution 3

I use Monit for this purpose; it's free and open source. It does what you need and much more.

What Monit can do

Monit can start a process if it does not run, restart a process if it does not respond and stop a process if it uses too many resources. You can use Monit to monitor files, directories and filesystems for changes, such as timestamp changes, checksum changes or size changes. You can also monitor remote hosts; Monit can ping a remote host and can check TCP/IP port connections and server protocols. Monit is controlled via an easy to use control file based on a free-format, token-oriented syntax. Monit logs to syslog or to its own log file and notifies you about error conditions and recovery status via customizable alert messages.

I also like their design philosophy:

It is important for a system monitoring tool to just work - all the time and you should be able to trust it to do so. A system monitoring tool needs to be non-intrusive and you should be able to forget about it once it's installed. That is, until sshd dies on your co-located server, 50 miles away. When this happens, it is good to know that you have installed this extra layer of security and protection - just wait a few seconds and Monit will restart the sshd daemon. It is also helpful to get an alert mail before the server disks are full or if your HTTP server is suddenly slashdotted.

Monit is designed as an autonomous system and does not depend on plugins or any special libraries to run. Instead it works right out of the box and can utilize existing infrastructure already on your system. For instance, Monit will easily integrate with init and can use existing runlevel rc-scripts to manage services. There is also flexibility for those special cases when you need a certain setup for a service.

Monit compiles and runs on most flavors of UNIX. It is a small program and weighs in at just over 300kB. There is support for compiling with glibc replacements such as uClibc if you need it to be even smaller.
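A minimal control file for the mzscheme job from Solution 1 might look like the sketch below. The check name, the regex and the /home/you path are placeholders, and "check process ... matching" requires a Monit release that supports it (older versions only accept a "with pidfile" clause). The /bin/sh -c '... &' wrapper is an assumption that mirrors the & in Solution 1, so the start command returns while mzscheme keeps running. Monit does not strictly need root: an ordinary user can run it with a private control file, e.g. monit -c ~/.monitrc, and Monit refuses a control file whose permissions are looser than 0700 (chmod 600 ~/.monitrc is enough).

```
set daemon 120    # poll every 120 seconds

check process mzscheme matching "mzscheme"
    start program = "/bin/sh -c '/home/you/utils/src/plt/bin/mzscheme >/dev/null 2>&1 &'"
```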


Since you do not have root access, a script like this may work for your requirement of:

"If the job is not currently running, then start the job"

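# Restart httpd when no process with the full binary path is running.
# "grep -v grep" drops the grep commands themselves from the ps output.
if ! ps ax | grep -v grep | grep -q "/usr/local/apache2/bin/httpd"
then
        echo "httpd Service not running"
        # cron's PATH is minimal, so call apachectl by its full path
        /usr/local/apache2/bin/apachectl start
fi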

I wrote and tested the code above with cron and the Apache httpd daemon. It simply searches for your string in the current list of processes; if no matching line is found, the service isn't running, so the script starts it again. Make sure to include grep -v grep so that your own search is removed from the process output. Use the full path to the binary so that it is really the service your query finds: if you only search for httpd, for example, having httpd.conf open in vim will make the script think the httpd service is running when it really isn't. Of course, your method of starting the service will also be different.
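Matching on ps output can still be fooled by any other process whose command line contains the same string. A common alternative, sketched here under the assumption that this cron script is the only thing that starts the job, is to record the job's PID in a file and test it with kill -0, which sends no signal and only checks that the process exists and may be signalled by you. PIDFILE, is_running, start_job and the "sleep 30" stand-in job are hypothetical names to replace with your own.

```shell
#!/bin/sh
# Sketch of a PID-file check suitable for running from cron.
# PIDFILE and the "sleep 30" stand-in job are placeholders.
PIDFILE="${PIDFILE:-/tmp/myjob.pid}"

# Succeeds if the PID recorded in $PIDFILE belongs to a live process
# that we are allowed to signal (kill -0 sends no actual signal).
is_running() {
    [ -f "$PIDFILE" ] || return 1
    pid=$(cat "$PIDFILE")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Starts the given command in the background and records its PID.
start_job() {
    "$@" >/dev/null 2>&1 &
    echo "$!" > "$PIDFILE"
}

if ! is_running; then
    echo "job not running, starting it"
    start_job sleep 30    # placeholder: replace with the real command
fi
```

One caveat of this design: if the job dies and its PID is later reused by an unrelated process, a stale PID file makes the check report a false "running".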

Solution 4

There are also tools designed specifically to act as a watchdog, which can even run, as services, scripts that don't create PID files of their own. One example of such a tool is supervisor.
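With supervisor, each job gets a [program:...] section in the supervisord configuration file. The fragment below is a minimal sketch for a hypothetical job named myjob at the placeholder path /path/to/myjob. As with runit, the command should stay in the foreground, because supervisord watches the process it started itself. supervisord can also be run by an ordinary user with a private file, e.g. supervisord -c ~/supervisord.conf; this fragment omits the extra sections that supervisorctl needs to talk to it.

```
[supervisord]
logfile=%(here)s/supervisord.log
pidfile=%(here)s/supervisord.pid

[program:myjob]
command=/path/to/myjob
autostart=true
autorestart=true
```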


Author: Ashish Jagtap

Updated on September 17, 2022

Comments

  • Ashish Jagtap
    Ashish Jagtap over 1 year

    I have a job that I would like to "daemonize" on Unix: I want it to come up when the computer boots, and I want it to restart if it goes down.

    A simple way to do this is to set up a cron job that runs every 10 or 20 minutes. The cron job should restart the application if it's not already running.

    How do I write this last part of the script: "If the job is not currently running, then start the job"?

    • DaveParillo
      DaveParillo over 14 years
      Only by using a program that is guaranteed to be running all the time (such as init or upstart) can you be sure that your program will (almost) always be alive. But I see from some of your comments below that you don't have root access. Just be aware that any periodic checking solution (pid file / cron) is only as good as the checking interval.
  • Ashish Jagtap
    Ashish Jagtap over 14 years
    I don't know if I can use monit, because I don't have root access on my system. So I cannot get the monit daemon to automatically load at boot.
  • John T
    John T over 14 years
    Ah, I see. I added some shell scripting which may help.
  • Ashish Jagtap
    Ashish Jagtap over 14 years
    djb is awesome, actually. If I'm not mistaken, he successfully sued the United States in favor of First Amendment rights.
  • JdeBP
    JdeBP about 10 years
    runit was written by Gerrit Pape, not by Dan Bernstein.
  • jtimberman
    jtimberman about 10 years
    Amended to clarify the author of runit.