How to watch a service with multiple processes with Monit?

I've found two possible solutions. Neither is optimal, but both work in my scenario:

  1. For every subservice, check only that its PID file exists, and assume the subservice is online if it does. As before, the main service sympa depends on the subservices:

    check process sympa
      with pidfile /var/run/sympa/sympa.pid
      start program = "/etc/init.d/sympa start"
      stop program = "/etc/init.d/sympa stop"
      depends on sympa_bounced, sympa_bulk
    
    check file sympa_bounced
      with path /var/run/sympa/bounced.pid
      if does not exist then restart
    
    check file sympa_bulk
      with path /var/run/sympa/bulk.pid
      if does not exist then restart
    

    restart does nothing for files, but because sympa depends on the subservices, it will be restarted.

  2. With newer monit versions, you can also execute a command and pass arguments to the command:

    check process sympa
      with pidfile /var/run/sympa/sympa.pid
      start program = "/etc/init.d/sympa start"
      stop program = "/etc/init.d/sympa stop"
      depends on sympa_bounced, sympa_bulk
    
    check program sympa_bounced
      with path "/usr/bin/pgrep --pidfile /var/run/sympa/bounced.pid"
      # pgrep exits non-zero when no running process matches the PID in the file
      if status != 0 then restart
    
    check program sympa_bulk
      with path "/usr/bin/pgrep --pidfile /var/run/sympa/bulk.pid"
      if status != 0 then restart
    

    Similar to check file, the restart action does not do anything for programs, but forces the service sympa to restart.

    With older monit versions (e.g. 5.4, the current version in Debian Wheezy), you cannot pass arguments to a command, so you could write a simple (one-line) script for each service which executes /usr/bin/pgrep with the respective arguments.
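
    A minimal sketch of that workaround for sympa_bounced, assuming a hypothetical wrapper saved as /usr/local/bin/check_sympa_bounced (the file name and location are my own choice, not something sympa or Monit provides). The script body is shown in the comments, and the check relies on the exit-status test that Monit offers for check program. It is untested on 5.4 itself:

    # /usr/local/bin/check_sympa_bounced (hypothetical name; must be executable):
    #   #!/bin/sh
    #   exec /usr/bin/pgrep --pidfile /var/run/sympa/bounced.pid
    check program sympa_bounced
      with path "/usr/local/bin/check_sympa_bounced"
      # pgrep exits non-zero when no running process matches the PID file
      if status != 0 then restart

    sympa_bulk would get an analogous script and check, pointing at /var/run/sympa/bulk.pid.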

With both solutions, sympa is restarted once if any of the subservices fails or if sympa is not running at all.


Author: morxa

Updated on September 18, 2022

Comments

  • morxa
    morxa over 1 year

    I'm trying to watch the mailing list manager sympa with monit. A running sympa instance consists of multiple processes for the different tasks of list management (e.g. a separate process for archiving emails), but all processes are started/stopped with a single init script.

    Optimally, monit should alert me if any of the services fails and then restart sympa, but restart it only once. A first solution looked like this:

    check process sympa
      with pidfile /var/run/sympa/sympa.pid
      start program = "/etc/init.d/sympa start"
      stop program = "/etc/init.d/sympa stop"
    check process sympa_bounced
      with pidfile /var/run/sympa/bounced.pid
      start program = "/etc/init.d/sympa start"
      stop program = "/etc/init.d/sympa stop"
    check process sympa_bulk
      with pidfile /var/run/sympa/bulk.pid
      start program = "/etc/init.d/sympa start"
      stop program = "/etc/init.d/sympa stop"
    

    However, if I stop sympa manually, the init script will be executed multiple times, once for every service I define (because every service failed).

    My second approach was to define dependencies and only alert if any of the subservices fail:

    check process sympa
      with pidfile /var/run/sympa/sympa.pid
      start program = "/etc/init.d/sympa start"
      stop program = "/etc/init.d/sympa stop"
      depends on sympa_bounced, sympa_bulk
    check process sympa_bounced
      with pidfile /var/run/sympa/bounced.pid
      if does not exist then alert
    check process sympa_bulk
      with pidfile /var/run/sympa/bulk.pid
      if does not exist then alert
    

    But since the subservices are not restarted, the main service will also not be restarted. So I figured I could 'fake' a restart by setting start/stop to /bin/true:

    check process sympa
      with pidfile /var/run/sympa/sympa.pid
      start program = "/etc/init.d/sympa start"
      stop program = "/etc/init.d/sympa stop"
      depends on sympa_bounced, sympa_bulk
    check process sympa_bounced
      with pidfile /var/run/sympa/bounced.pid
      start program = "/bin/true"
      stop program = "/bin/true"
    check process sympa_bulk
      with pidfile /var/run/sympa/bulk.pid
      start program = "/bin/true"
      stop program = "/bin/true"
    

    This does not work either: if sympa_bulk fails, its PID file will not be created until the sympa service is restarted, and that restart does not happen until sympa_bulk is running again.

    Is there a way to monitor such a service, get alert messages for all subservices, but restart the service only once, even if all subservices fail at once?

  • morxa
    morxa over 9 years
    This does not restart sympa if sympa_bulk fails, only vice versa.
  • morxa
    morxa over 9 years
    With sympa, this actually does not work, because the init script of sympa is somewhat broken (at least in Debian). If the main process sympa is killed but other processes (such as sympa_bounced) are still running, then /etc/init.d/sympa start does not start the main process. Therefore I would not recommend using this with sympa.