Can Monit be configured to never unmonitor/timeout a service?


Solution 1

I would simply use a cron job that runs monit start servicename at the desired interval. You can also use groups for finer control.
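A minimal sketch of what the cron job could call, assuming the service is named myservice as in the question. The function name remonitor and the MONIT variable are hypothetical, introduced only for illustration; the group form relies on Monit's -g option.

```shell
#!/bin/sh
# Sketch: re-issue "monit start" for one service or for a whole group.
# MONIT is an illustrative override so the binary can be swapped out;
# it defaults to "monit" on PATH.
MONIT="${MONIT:-monit}"

# remonitor SERVICE    runs: monit start SERVICE
# remonitor -g GROUP   runs: monit -g GROUP start
remonitor() {
  if [ "$1" = "-g" ]; then
    "$MONIT" -g "$2" start
  else
    "$MONIT" start "$1"
  fi
}
```

In root's crontab, the simplest form of the same idea would be a line such as `*/5 * * * * monit start myservice`, where the 5-minute interval is an arbitrary choice.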

Solution 2

After some digging, it turns out Monit stores its monitoring data in a “state” file, which keeps track of which services are monitored and which are unmonitored.

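So while this is a bit “brute force”-ish, it definitely works. If a service becomes “unmonitored” due to something like a timeout, just remove the Monit state file. Its location varies between systems and can be set with a set statefile line in the Monit config; on my system it is removed like this: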

sudo rm /var/lib/monit/state

Then restart Monit like this, and all should be good:

sudo service monit restart
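Since the state file path differs between setups (the question's config sets it explicitly with set statefile), here is a rough sketch that reads the path from the control file instead of hard-coding it. The helper name statefile_path is hypothetical, /etc/monit/monitrc is an assumed control file location, and the parsing assumes an unquoted path on a single line.

```shell
#!/bin/sh
# Sketch: print the path given by a "set statefile <path>" line in a Monit
# control file. Prints nothing if no such line is present.
# statefile_path is an illustrative helper, not part of Monit.
statefile_path() {
  awk '$1 == "set" && $2 == "statefile" { print $3; exit }' "$1"
}

# Example use (commented out because it deletes the state file):
#   state=$(statefile_path /etc/monit/monitrc)
#   [ -n "$state" ] && sudo rm "$state" && sudo service monit restart
```

If the helper prints nothing, the config does not set the path and the state file is wherever your Monit build puts it by default.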

Solution 3

I had the exact same issue: even after restarting Monit, it would refuse to monitor the service once the timeout had occurred. I finally figured out that I had to delete the Monit state file (/var/.monit.state) and restart Monit to make it monitor all programs again.

Solution 4

Based on your Monit code snippet, it looks like you have to modify or add cycle statements to your process stanza. See the relevant documentation here and here.

It seems like you may want to set your service tests to execute every cycle, with no timeout statement. Also look at the Monit web interface at http://hostname:2812. Check the page for the relevant service and look at the "Existence" field. Your default should look like:

If doesn't exist 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
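To confirm that no timeout statement is hiding in an included file, a quick search over the config directory can help. This is only a sketch: the helper name find_timeout_rules is hypothetical, the /etc/monit directory is taken from the include line in the question, and it relies on grep supporting -r (GNU and BSD grep do).

```shell
#!/bin/sh
# Sketch: list lines under a directory that contain a "then timeout" action,
# i.e. the rule that makes Monit stop monitoring after repeated restarts.
# Output is in grep's file:line:text form; exit status is 1 if nothing matches.
# find_timeout_rules is an illustrative helper, not part of Monit.
find_timeout_rules() {
  grep -rniE 'then[[:space:]]+timeout' "$1"
}
```

For example, `find_timeout_rules /etc/monit` would show any such rule. After removing one, `monit -t` checks the control file syntax and `monit reload` makes the running daemon reread it.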
Author: Joe Shaw

Open source programmer, mainly in C, Python, JavaScript, and C#.

Updated on September 18, 2022

Comments

  • Joe Shaw
    Joe Shaw over 1 year

    Monit seems to give up restarting a service if it fails a few times, and unmonitors it. I can’t find anything in the documentation about the specifics of when or why.

    My Monit config is set up as follows:

    set daemon 10
    set logfile /var/log/monit.log
    set statefile /var/lib/monit/monit.state
    set alert [email protected] not { nonexist, action, instance }
    include /etc/monit/conf.d/*
    

    And this is an example of the Monit ruleset I am using:

    check process myservice
      with pidfile /var/run/myservice/myservice.pid
      start program = "/home/myservice/current/start-myservice.sh"
        as uid myservice and gid myservice
      stop program = "/home/myservice/current/stop-myservice.sh"
        as uid myservice and gid myservice
      mode active
    

    In my environment, I want it to keep trying on its poll intervals indefinitely. Is there any way to configure monit to never stop monitoring a service, even if it doesn’t start up successfully?

    • ewwhite
      ewwhite over 12 years
      Please post a sample of your monit config.
    • Joe Shaw
      Joe Shaw over 12 years
      gist.github.com/1229828 -- I removed some mail server/alert stuff and the HTTP server configuration from monitrc. The other file is an example of our service configuration. Note the lack of "if x restarts then timeout" clause in it.
    • Ramon Tayag
      Ramon Tayag over 12 years
      I've wondered about this myself. Sometimes I just kill something to test what monit does and monit just unmonitors it.
  • Joe Shaw
    Joe Shaw about 12 years
    This is exactly what I ended up doing, but I'm not too happy about it.
  • Admin
    Admin almost 11 years
    Just take out the "if 5 restarts within 5 cycles then timeout" line; Monit will then not unmonitor the service.