Kill processes if high load average

Solution 1

You can use a watchdog like Monit to watch over the processes you care about, and restart them if they consume excess resources.

Something like this would be used to monitor Apache:

 check process apache with pidfile /var/run/httpd.pid
       start program = "/etc/init.d/httpd start"
       stop program  = "/etc/init.d/httpd stop"
       if cpu > 40% for 2 cycles then alert
       if totalcpu > 60% for 2 cycles then alert
       if totalcpu > 80% for 5 cycles then restart
       if mem > 100 MB for 5 cycles then stop
       if loadavg(5min) greater than 10.0 for 8 cycles then stop

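So, if the CPU usage of the Apache process itself stays over 40% for 2 cycles, Monit sends an alert. If the total CPU usage of Apache and its children stays over 60% for 2 cycles, it alerts as well, and if it stays above 80% for 5 cycles, Monit restarts Apache. The last two rules stop Apache if it uses more than 100 MB of memory for 5 cycles, or if the 5-minute load average stays above 10 for 8 cycles.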
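
The "for N cycles" part of each rule means the condition has to hold on N consecutive polling cycles before the action fires. The following is a minimal Python sketch of that idea only, not Monit's actual implementation; the class name `CycleRule` and the sample values are made up for illustration.

```python
class CycleRule:
    """Fire an action once a metric stays above a limit for N consecutive cycles."""

    def __init__(self, limit, cycles, action):
        self.limit = limit
        self.cycles = cycles
        self.action = action
        self.streak = 0  # consecutive cycles spent above the limit

    def observe(self, value):
        """Feed one polling-cycle sample; return the action name if the rule fires."""
        if value > self.limit:
            self.streak += 1
        else:
            self.streak = 0  # a single good cycle resets the count
        if self.streak >= self.cycles:
            return self.action
        return None


# Mirrors "if totalcpu > 80% for 5 cycles then restart" with invented samples.
rule = CycleRule(limit=80.0, cycles=5, action="restart")
samples = [85, 90, 70, 81, 82, 83, 84, 85]
results = [rule.observe(s) for s in samples]
print(results)
```

In this run the dip to 70 resets the counter, so the rule only fires on the last sample, after five consecutive readings above 80.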

Monit will also start up Apache if it's not running for some reason, which is a reasonable way to keep critical services up (if you don't have something like Upstart available).

This assumes that you have a set of processes that you can target for this sort of monitoring. Presumably, you suspect a particular application may be a problem.
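
If you cannot single out a process in advance, the same idea can be sketched by hand: check the load average and, when it is too high, pick the heaviest processes that are not on a protected list. The Python sketch below only selects candidates and does not kill anything. The names `PROTECTED`, `load_is_high` and `pick_candidates`, the threshold, and the sample process table are all assumptions for illustration; gathering real per-process CPU figures is left out.

```python
import os

# Hypothetical list of process names that must never be selected.
PROTECTED = {"sshd", "init", "monit"}


def load_is_high(threshold, loadavg=None):
    """Return True if the 1-minute load average exceeds the threshold."""
    if loadavg is None:
        loadavg = os.getloadavg()  # (1min, 5min, 15min); Unix only
    return loadavg[0] > threshold


def pick_candidates(procs, protected=PROTECTED, limit=3):
    """Return up to `limit` unprotected processes, heaviest CPU users first.

    `procs` is a list of (pid, name, cpu_percent) tuples gathered elsewhere.
    """
    eligible = [p for p in procs if p[1] not in protected]
    eligible.sort(key=lambda p: p[2], reverse=True)
    return eligible[:limit]


# Invented process table and load figures, used only to exercise the functions.
sample = [
    (1, "init", 0.1),
    (812, "sshd", 95.0),
    (2040, "apache2", 88.5),
    (2311, "php-cgi", 97.2),
    (2500, "cron", 0.3),
]
if load_is_high(10.0, loadavg=(400.0, 250.0, 120.0)):
    for pid, name, cpu in pick_candidates(sample, limit=2):
        print(pid, name, cpu)
```

Note that `sshd` is skipped even though it is busy in the sample data, which is the point of the protected list: the tool you need for logging in should never be a kill candidate.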

Solution 2

When your load average rises and you can't log in via ssh, try Grey Goo, a tiny, available and reliable remote command execution server and client designed purely for emergency situations:

https://code.google.com/p/greygoo/

Author: Drakmail

Updated on September 18, 2022

Comments

  • Drakmail
    Drakmail over 1 year

    Not long ago the load average on my server rose to 400 and I couldn't even log in to the server using ssh. Is there any software that can prevent such situations by automatically killing processes that put a huge load on the server?

    PS. Debian 6.0.5

  • Drakmail
    Drakmail almost 12 years
    Ideally, I want something that will kill all suspicious processes that use too much CPU or I/O, with something like a list of protected processes that will never be killed (like ssh).
  • Matthew Ife
    Matthew Ife almost 12 years
    I don't want to sound harsh, but that feels like a band-aid to me. You should really be figuring out what circumstances cause the load to go out of control and remedy the problem at the source.
  • Drakmail
    Drakmail almost 12 years
    Sounds good, but the problem has not recurred so far :( I have some ideas about what the cause could be, but I'm afraid that testing them may kill my server again.