Bash script that automatically kills processes when CPU/memory usage gets too high

Solution 1

I'm guessing the problem you want to solve is that you have some process running on your box which sometimes misbehaves, and sits forever pegging a core.

The first thing you want to do is attempt to fix the program that goes crazy. That is by far the best solution. I'm going to assume that isn't possible, or that you need a quick kluge to keep your box running until it's fixed.

You, at minimum, want to limit your script to only hit the one program you're concerned about. It'd be best if permissions limited your script like this (e.g., your script runs as user X, the only other thing running as X is the program).

Even better would be to use something like ulimit -t to limit the amount of total CPU time that the program can use. Similarly, if it consumes all memory, check ulimit -v. The kernel enforces these limits; see the bash manpage (it's a shell built-in) and the setrlimit(2) manpage for details.
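
As a minimal sketch of the ulimit approach, the wrapper below sets a CPU-time and an address-space limit in a subshell and then runs the program under them. The function name run_limited and the example values are assumptions for illustration, not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with kernel-enforced resource limits.
#   $1 = CPU-time limit in seconds    (ulimit -t)
#   $2 = virtual-memory limit in KiB  (ulimit -v)
#   remaining arguments = the command to run
run_limited() {
    local cpu_seconds="$1"
    local vmem_kib="$2"
    shift 2
    (
        # The subshell keeps the limits from leaking into the calling shell.
        ulimit -t "$cpu_seconds" || exit 1
        ulimit -v "$vmem_kib" || exit 1
        exec "$@"
    )
}

# Example (path is a placeholder): at most 60 s of CPU time, 1 GiB of address space.
# run_limited 60 1048576 /path/to/misbehaving-program
```

When the CPU-time limit is exceeded the kernel signals the process itself, so no polling loop is needed.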

If the problem isn't a process running amok but simply too many processes running, then implement some form of locking to prevent more than X from running (or, and this should be getting familiar, use ulimit -u). You may also consider changing the scheduler priority of those processes (using nice or renice) or, more drastically, using sched_setscheduler to change the policy to SCHED_IDLE.
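
One possible shape for the locking and priority ideas, assuming flock and chrt from util-linux are installed. The lock directory, the function names, and the return code 75 are all illustrative choices rather than anything from the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; LOCK_DIR and the function names are illustrative.
LOCK_DIR=${LOCK_DIR:-/tmp/job-slots}

# Run a command only if one of $1 slots is free; return 75 if all are busy.
run_with_slot() {
    local max_slots="$1"
    local slot=1
    local rc
    shift
    mkdir -p "$LOCK_DIR" || return 1
    while [ "$slot" -le "$max_slots" ]; do
        exec 9>"$LOCK_DIR/slot.$slot" || return 1
        if flock -n 9; then
            "$@"            # the lock on fd 9 is held while the command runs
            rc=$?
            exec 9>&-       # closing the descriptor releases the lock
            return "$rc"
        fi
        exec 9>&-
        slot=$((slot + 1))
    done
    return 75
}

# Run a command at the lowest conventional priority.
run_low_priority() {
    nice -n 19 "$@"
}

# Run a command under the SCHED_IDLE policy (chrt calls sched_setscheduler).
run_idle() {
    chrt --idle 0 "$@"
}
```

A cron job could then call something like run_with_slot 3 run_low_priority ./job.sh (job.sh being a placeholder), so that a fourth concurrent invocation is refused instead of piling up.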

If you need even more control, take a look at control groups (cgroups). Depending on the kernel you're running, you can actually limit the amount of CPU time, memory, I/O, etc. that a whole group of processes consumes together. Control groups are quite flexible; they can likely do whatever you're trying to do, without any fragile kluges. The Arch Linux Wiki has an intro to cgroups that's worth reading, as is Neil Brown's cgroups series at LWN.
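
On a systemd machine, one convenient way to reach cgroups is a transient scope created with systemd-run. The sketch below assumes systemd on a cgroup v2 host; the function name and the DRY_RUN switch are hypothetical, and creating a scope usually needs root (or systemd-run --user for a per-user manager).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command in a transient systemd scope with limits.
# Assumes systemd with cgroup v2; CPUQuota= and MemoryMax= are systemd
# resource-control properties.
#   $1 = CPU quota (e.g. 50%)   $2 = memory cap (e.g. 512M)   rest = command
run_in_cgroup() {
    local cpu_quota="$1"
    local mem_max="$2"
    shift 2
    # Prepend the systemd-run invocation to the command's own arguments.
    set -- systemd-run --scope -p "CPUQuota=$cpu_quota" -p "MemoryMax=$mem_max" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"      # only show what would be run
    else
        "$@"
    fi
}

# Example: preview the command without creating a scope.
# DRY_RUN=1 run_in_cgroup 50% 512M sleep 60
```

The limits apply to the whole scope, so any children the program forks are counted against the same quota.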

Solution 2

Issues:

  • When sorting numeric fields you probably want to use the -n option: sort -nrk 2. Otherwise a line with a %CPU value of 5.0 will end up higher than one with a value of 12.0.
  • Depending on your ps implementation, you might want to use the --no-headers option to get rid of the grep -v. That prevents you from discarding commands that contain PID.
  • I guess instead of echo CPU USAGE is at $CPU_LOAD, you meant echo CPU USAGE is at $CPU_USAGE.
  • I guess you forgot to remove the exit 0 that you inserted during debugging(?).
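
The first two points can be sketched as follows, assuming a procps-style ps that accepts --no-headers. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Keep the input line whose second field (%CPU) is numerically largest.
pick_top_cpu() {
    LC_ALL=C sort -k2,2nr | head -n 1
}

# List pid, %CPU and command without a header line, then pick the busiest.
top_cpu_process() {
    ps -eo pid,pcpu,command --no-headers | pick_top_cpu
}

# Example: print only the PID of the busiest process.
# top_cpu_process | awk '{print $1}'
```

Restricting the key to field 2 (-k2,2) also keeps the command text out of the comparison.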

Style:

  • You might want to move the CPU_USAGE_THRESHOLD=800 line to the beginning of the file, as this is the most informative setting and the one most likely to change even after your script is stable.
  • You are repeating the -e option: ps -eo pid -eo pcpu -eo command is the same as ps -eo pid -o pcpu -o command (as is ps -eo pid,pcpu,command).
  • There is an empty else clause. That always looks as if it should be handled, but was not for some unknown reason.

Solution 3

Killing off the processes that use the most CPU or memory is asking for trouble. Just look at what they are right now on your machine (here, currently: firefox, systemd (init), Xorg, gnome-terminal, a set of kernel threads, xemacs; none of which is dispensable). Look instead at how to tweak Linux's OOM killer.
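
One common way to tweak the OOM killer is the per-process oom_score_adj file under /proc. The helper below is a hypothetical sketch; raising the value for your own processes normally works unprivileged, while lowering it generally requires root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bias the kernel OOM killer for one process.
# oom_score_adj ranges from -1000 (never pick this process) to 1000 (pick it first).
set_oom_score_adj() {
    local pid="$1"
    local adj="$2"
    # Reject anything that is not a plain, optionally negative, integer.
    case $adj in
        '' | - | *[!0-9-]* | ?*-*)
            echo "oom_score_adj must be an integer" >&2
            return 2
            ;;
    esac
    if [ "$adj" -lt -1000 ] || [ "$adj" -gt 1000 ]; then
        echo "oom_score_adj must be within [-1000, 1000]" >&2
        return 2
    fi
    echo "$adj" > "/proc/$pid/oom_score_adj"
}

# Example: make a known memory hog the preferred victim.
# set_oom_score_adj "$(pgrep -o -x some-hog)" 800
```

This leaves the decision of when to kill with the kernel and only changes which process it prefers.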

Also note that "memory used by the process" is a nebulous concept, since shared libraries, executables, and even parts of data areas are shared between processes. One can come up with some number by charging each user with a fraction of the shared space, but even adding that up really doesn't give "memory used" (and even less "memory freed if the process goes away", since the shared parts stay behind).
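
Linux exposes both views in /proc/<pid>/smaps: Rss counts every resident page in full, while Pss (proportional set size) charges each shared page fractionally to its users. The sketch below sums both from a smaps-style file; the function name is an assumption, and smaps_rollup only exists on newer kernels.

```shell
#!/usr/bin/env bash
# Sum the Rss and Pss lines (values in kB) of a smaps-style file.
#   $1 = path, e.g. /proc/1234/smaps or /proc/1234/smaps_rollup
mem_summary() {
    awk '$1 == "Rss:" { rss += $2 }
         $1 == "Pss:" { pss += $2 }
         END { printf "rss_kb=%d pss_kb=%d\n", rss, pss }' "$1"
}

# Example: compare the two figures for the current shell.
# mem_summary "/proc/$$/smaps"
```

A large gap between the two numbers shows how much of the process's resident memory is shared and would not be freed by killing it.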

Solution 4

I have created a script, kill-process, that kills the processes listed in an array if their CPU usage stays above XX% for YY seconds, or if they have been running for more than ZZ seconds.

  • You can set XX, YY, ZZ at the top of the file.
  • You can use either ps or top to check processes.
  • There's a dry run mode too, to check but not kill.
  • At the end, the script sends an email if any processes were killed.

NOTE: Here is my repo on Github: https://github.com/padosoft/kill-process
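
The threshold check and the dry-run idea can be sketched on their own, independent of how the numbers are collected. This is not the repository's code: the function names and the DRY_RUN variable are assumptions, and only MAX_CPU and MAX_SEC mirror the script's settings.

```shell
#!/usr/bin/env bash
MAX_CPU=${MAX_CPU:-90}      # CPU threshold in percent
MAX_SEC=${MAX_SEC:-1800}    # run-time threshold in seconds
DRY_RUN=${DRY_RUN:-1}       # 1 = report only, 0 = really send the signal

# Succeed if CPU% (may be fractional) and run time both exceed their limits.
should_kill() {
    awk -v c="$1" -v s="$2" -v mc="$MAX_CPU" -v ms="$MAX_SEC" \
        'BEGIN { exit !((c + 0 > mc + 0) && (s + 0 > ms + 0)) }'
}

# Send SIGTERM to a PID, or just say so in dry-run mode.
kill_or_report() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "dry run: would send SIGTERM to $1"
    else
        kill -TERM "$1"
    fi
}

# Example: should_kill 95.3 2000 && kill_or_report 12345
```

Doing the comparison in awk sidesteps the integer-only limitation of [ -gt ] when the CPU figure has decimals.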

Essential part of the script (an excerpt of the code path that uses the top command):

#!/usr/bin/env bash

# Maximum CPU load (%)
MAX_CPU=90
# Maximum time (in seconds) a process may stay above MAX_CPU (1800 s = 30 min)
MAX_SEC=1800
# top column to sort by (9 = %CPU)
SORTBY=9
# Sampling tool; this excerpt follows the top code path
CMD="top"

# Command names of the processes to check
declare -a KILLLIST
KILLLIST=("/usr/sbin/apache2" "/usr/bin/php5-cgi")

# Iterate over each process in the list
for PROCESS_TOCHECK in "${KILLLIST[@]}"
do

    # Retrieve the PID with top, ordered numerically by column SORTBY
    PID=$(top -bcSH -n 1 | grep -F -- "$PROCESS_TOCHECK" | sort -k "${SORTBY},${SORTBY}nr" | head -n 1 | awk '{print $1}')

    # Nothing matched: move on to the next entry
    [ -n "$PID" ] || continue

    # Sample that PID once and read %CPU (column 9) and TIME+ (column 11)
    LINE=$(top -p "$PID" -bcSH -n 1 | grep -F -- "$PROCESS_TOCHECK" | sort -k "${SORTBY},${SORTBY}nr" | head -n 1)
    CPU=$(awk '{print $9}' <<< "$LINE")
    TIME_STR=$(awk '{print $11}' <<< "$LINE")
    [ -n "$TIME_STR" ] || continue

    # top prints %CPU with decimals, but [ -gt ] needs an integer
    CPU=${CPU%%[.,]*}
    CPU=${CPU:-0}

    # Decode the top CPU time format [dd-]hh:mm.ss.
    # NOTE: procps top usually prints TIME+ as minutes:seconds.hundredths;
    # check your top's output and adjust the multipliers below if needed.
    TIME_SEC=0
    IFS="-:" read -r c1 c2 c3 c4 <<< "$TIME_STR"

    # With top the time format is hh:mm.ss, so truncate the fractional part
    c2=${c2%%.*}
    c3=${c3%%.*}

    if [ -n "$c4" ]
    then
      TIME_SEC=$((10#$c4+60*(10#$c3+60*(10#$c2+24*10#$c1))))
    elif [ -n "$c3" ]
    then
      if [ "$CMD" = "ps" ]; then
        TIME_SEC=$((10#$c3+60*(10#$c2+60*10#$c1)))
      else
        # Parentheses rebalanced; the original line was not valid arithmetic
        TIME_SEC=$(( (10#$c3*24)*60*60 + 60*(10#$c2+60*10#$c1) ))
      fi
    else
      if [ "$CMD" = "ps" ]; then
        TIME_SEC=$((10#0+(10#$c2+60*10#$c1)))
      else
        TIME_SEC=$((10#0+60*(10#$c2+60*10#$c1)))
      fi
    fi

    # Kill the process if both thresholds are exceeded
    if [ "$CPU" -gt "$MAX_CPU" ] && [ "$TIME_SEC" -gt "$MAX_SEC" ]; then
        kill -15 "$PID"
    fi

done
Usage:
bash killprocess.sh [dry|kill|--help] [top|ps] [cpu|time]

Author: KK Patel

Updated on September 18, 2022

Comments

  • KK Patel
    KK Patel over 1 year

    I have created a script that kills processes if CPU and/or memory usage hits 80%. It creates a list of killed processes when this happens. What can I do to improve it?

    while [ 1 ];
    do 
    echo
    echo checking for run-away process ...
    
    CPU_USAGE=$(uptime | cut -d"," -f4 | cut -d":" -f2 | cut -d" " -f2 | sed -e "s/\.//g")
    CPU_USAGE_THRESHOLD=800
    PROCESS=$(ps aux r)
    TOPPROCESS=$(ps -eo pid -eo pcpu -eo command | sort -k 2 -r | grep -v PID | head -n 1)
    
    if [ $CPU_USAGE -gt $CPU_USAGE_THRESHOLD] ; then
      kill -9 $(ps -eo pid | sort -k 1 -r | grep -v PID | head -n 1) #original
      kill -9 $(ps -eo pcpu | sort -k 1 -r | grep -v %CPU | head -n 1)
      kill -9 $TOPPROCESS
      echo system overloading!
      echo Top-most process killed $TOPPROCESS
          echo CPU USAGE is at $CPU_LOAD
    
    else
        fi
        exit 0
        sleep 1;
        done
    
    • rahmu
      rahmu about 11 years
      Have you tried running the script? while [ 1 ] makes me wonder how much CPU this script alone is going to consume. Also, 3 calls to kill -9 in a script running constantly? This gives me the chills ...
    • daisy
      daisy about 11 years
      Nice avatar anyway, @rahmu he got a sleep 1 in the loop
    • jippie
      jippie about 11 years
      First monday of the month and my PC is checking a (slow'ish) RAID6 set. CPU load easily peaks above 8 because it is constantly waiting for disk IO from this RAID set. Nothing wrong, system is still very responsive. Your script would kill my firefox which uses only 3.6% of the available 400%. Just saying you may be hunting for ghosts with this script. BTW: your system will not get damaged because of high load and when memory runs out, the kernel will do a semi-educated guess on which process(es) to kill.
    • KK Patel
      KK Patel about 11 years
      Then kill process on based load will be fine or not ????
    • jippie
      jippie about 11 years
      In my use case the killing is not desired.
    • jippie
      jippie about 11 years
      are you familiar with the commands cpulimit and timeout?
    • Stephane
      Stephane almost 7 years
      You lack a white space between CPU_USAGE_THRESHOLD and ] in $CPU_USAGE_THRESHOLD] as in $CPU_USAGE_THRESHOLD ]
    • Stephane
      Stephane almost 7 years
      So as to test this above script, how to create a running away process ?
    • Stephane
      Stephane almost 7 years
      The PROCESS variable is defined but never used.
  • lk_vc
    lk_vc about 7 years
    Seems sort should be sort -k9nr. Without n, will get ` 5.9` > 29.4.
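
Two points from the comments, how to create a runaway process for testing and the timeout command, can be combined in a small sketch. The function name is an assumption; timeout is from GNU coreutils, and the cpulimit line is left as a comment because that tool is often not installed.

```shell
#!/usr/bin/env bash
# Start a deliberate CPU hog for testing, bounded by timeout(1) so that it
# cannot run forever.
#   $1 = seconds before timeout sends SIGTERM (timeout then exits with 124)
spawn_runaway() {
    timeout "$1" bash -c 'while :; do :; done'
}

# Example: peg one core for 30 seconds while you test a monitoring script.
# spawn_runaway 30 &

# cpulimit, if available, throttles an already running PID instead of killing it:
#   cpulimit -p "$pid" -l 50
```

Because the hog is wrapped in timeout, a forgotten test process cleans itself up.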