Bash script that automatically kills processes when CPU/memory usage gets too high
Solution 1
I'm guessing the problem you want to solve is that you have some process running on your box which sometimes misbehaves, and sits forever pegging a core.
The first thing you want to do is to attempt to fix the program that goes crazy. That is by far the best solution. I'm going to assume that isn't possible, or that you need a quick kluge to keep your box running until it's fixed.
At a minimum, you want to limit your script so that it only hits the one program you're concerned about. It'd be best if permissions limited your script like this (e.g., your script runs as user X, and the only other thing running as X is the program).
Even better would be to use something like ulimit -t to limit the total amount of CPU time that the program can use. Similarly, if it consumes all memory, check ulimit -v. The kernel enforces these limits; see the bash manpage (ulimit is a shell built-in) and the setrlimit(2) manpage for details.
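A minimal sketch of applying both limits to one program only, assuming bash; run_limited is a hypothetical helper name. The ulimit calls run in a subshell, so the calling shell keeps its own limits, and the program inherits them through exec. In bash, ulimit -t counts seconds of CPU time and ulimit -v counts 1024-byte units of virtual memory.

```bash
#!/usr/bin/env bash
# run_limited CPU_SECONDS VMEM_KIB CMD [ARGS...]
# Hypothetical helper: start CMD with kernel-enforced resource limits.
run_limited() {
    local cpu_sec=$1 vmem_kib=$2
    shift 2
    (
        # Limits set here apply to this subshell and to whatever it execs,
        # not to the calling shell.
        ulimit -t "$cpu_sec" || exit 1   # CPU seconds; the kernel signals the process when used up
        ulimit -v "$vmem_kib" || exit 1  # virtual memory in KiB; allocations beyond it fail
        exec "$@"
    )
}

# Example: a busy loop is stopped after about one second of CPU time.
#   run_limited 1 1048576 bash -c 'while :; do :; done'
```

Because bash's ulimit sets the soft and the hard limit together when neither -S nor -H is given, the busy loop in the example is terminated by a signal, so its exit status is above 128.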
If the problem isn't a process running amok but simply too many processes running, then implement some form of locking to prevent more than X from running (or, as should be getting familiar by now, use ulimit -u). You may also consider changing the scheduler priority of those processes (using nice or renice), or, more drastically, using sched_setscheduler to change the policy to SCHED_IDLE.
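For the scheduling route, a sketch assuming the util-linux chrt command is installed (its --idle option selects SCHED_IDLE, which takes priority 0); run_idle is a hypothetical name. The command starts at the weakest nice value and, where chrt is available, under SCHED_IDLE, so it yields the CPU to practically everything else. On Linux an unprivileged user may switch a process to SCHED_IDLE; switching back out of it is what needs privileges.

```bash
#!/usr/bin/env bash
# run_idle CMD [ARGS...]
# Hypothetical helper: start CMD with the lowest CPU priority an
# unprivileged user can give it.
run_idle() {
    if command -v chrt >/dev/null 2>&1; then
        # nice -n 19 raises the niceness by 19 (the kernel caps it at 19);
        # chrt --idle 0 then switches to the SCHED_IDLE policy and runs CMD
        nice -n 19 chrt --idle 0 "$@"
    else
        # no chrt available: fall back to the nice value alone
        nice -n 19 "$@"
    fi
}

# An already running process can be demoted along the same lines:
#   renice -n 19 -p PID
#   chrt --idle -p 0 PID
```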
If you need even more control, take a look at control groups (cgroups). Depending on the kernel you're running, you can actually limit the amount of CPU time, memory, I/O, etc. that a whole group of processes consumes together. Control groups are quite flexible; they can likely do whatever you're trying to do, without any fragile kluges. The Arch Linux Wiki has an intro to cgroups that's worth reading, as is Neil Brown's cgroups series at LWN.
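A sketch of the cgroup route, assuming cgroup v2 mounted at /sys/fs/cgroup, root privileges, and the cpu and memory controllers enabled in the parent group's cgroup.subtree_control; cg_limit and the group name runaway are hypothetical. In cgroup v2, cpu.max takes a quota and a period in microseconds, memory.max takes bytes, and writing a PID to cgroup.procs moves that process into the group.

```bash
#!/usr/bin/env bash
# cg_limit CGROUP_DIR CPU_PERCENT MEM_BYTES PID
# Hypothetical helper: create a cgroup v2 group, cap its CPU and memory,
# and move PID into it. CGROUP_DIR is normally below /sys/fs/cgroup.
cg_limit() {
    local dir=$1 cpu_pct=$2 mem_bytes=$3 pid=$4
    local period=100000                        # scheduling period in microseconds
    local quota=$(( period * cpu_pct / 100 ))  # CPU time the group may use per period

    mkdir -p "$dir" || return 1
    # "QUOTA PERIOD": 50% of one CPU is "50000 100000"; above 100% spans several CPUs
    echo "$quota $period" > "$dir/cpu.max" || return 1
    # hard memory limit; if the group cannot be kept below it, the OOM killer
    # picks a victim inside the group
    echo "$mem_bytes" > "$dir/memory.max" || return 1
    # move the process; children it forks afterwards start in the same group
    echo "$pid" > "$dir/cgroup.procs" || return 1
}

# Example (as root):
#   cg_limit /sys/fs/cgroup/runaway 50 $((512 * 1024 * 1024)) "$PID"
```

On a systemd machine, running a new command as root with systemd-run --scope -p CPUQuota=50% -p MemoryMax=512M COMMAND lets systemd do the same bookkeeping.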
Solution 2
Issues:
- When sorting numeric fields you probably want to use the -n option: sort -nrk 2. Otherwise a line with a %CPU value of 5.0 will end up higher than one with a value of 12.0.
- Depending on your ps implementation, you might want to use the --no-headers option to get rid of the grep -v. That also keeps you from discarding commands that contain PID.
- I guess instead of echo CPU USAGE is at $CPU_LOAD, you meant echo CPU USAGE is at $CPU_USAGE.
- I guess you forgot to remove the exit 0 that you inserted during debugging(?). As written, it ends the script after the first pass through the loop, before sleep 1 is ever reached.
Style:
- You might want to move the CPU_USAGE_THRESHOLD=800 line to the beginning of the file, as it is the most informative setting and the one most likely to change even after your script is stable.
- You are repeating the -e option: ps -eo pid -eo pcpu -eo command is the same as ps -eo pid -o pcpu -o command (as is ps -eo pid,pcpu,command).
- There is an empty else clause. That always looks as if something should have been handled there but was left out for some unknown reason. Bash also rejects it: an else with no commands is a syntax error.
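A sketch of the question's check with the review points applied: the threshold sits at the top, ps gets a single -o list with empty headers (the "=" after each column name suppresses the header line on a POSIX ps, so neither --no-headers nor grep -v PID is needed), the sort is numeric on the %CPU field only, and there is no empty else. Two things are assumptions, not from the question: the load average is read from Linux's /proc/loadavg instead of being cut out of uptime's output (whose comma-separated layout changes with how long the machine has been up), and the busiest process is only reported, not killed. All function names are hypothetical.

```bash
#!/usr/bin/env bash
# Most likely to be tuned, so it comes first. As in the question, 800 stands
# for a 1-minute load average of 8.00 (the decimal point is dropped).
CPU_USAGE_THRESHOLD=800

# load_x100 [FILE]: 1-minute load average times 100, as an integer.
# Reads /proc/loadavg by default and assumes its two-decimal format.
load_x100() {
    local one rest
    read -r one rest < "${1:-/proc/loadavg}" || return 1
    echo $(( 10#${one%.*} * 100 + 10#${one#*.} ))
}

# busiest: read "PID %CPU COMMAND" lines on stdin, print the highest-%CPU one.
# -k2,2nr compares field 2 only, numerically, highest first (12.0 before 5.0).
busiest() {
    sort -k2,2nr | head -n 1
}

# check_once [FILE]: report the busiest process while the load is above the threshold.
check_once() {
    local usage top
    usage=$(load_x100 "$@") || return 1
    if [ "$usage" -gt "$CPU_USAGE_THRESHOLD" ]; then
        # empty headers ("pid=" etc.) mean ps prints no header line at all
        top=$(ps -eo pid=,pcpu=,args= | busiest)
        echo "system overloading! CPU USAGE is at $usage"
        echo "busiest process: $top"
    fi
}

# Poll once per second:
#   while true; do check_once; sleep 1; done
```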
Solution 3
Killing off the processes that use the most CPU or memory is asking for trouble: just look at what they are right now on your machine (here they are currently firefox, systemd (init), Xorg, gnome-terminal, a set of kernel threads, and xemacs, none of which is dispensable). Look instead at how to tune the Linux OOM killer.
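One concrete OOM-killer knob is /proc/PID/oom_score_adj, which ranges from -1000 (never choose this process) to 1000 (choose it first). A minimal sketch with a hypothetical helper name that marks a known troublemaker as the preferred victim instead of killing processes from a script. Raising the value works for your own processes; negative values normally require CAP_SYS_RESOURCE.

```bash
#!/usr/bin/env bash
# oom_prefer PID [SCORE]
# Hypothetical helper: bias the kernel's OOM killer towards PID.
# SCORE ranges from -1000 (exempt) to 1000 (chosen first); default 1000.
oom_prefer() {
    local pid=$1 score=${2:-1000}
    if [ "$score" -lt -1000 ] || [ "$score" -gt 1000 ]; then
        echo "score must be between -1000 and 1000" >&2
        return 1
    fi
    echo "$score" > "/proc/$pid/oom_score_adj"
}

# The resulting badness score the kernel uses can be read from /proc/PID/oom_score.
```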
Also note that "memory used by the process" is a nebulous concept: shared libraries, executables, and even parts of data areas are shared between processes. One can come up with some number by charging each user a fraction of the shared space, but even adding that up doesn't really give "memory used" (much less "memory freed if the process goes away", since the shared parts stay behind).
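Linux exposes the "fraction of the shared space" figure described here as Pss (proportional set size), where each resident page is divided by the number of processes mapping it. /proc/PID/smaps lists it per mapping, and /proc/PID/smaps_rollup (Linux 4.14 and later) gives the totals. A sketch with a hypothetical helper name that adds up one field; as noted above, even the summed Pss is not what killing the process would free.

```bash
#!/usr/bin/env bash
# smaps_sum FIELD FILE
# Hypothetical helper: add up every "FIELD:   N kB" line of a smaps-style file.
# Works on /proc/PID/smaps (one line per mapping) and on
# /proc/PID/smaps_rollup (one total per field). Prints kB.
smaps_sum() {
    local field=$1 file=$2
    # exact match on the first column, so "Pss:" does not also pick up "Pss_Anon:"
    awk -v f="$field:" '$1 == f { kb += $2 } END { print kb + 0 }' "$file"
}

# Compare what a process has resident (Rss) with its proportional share (Pss):
#   smaps_sum Rss /proc/PID/smaps_rollup
#   smaps_sum Pss /proc/PID/smaps_rollup
```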
Solution 4
I have created a script, kill-process, that kills the processes listed in an array if their CPU usage is greater than XX% for YY seconds, or kills processes that have been running for more than ZZ seconds.
- You can set XX, YY, ZZ at the top of the file.
- You can use either ps or top to check processes.
- There's a dry-run mode too, to check but not kill.
- At the end, the script sends an email if any processes were killed.
NOTE: Here is my repo on GitHub: https://github.com/padosoft/kill-process
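The dry-run mode from the list above is a pattern worth copying into any kill script. A sketch with hypothetical names (MODE, KILLED and maybe_kill are not taken from the kill-process repository): every kill goes through one function that only reports what it would do unless MODE is set to kill, and the signalled PIDs are collected so that one summary, such as the email mentioned above, can be sent at the end.

```bash
#!/usr/bin/env bash
# MODE=dry (the default) only reports; MODE=kill actually sends signals.
MODE=${MODE:-dry}
KILLED=()   # PIDs signalled during this run, for the final summary

# maybe_kill PID [SIGNAL]: the only place in the script that sends a signal.
maybe_kill() {
    local pid=$1 sig=${2:-TERM}
    if [ "$MODE" = kill ]; then
        kill -s "$sig" "$pid" && KILLED+=("$pid")
    else
        echo "dry run: would send SIG$sig to $pid"
    fi
}

# At the end of a run, build the summary text to mail:
#   [ "${#KILLED[@]}" -gt 0 ] && echo "killed: ${KILLED[*]}"
```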
Here's a screenshot:
References
Essential part of the script (an excerpt covering the top-based check):
#!/usr/bin/env bash
# max CPU % load
MAX_CPU=90
# max accumulated CPU time (top's TIME+ column) in seconds, 1800s = 30min,
# checked together with MAX_CPU
MAX_SEC=1800
# column of top's batch output to sort by (9 = %CPU)
SORTBY=9
# command names of the processes to check
declare -a KILLLIST
KILLLIST=("/usr/sbin/apache2" "/usr/bin/php5-cgi")

# iterate over each process in the list
for PROCESS_TOCHECK in "${KILLLIST[@]}"
do
    # Take one top snapshot (C locale, so %CPU uses a decimal point), keep lines whose
    # command (column 12, the full path with -c) contains the name, which also skips
    # the awk/sort/head of this pipeline, sort numerically by SORTBY and read
    # PID, %CPU and TIME+ of the busiest match.
    read -r PID CPU TIME_STR < <(LC_ALL=C top -bcSH -n 1 \
        | awk -v p="$PROCESS_TOCHECK" 'index($12, p)' \
        | sort -k"$SORTBY,$SORTBY" -nr | head -n 1 \
        | awk '{print $1, $9, $11}')
    # nothing matched: the process is not running
    [ -z "$PID" ] && continue
    # top prints %CPU with decimals; the integer test below needs a whole number
    CPU=${CPU%%.*}

    # Decode top's TIME+ column: "M:SS.hh" (minutes:seconds.hundredths), "M:SS"
    # once the hundredths no longer fit, or "H,MM" (hours,minutes) for longer times.
    case "$TIME_STR" in
        *,*) IFS=, read -r h m <<< "$TIME_STR"
             TIME_SEC=$(( 10#$h * 3600 + 10#$m * 60 )) ;;
        *:*) IFS=: read -r m s <<< "$TIME_STR"
             s=${s%%.*}   # drop the hundredths
             TIME_SEC=$(( 10#$m * 60 + 10#$s )) ;;
        *)   continue ;;  # day/week layouts are not decoded in this excerpt
    esac

    # kill (SIGTERM) if the process exceeds both limits
    if [ "$CPU" -gt "$MAX_CPU" ] && [ "$TIME_SEC" -gt "$MAX_SEC" ]; then
        kill -15 "$PID"
    fi
done
Usage:
bash killprocess.sh [dry|kill|--help] [top|ps] [cpu|time]
KK Patel
Updated on September 18, 2022
Comments
-
KK Patel over 1 year
I have created a script that kills processes if CPU and/or memory usage hits 80%. It creates a list of killed processes when this happens. What can I do to improve it?
while [ 1 ]; do
    echo
    echo checking for run-away process ...
    CPU_USAGE=$(uptime | cut -d"," -f4 | cut -d":" -f2 | cut -d" " -f2 | sed -e "s/\.//g")
    CPU_USAGE_THRESHOLD=800
    PROCESS=$(ps aux r)
    TOPPROCESS=$(ps -eo pid -eo pcpu -eo command | sort -k 2 -r | grep -v PID | head -n 1)
    if [ $CPU_USAGE -gt $CPU_USAGE_THRESHOLD] ; then
        kill -9 $(ps -eo pid | sort -k 1 -r | grep -v PID | head -n 1) #original
        kill -9 $(ps -eo pcpu | sort -k 1 -r | grep -v %CPU | head -n 1)
        kill -9 $TOPPROCESS
        echo system overloading!
        echo Top-most process killed $TOPPROCESS
        echo CPU USAGE is at $CPU_LOAD
    else
    fi
    exit 0
    sleep 1
done
-
rahmu about 11 years: Have you tried running the script? while [ 1 ] makes me wonder how much CPU this script alone is going to consume. Also, 3 calls to kill -9 in a script running constantly? This gives me the chills ...
-
daisy about 11 years: Nice avatar anyway. @rahmu, he has a sleep 1 in the loop.
-
jippie about 11 years: First Monday of the month, and my PC is checking a (slow-ish) RAID6 set. The CPU load easily peaks above 8 because the system is constantly waiting for disk I/O from this RAID set. Nothing is wrong; the system is still very responsive. Your script would kill my Firefox, which uses only 3.6% of the available 400%. Just saying you may be hunting for ghosts with this script. BTW: your system will not get damaged by high load, and when memory runs out, the kernel makes a semi-educated guess about which process(es) to kill.
-
KK Patel about 11 years: Then would killing processes based on load be fine or not?
-
jippie about 11 years: In my use case, the killing is not desired.
-
jippie about 11 years: Are you familiar with the commands cpulimit and timeout?
-
Stephane almost 7 years: You are missing a space between $CPU_USAGE_THRESHOLD and ]: $CPU_USAGE_THRESHOLD] should be $CPU_USAGE_THRESHOLD ].
-
Stephane almost 7 years: To test the script above, how can I create a runaway process?
-
Stephane almost 7 years: The PROCESS variable is defined but never used.
-
lk_vc about 7 years: It seems sort should be sort -k9nr. Without n, you get 5.9 > 29.4.