Massive CPU usage spike: how do I find out what causes it?
Solution 1
How often did you run that logging cron job? Maybe you should run it more often: CPU usage doesn't peak instantly, so you should see an increase somewhere before the peak. Alternatively, you could use atop to monitor resource load (including CPU load) over time.
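As a rough illustration of the "log more often" idea, here is a minimal Python sketch that samples the process list from one long-running loop instead of a cron job. The log path, the 15-second interval, and the function names are assumptions made for the example, not anything from the question. It shells out to `ps -eo pid,pcpu,comm` and sorts in Python; note that the %CPU figure from `ps` is averaged over each process's lifetime, so treat it as a coarse hint rather than an instantaneous reading. There is no guarantee this keeps running on a box that is already saturated.

```python
import subprocess
import time
from datetime import datetime


def top_cpu_processes(ps_output, n=10):
    """Parse `ps -eo pid,pcpu,comm` output and return the n busiest rows."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)  # the command name may contain spaces
        if len(parts) < 3:
            continue
        pid, pcpu, comm = parts
        try:
            rows.append((int(pid), float(pcpu), comm.strip()))
        except ValueError:
            continue  # ignore rows that do not parse
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


def log_forever(path="/var/log/cpu-hogs.log", interval=15):
    """Append a snapshot every `interval` seconds (path and interval are assumptions)."""
    while True:
        out = subprocess.run(
            ["ps", "-eo", "pid,pcpu,comm"],
            capture_output=True, text=True, check=False,
        ).stdout
        with open(path, "a") as fh:
            fh.write(datetime.now().isoformat() + "\n")
            for pid, pcpu, comm in top_cpu_processes(out):
                fh.write(f"{pid:>7} {pcpu:5.1f} {comm}\n")
        time.sleep(interval)
```

Calling `log_forever()` from a script started at boot would leave a timestamped trail to read after the next reboot; atop's own logging may well make this unnecessary.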
Solution 2
Not strictly speaking an answer to your question, but check out monit. You can configure it to monitor all kinds of things, including global system stats. For example, if CPU usage is over 97% for 3 minutes, my servers reboot; if Apache uses more than 80% CPU for 5 minutes, it gets restarted, and so on. It's an incredibly useful piece of software and has me sleeping much, much easier at night. :-)
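To make the "over 97% for 3 minutes" style of rule concrete, here is a small Python sketch of the underlying idea: only fire when a reading has stayed above a limit for several consecutive checks. This is not monit's configuration syntax or its implementation; the class name and parameters are hypothetical, and it assumes one reading per minute so that three checks correspond to three minutes.

```python
from collections import deque


class SustainedThreshold:
    """Fires once a value has stayed above `limit` for `cycles` checks in a row."""

    def __init__(self, limit, cycles):
        self.limit = limit
        # Keep only the outcome of the most recent `cycles` checks.
        self.recent = deque(maxlen=cycles)

    def update(self, value):
        """Record one reading; return True if the last `cycles` readings all exceeded the limit."""
        self.recent.append(value > self.limit)
        return len(self.recent) == self.recent.maxlen and all(self.recent)
```

Requiring consecutive readings keeps a single brief spike from triggering anything. The action taken when `update()` returns True is up to you; dumping the process list to a file before restarting anything would preserve the evidence the question is after.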
Admin
Updated on September 17, 2022
Comments
-
Admin over 1 year
I have a server running CentOS Linux, and very rarely (maybe once every 3 months) something happens that causes it to have an exceptionally high CPU load (400%) that causes the server to basically freeze up.
The problem I have is that when I reboot the server, I can't figure out what caused the spike. I tried setting up a cron job to occasionally dump the top 10 CPU-consuming processes to a log file, but when the CPU load is high the cron job apparently won't run either.
I'm sort of new to running a server, so I'm hoping you guys might have some advice on how I could better log the processes and figure out what's causing the sudden spike the next time it happens. I'm sure it's just a script or process that goes out of control, but until I can figure out which one it is I'm sort of at a loss...
Thanks for any help you can provide!
-
Seth over 13 years
Perhaps a proactive reboot once every 2.9 months is in order :)
-
Admin over 13 years
lol, it appears the server is forcing that on me whether I want it or not :)
-
Admin over 13 years
The cron job was originally running every 10 minutes, but I'll change it to every minute from now on so I can hopefully catch it.
-
SynackSA over 13 years
I don't see any log entries for the period it was frozen, and it's not actually frozen, just bogged down. I know it's a CPU issue because the server provider (Linode) has a graph showing resource usage, and the CPU is the one that spikes through the roof. If only they provided a list of per-process usage :) I also can't log in at all, but that's because it times out, not because SSH is down...
-
halp over 13 years
The CPU load average displayed by the top command shows 3 values: the average over the last minute, the last 5 minutes, and the last 15 minutes. So 10 minutes was certainly not the finest granularity you could use when attempting to spot the CPU hog.
-
chmac over 9 years
Any insight into why the downvote? Is it bad form to post useful info that's not strictly an answer to the question? Was there an issue with the accuracy or validity of what I posted?
-
Eric about 3 years
Probably because it has nothing to do with diagnosing load spikes and recommends rebooting the machine to solve load problems. In other words, "[n]ot...an answer to your question."