Massive CPU usage spike: how do I find out what causes it?

Solution 1

How often did that logging cron job run? It may need to run more often: CPU usage rarely peaks instantly, so a more frequent log should show the increase building up somewhere. Alternatively, you could use atop to monitor resource load (including CPU load) over time.
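As a rough illustration of more frequent logging, here is a minimal Python sketch that appends a timestamped list of the busiest processes to a log file. It assumes a Linux `ps` that accepts `-eo pid,pcpu,comm`; the function names, the log path, and the 15-second interval are all hypothetical. Note that the `%CPU` column from `ps` is averaged over each process's lifetime, so a long-running process that has only just gone wild may rank lower than expected.

```python
import subprocess
import time


def parse_ps(output, limit=10):
    """Return up to `limit` (pid, cpu_percent, command) tuples, highest CPU first."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # first line is the ps header
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        try:
            rows.append((int(parts[0]), float(parts[1]), parts[2].strip()))
        except ValueError:
            continue  # skip any line that does not parse cleanly
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


def log_top_processes(path, limit=10):
    """Append a timestamped snapshot of the busiest processes to `path`."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a") as log:
        for pid, cpu, command in parse_ps(result.stdout, limit):
            log.write(f"{stamp} pid={pid} cpu={cpu:.1f}% cmd={command}\n")


def run_forever(path="/var/log/cpu-hogs.log", interval=15):
    """Hypothetical path and interval: take a snapshot every `interval` seconds."""
    while True:
        log_top_processes(path)
        time.sleep(interval)
```

Calling `run_forever()` from a long-running process is one option; calling `log_top_processes()` from a once-a-minute cron entry is another.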

Solution 2

Not strictly speaking an answer to your question, but check out monit. You can configure it to monitor all kinds of stuff, including global system stats. For example, if CPU usage is over 97% for 3 minutes, my servers will reboot. If Apache uses >80% CPU for 5 minutes, it gets restarted, and so on. It's an incredibly useful piece of software and has me sleeping much, much easier at night. :-)
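The "over a limit for several minutes" idea can be sketched in a few lines of Python. This is not monit's configuration syntax or its implementation, just an illustration of the rule; the class name and its parameters are hypothetical. The action taken when it fires is up to the caller, and for diagnosis it could just as well write a process snapshot to a log as restart anything.

```python
class SustainedThreshold:
    """Fire once a value stays above `limit` for `cycles` consecutive samples."""

    def __init__(self, limit, cycles):
        self.limit = limit
        self.cycles = cycles
        self.streak = 0  # consecutive samples seen above the limit

    def update(self, value):
        """Record one sample; return True while the streak is at least `cycles`."""
        if value > self.limit:
            self.streak += 1
        else:
            self.streak = 0  # any sample at or below the limit resets the count
        return self.streak >= self.cycles
```

With one sample per minute, `SustainedThreshold(97.0, 3)` corresponds to "over 97% for 3 minutes".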

Admin
Author by

Admin

Updated on September 17, 2022

Comments

  • Admin
    Admin over 1 year

    I have a server running CentOS Linux, and very rarely (maybe once every 3 months) something happens that causes it to have an exceptionally high CPU load (400%) that causes the server to basically freeze up.

    The problem I have is that when I reboot the server, I can't figure out what caused the spike. I tried setting up a cron job to periodically dump the top 10 CPU-consuming processes to a log file, but when the CPU load is high the cron job apparently won't run either.

    I'm sort of new to running a server, so I'm hoping you guys might have some advice on how I could better log the processes and figure out what's causing the sudden spike the next time it happens. I'm sure it's just a script or process that goes out of control, but until I can figure out which one it is I'm sort of at a loss...

    Thanks for any help you can provide!

    • Seth
      Seth over 13 years
      Perhaps a proactive reboot once every 2.9 months is in order :)
    • Admin
      Admin over 13 years
      lol, it appears the server is forcing that on me whether I want it or not :)
  • Admin
    Admin over 13 years
    The cronjob was originally running every 10 minutes, but I'll change it to every minute from now on so I can hopefully catch it.
  • SynackSA
    SynackSA over 13 years
    I don't see any log entries for the period it was frozen, and it's not actually frozen, just bogged down. I know it's a CPU issue because the server provider (Linode) has a graph showing resource usage, and the CPU is the one that spikes through the roof. If only they provided a list of per-process CPU percentages :) I also can't log in at all, but that's because the connection times out, not because SSH is down...
  • halp
    halp over 13 years
    The load average displayed by the top command shows three values: the averages over the last minute, the last 5 minutes, and the last 15 minutes. So 10 minutes was certainly not the finest granularity you could use when trying to spot the CPU hog.
  • chmac
    chmac over 9 years
    Any insight into why the downvote? Is it bad form to post useful info that's not strictly an answer to the question? Was there an issue with the accuracy or validity of what I posted?
  • Eric
    Eric about 3 years
    Probably because it has nothing to do with diagnosing load spikes and recommends rebooting the machine to solve load problems. In other words, "[n]ot...an answer to your question."
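
The three load-average values mentioned in the comments above can also be read from a script. The following is a minimal sketch using Python's standard `os.getloadavg()` (Unix only); the helper names and the 1.5 factor are assumptions chosen for illustration, not recommended values.

```python
import os


def load_per_core(loadavg=None, cores=None):
    """Return the 1-, 5-, and 15-minute load averages divided by core count."""
    if loadavg is None:
        loadavg = os.getloadavg()  # Unix only
    if cores is None:
        cores = os.cpu_count() or 1
    return tuple(round(value / cores, 2) for value in loadavg)


def is_spiking(loadavg, cores, factor=1.5):
    """True when the 1-minute load exceeds `factor` times the core count."""
    return loadavg[0] > factor * cores
```

A 1-minute value well above the 15-minute value suggests the load is climbing, which would be a reasonable moment to record which processes are running.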