How can I diagnose and fix frequent 100% CPU utilization from the kernel?

What is the percentage of "iowait" and "steal" CPU time during these periods?

Iowait is the share of time the CPU sits idle while waiting for outstanding IO requests to complete. Steal is the share of time your kernel had work ready to run but the hypervisor did not give it the physical CPU.

EC2 t1.micro instances are very CPU and IO-constrained. They can burst for very short amounts of time, after which they're subject to severe CPU throttling. Next time this happens, pay attention to %wa and %st in the output of top. My bet is that one or both of these have high percentages of CPU time.
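If top is too sluggish to use during a spike, the same two numbers can be derived from the aggregate cpu line in /proc/stat. The following is a minimal Python sketch, not a definitive tool: the function names (parse_cpu_line, cpu_percentages, sample) are hypothetical, and it assumes the standard Linux column order of user, nice, system, idle, iowait, irq, softirq, steal, with any columns an older kernel omits treated as zero.

```python
import time

# Column order of the aggregate "cpu" line in /proc/stat (values are in jiffies).
# guest/guest_nice are deliberately skipped: they are already counted in user/nice.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Assumption: columns missing on older kernels are treated as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Return each field's share (in percent) of the CPU time between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample(interval=5.0, path="/proc/stat"):
    """Take two readings `interval` seconds apart and return the percentages."""
    with open(path) as f:
        before = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.readline())
    return cpu_percentages(before, after)
```

Calling sample() during a spike and looking at the "iowait" and "steal" entries of the result should give roughly the same picture as %wa and %st in top.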

To mitigate, you'll need to find the source of the IO and/or CPU load or, alternatively, resize your instance to an m1.small.

Author: notlesh


Updated on September 18, 2022

Comments

  • notlesh
    notlesh over 1 year

    I have an Amazon EC2 micro instance running an old 2.6.16 kernel. It runs postfix, apache, and mysql. Under normal load, its load average is around 0.05, and it runs this way 95% of the time or so. However, a few times a day (or so), the CPU usage will spike to 100% and the system becomes nearly unusable. This usually lasts for roughly 5 minutes, then the load returns to normal.

    If I manage to take a look at htop while this happens (not easy -- the load is that severe), I see that no running task accounts for any significant cpu usage, leading me to believe this must all be taking place in kernel-land.

    How can I diagnose the cause of this load and, more importantly, fix it?

  • notlesh
    notlesh about 11 years
    Conveniently, this just happened again. %st is pegged at 98% or so.
  • EEAA
    EEAA about 11 years
    @stephelton - yep, that's Xen's CPU throttling kicking in. That's one reason why t1.micro instances are no good for anything but the lightest workloads, or applications where performance is a very low priority.
  • notlesh
    notlesh about 11 years
    I suggest you edit your answer to include a brief explanation of what %st means in the context of EC2/virtualization. Anyway, thanks!