High load average due to high system cpu load (%sys)

I would install atop from the EPEL repository. Atop should help you diagnose what is causing the %sys activity.

Atop also has an atop -r feature that lets you step backward and forward in time through its logs using the t/T keys.

Also take a look at /proc/interrupts, and go through your logs in /var/log/httpd/logs, sorting them by IP to see whether any suspect IP is causing abnormal amounts of httpd traffic.
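As a rough illustration of sorting the logs by IP, here is a minimal Python sketch that counts requests per client address. It assumes the access log uses the common/combined format, where the client IP is the first whitespace-separated field; the function name top_client_ips is made up for this example.

```python
from collections import Counter


def top_client_ips(lines, n=10):
    """Return the n most frequent client IPs as (ip, hits) pairs.

    Assumption: the client IP is the first whitespace-separated
    field of each line, as in the common/combined log format.
    """
    counts = Counter()
    for line in lines:
        fields = line.split(None, 1)
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts.most_common(n)
```

To use it, open an access log and pass the file object in, for example `top_client_ips(open(path, errors="replace"))`, where path is whichever log file you want to inspect. An IP with far more hits than the rest is worth a closer look.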

I would set up a cron job that appends the output of cat /proc/interrupts to a log file. Look for high deltas in the interrupts.
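To find the high deltas, two snapshots of /proc/interrupts can be compared. The following is only a sketch: it assumes the usual layout (a header line naming the CPUs, then one line per interrupt with a label, per-CPU counts and a description), and the function names are invented for illustration.

```python
def parse_interrupts(text):
    """Map each interrupt label to (total count over all CPUs, description)."""
    lines = text.strip().splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header line: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # fewer counters than CPUs on this line
            counts.append(int(field))
        desc = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), desc)
    return totals


def interrupt_deltas(before, after):
    """Return (delta, label, description) tuples, largest increase first."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    deltas = []
    for label, (count, desc) in new.items():
        delta = count - old.get(label, (0, ""))[0]
        deltas.append((delta, label, desc))
    return sorted(deltas, reverse=True)
```

Feed it the contents of two snapshots taken a known interval apart. The entries at the top of the result are the interrupt sources that fired most in between; on a busy web server these are typically the network card queues.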

Nick
Author by

Nick

PHP, MySQL, NoSQL, Redis, Linux, Apache

Updated on September 18, 2022

Comments

  • Nick
    Nick almost 2 years

    We have a server running a high-traffic website. Recently we moved from

    2 x 4 core server (8 cores in /proc/cpuinfo), 32 GB RAM, running CentOS 5.x, to

    2 x 4 core server (16 cores in /proc/cpuinfo), 32 GB RAM, running CentOS 6.3

    The server runs nginx as a proxy, a MySQL server, and sphinx-search.

    Traffic is high, but the MySQL and sphinx-search databases are relatively small, and usually everything is blazing fast.

    Today the server experienced a load average of over 100. Looking at top and sar, we noticed that %sys was very high, 50 to 70%. Disk utilization was less than 1%. We tried rebooting, but the problem persisted after the reboot. At all times the server had at least 3-4 GB of free RAM.

    The only message shown by dmesg was "possible SYN flooding on port 80. Sending cookies.".

    Here is a snippet of the sar output:

    11:00:01        CPU     %user     %nice   %system   %iowait    %steal     %idle
    11:10:01        all     21.60      0.00     66.38      0.03      0.00     11.99
    

    We know that this is a traffic issue, but we do not know how to proceed further or where to look for a solution.

    Is there a way to find out where exactly those "66.38%" are being used?

    Any suggestions would be appreciated.


    Update: Today the load average is "normal" and %sys is OK too, at about 4%. However, today's traffic is about 20-30% lower than yesterday's. This makes me think yesterday's problem was caused by some kernel setting for TCP.

    • wazoox
      wazoox over 11 years
      What kind of network interfaces are you using? What does "ethtool -k <iface>" report?
    • Nick
      Nick over 11 years
      ethtool -k em1
      Offload parameters for em1:
      rx-checksumming: on
      tx-checksumming: on
      scatter-gather: on
      tcp-segmentation-offload: on
      udp-fragmentation-offload: off
      generic-segmentation-offload: on
      generic-receive-offload: on
      large-receive-offload: off
    • wazoox
      wazoox over 11 years
      From the interface name I suppose this is a Dell system with e1000 or e1000e built-in interfaces?
    • Nick
      Nick over 11 years
      Not sure, since I don't have physical access to the server, but it's a Dell for sure. This is what dmesg says: Broadcom NetXtreme II BCM5716 1000Base-T (C0) PCI Express
    • wazoox
      wazoox over 11 years
      Well, these normally work pretty well under Linux... are you sure you're not under some DDoS attack?
    • Nick
      Nick over 11 years
      We "are" constantly under "DDoS" attack :) Our site is in the Alexa top 500 and traffic is huge, sometimes 200 Mbit or more. What I am trying to understand is why there was no such problem on the old server.
    • wazoox
      wazoox over 11 years
      Your current system apparently has hyperthreading enabled while the old one didn't. That may be the culprit; HT performance can be tricky sometimes. I'd try turning HT off (in the BIOS) and see if it makes a significant difference.
    • Nick
      Nick over 11 years
      Because we do not have physical access, we will speak with the ISP and try this tomorrow morning.
    • Nick
      Nick over 11 years
      For the past 2 days we have been testing with hyper-threading off. So far everything works very well. We will know for sure on Saturday, when the big traffic kicks in. If you want, post your comment as a regular answer so I can accept it tomorrow evening. Thanks a lot.