High load average due to high system cpu load (%sys)
I would install atop from the EPEL repository. Atop should help you diagnose what is causing the %sys activity.
Atop also has a replay mode, atop -r, that lets you step backward and forward in time through its logs using the t/T keys.
Also take a look at /proc/interrupts, and go through your /var/log/httpd/logs and sort the requests by IP to see whether any suspect IP is causing abnormal amounts of httpd traffic.
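For the log check, a minimal sketch could look like the one below. It assumes the client address is the first whitespace-separated field of each line, as in Apache's common and combined log formats; the function name top_ips and the default limit of 20 are only illustrative.

```shell
#!/bin/sh
# Count requests per client IP in an access log, busiest clients first.
# Assumption: the client address is the first field on every line.
top_ips() {
    log_file=$1
    limit=${2:-20}   # how many addresses to show (illustrative default)
    awk '{ print $1 }' "$log_file" | sort | uniq -c | sort -rn | head -n "$limit"
}
```

Called as `top_ips <access log> 10`, it prints a request count followed by the address, one client per line. A single address far above the rest would be the one to look at first.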
I would cron a cat /proc/interrupts to a log file. Look for high deltas in the interrupts.
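To compare two such snapshots, something along these lines might do. It is a sketch that assumes each file is an unmodified copy of /proc/interrupts (one header line, then an IRQ label followed by one counter per CPU); the function name irq_deltas is hypothetical.

```shell
#!/bin/sh
# Print the increase in interrupt count per IRQ between two snapshots of
# /proc/interrupts, largest increase first.
irq_deltas() {
    old=$1
    new=$2
    awk '
        FNR == 1 { next }                 # skip the "CPU0 CPU1 ..." header
        {
            # Sum the per-CPU counters; stop at the first non-numeric field.
            sum = 0
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
        }
        NR == FNR { before[$1] = sum; next }   # first file: remember totals
        ($1 in before) {
            # Second file: the remaining fields describe the IRQ source.
            label = ""
            for (; i <= NF; i++) label = label " " $i
            print sum - before[$1], $1 label
        }
    ' "$old" "$new" | sort -rn
}
```

Run as `irq_deltas <older snapshot> <newer snapshot>`. If one network queue or device dominates the output during a load spike, that narrows down where the system time is going.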
Comments
-
Nick almost 2 years
We have a server with a high-traffic website. Recently we moved from
2 x 4 core server (8 cores in /proc/cpuinfo), 32 GB RAM, running CentOS 5.x, to
2 x 4 core server (16 cores in /proc/cpuinfo), 32 GB RAM, running CentOS 6.3
The server runs nginx as a proxy, a mysql server and sphinx-search.
Traffic is high, but the mysql and sphinx-search databases are relatively small, and usually everything works blazing fast.
Today the server experienced a load average of 100+. Looking at top and sar, we noticed that system time (%sys) is very high, 50 to 70%. Disk utilization was less than 1%. We tried rebooting, but the problem persisted after the reboot. At all times the server had at least 3-4 GB of free RAM.
The only message shown by dmesg was "possible SYN flooding on port 80. Sending cookies.".
Here is a snippet of the sar output:
11:00:01  CPU  %user  %nice  %system  %iowait  %steal  %idle
11:10:01  all  21.60   0.00    66.38     0.03    0.00  11.99
We know that this is a traffic issue, but we do not know how to proceed further or where to look for a solution.
Is there a way to find out exactly where that 66.38% is being spent?
Any suggestions would be appreciated.
Update: today the load average is "normal" and %sys is OK too, at ~4%. However, today's traffic is about 20-30% lower than yesterday's. This makes me think yesterday's problem was caused by some kernel setting for TCP.
-
wazoox over 11 years
What kind of network interfaces are you using? What does "ethtool -k <iface>" report?
-
Nick over 11 years
ethtool -k em1
Offload parameters for em1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
-
wazoox over 11 years
From the interface name I suppose this is a Dell system with e1000 or e1000e built-in interfaces?
-
Nick over 11 years
Not sure, since I don't have physical access to the server, but it's a Dell for sure. This is what dmesg says: Broadcom NetXtreme II BCM5716 1000Base-T (C0) PCI Express
-
wazoox over 11 years
Well, these normally work pretty well under Linux... are you sure you're not under some DDoS attack?
-
Nick over 11 years
We "are" constantly under "DDoS" attack :) Our site is in the Alexa top 500 and traffic is huge, sometimes 200 Mbit or more. What I am trying to understand is why there was no such problem on the old server.
-
wazoox over 11 years
Your current system apparently has hyperthreading enabled while the old one didn't. That may be the culprit; HT performance can be tricky sometimes. I'd try turning HT off (in the BIOS) and see if it makes a significant difference.
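Whether HT is actually active can be checked without BIOS access. The sketch below assumes an x86 /proc/cpuinfo that exposes the "siblings" (logical CPUs per package) and "cpu cores" (physical cores per package) fields; the function name ht_status is made up for illustration.

```shell
#!/bin/sh
# Report "on", "off" or "unknown" for hyper-threading, based on a cpuinfo file.
# More siblings than cpu cores means each core presents several logical CPUs.
ht_status() {
    cpuinfo=${1:-/proc/cpuinfo}
    awk -F: '
        /^siblings/  { siblings = $2 + 0 }
        /^cpu cores/ { cores = $2 + 0 }
        END {
            if (!siblings || !cores) print "unknown"   # fields not present
            else if (siblings > cores) print "on"
            else print "off"
        }
    ' "$cpuinfo"
}
```

Running `ht_status` with no argument reads /proc/cpuinfo on the local machine, so it can also confirm that the BIOS change took effect after the reboot.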
-
Nick over 11 years
Because we do not have physical access, we will speak with the ISP and try it tomorrow morning.
-
Nick over 11 years
For the past 2 days we have been testing with hyper-threading off. So far everything works very well. We will know for sure on Saturday, when the big traffic kicks in. If you want, post your comment as a regular answer so I can accept it tomorrow evening. Thanks a lot.
-