High Load Average with modest CPU Utilization and almost no IO
Solution 1
Load average is based on processes waiting in the run queue. That means that if you have processes that frequently use fractional time slices, you can see a high load average without high CPU utilization.
The best example of this is mail. The amount of CPU time required to send a message is very limited, but when thousands of pieces of mail are moving around the system (especially if the mail daemon forks a process to handle each one), the run queue gets very long. It is common to see well-functioning, responsive mail servers with load averages of 25, 50, or even over 100.
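If you want to see this effect directly, vmstat shows the run queue and CPU utilization side by side; the r column counts runnable tasks while the us/sy/id columns show CPU use (an illustrative one-liner, not specific to any setup):
vmstat 1 5
A persistently long r column next to a mostly idle CPU is the pattern described above.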
For a web server, I would use page response time as the primary metric; do not worry about load average. Under modern schedulers, a load average less than twice the number of cores will usually have no negative effects. You may want to experiment with the number of cores per VM versus the total number of VMs. Some applications benefit from many cores on a few machines; others do better with a small number of cores and many instances.
Solution 2
I've been dealing with a scenario very similar to yours. In my case, the load average dropped after changing the IO scheduler of the problematic VM's block device to the NOOP scheduler. This scheduler is just a FIFO queue, which works well when the hypervisor will apply its own IO scheduling algorithms anyway. No need to reorder twice.
With that said, I'm still dealing with sluggish keyboard events on the problematic VM, so I think I've only removed the high load average without resolving the actual problem. I'll definitely update this answer if I find the root cause.
List the available schedulers (the one shown in [brackets] is in use):
cat /sys/block/sdX/queue/scheduler
noop anticipatory deadline [cfq]
Change it with:
echo noop > /sys/block/sdX/queue/scheduler
To make the change persistent, add elevator=noop to your VM's kernel boot parameters.
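On a Debian/Ubuntu-style guest, for example, that typically means editing /etc/default/grub along these lines (treat this as a sketch; the exact file and update command vary by distribution and bootloader):
GRUB_CMDLINE_LINUX_DEFAULT="quiet elevator=noop"
followed by regenerating the boot configuration:
update-grub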
Solution 3
When you have a performance problem in a VM, you need to approach it from both the hypervisor side and from inside the VM. Another thing to keep in mind is that timekeeping in a VM is not precise, which means that statistics measured inside the VM may not be accurate.
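As a quick sanity check of guest timekeeping, you can see which clocksource the kernel is using, since some sources drift badly under virtualization:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource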
What are the CPU and I/O stats for this VM? Pay attention to the CPU ready counter; it should be under 5%. Which version of ESX are you running? What is your hardware architecture in test and in prod?
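If you have access to the host, one way to watch that counter live is esxtop (a sketch; exact navigation varies by ESX version):
esxtop
Press c for the CPU view and watch the %RDY column; sustained values above roughly 5% per vCPU suggest the guest is waiting on a physical CPU.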
Inside the VM you can profile everything from the application down to the kernel with perf and visualize the output with flame graphs.
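For example, something along these lines samples all CPUs at 99 Hz for 30 seconds and renders an SVG flame graph (assuming the stackcollapse-perf.pl and flamegraph.pl scripts from Brendan Gregg's FlameGraph repository are in the current directory):
perf record -F 99 -a -g -- sleep 30
perf script > out.perf
./stackcollapse-perf.pl out.perf > out.folded
./flamegraph.pl out.folded > flame.svg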
Solution 4
If we use the following shell command to monitor the instantaneous load, we might get a different view of this phenomenon: procs_running can be much higher than expected.
while true; do cat /proc/loadavg; grep procs /proc/stat; done
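For reference, the fourth field of /proc/loadavg is runnable/total scheduling entities, so output like the following (illustrative values) means 2 tasks are runnable out of 173 total:
0.96 1.56 1.99 2/173 12345
Sampled in a tight loop, that instantaneous count (procs_running in /proc/stat terms) can spike far above what the smoothed one-minute average suggests.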
Updated on September 18, 2022
Comments
-
dbenhur almost 2 years
The usual explanation for high load average with little CPU use under Linux is too much IO (or, more properly, uninterruptible sleep).
I have a service running on a cluster of 2-core VMs that exhibits modest CPU use (~55-70% idle) but load averages higher than 2, while experiencing near-zero IO, modest context switches, and no swapping. Polling with ps I never see D in the process state column.
The service is ruby 1.9 running under unicorn. It connects to two upstream postgres databases which are delivering very fast average statement executions (~0.5ms). The service is logging elapsed request durations about twice as high in production as it demonstrated under higher stress load on our performance testing network. The only monitoring signal that seems out of whack is load average (and of course average response duration); everything else (cpu, memory, io, network, cswitch, intr) is nominal and matching projections.
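A more direct way to poll for uninterruptible D-state tasks than scanning full ps output is something along these lines:
while true; do ps -eo state,pid,comm | awk '$1 == "D"'; sleep 1; done
If that stays silent while the load average is high, uninterruptible sleep is unlikely to be the driver.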
System is Ubuntu 10.04.4 LTS "Lucid". uname is Linux dirsvc0 2.6.32-32-server #62-Ubuntu SMP Wed Apr 20 22:07:43 UTC 2011 x86_64 GNU/Linux. Hypervisor is VMware ESX 5.1.
UPDATE: More info as requested by @ewwhite. The storage is a virtual disk device mapping to an NFS mount on the VM host attached to a NetApp. I'll point out that all indications are that there's no significant disk IO happening. The service reads and writes to network sockets (~200KB/s) and does ordinary access and error logging (at a rate of about 20KB/s). The VM host has a pair of gigabit ports going to two top-of-rack switches, each of which has four bonded gigabit ports back to a core router, all copper. Each VM host has 24 (4x6) physical cores and 150GB of memory and typically hosts about 30 similarly sized VM guests running a variety of different services. In production these hosts are never over-committed on memory and only modestly over-committed on CPU.
I'd welcome ideas to explain the high load.
Here's some extracted sar data from a two-hour window midday today:
sar -q # load average
            runq-sz plist-sz  ldavg-1  ldavg-5 ldavg-15
12:05:01 PM       1      173     1.15     2.41     2.48
12:15:01 PM       0      173     0.96     1.56     1.99
12:25:01 PM       2      173     2.60     2.49     2.21
12:35:01 PM       1      173     1.44     2.10     2.06
12:45:01 PM       0      173     3.66     3.31     2.56
12:55:01 PM       0      173     3.05     2.66     2.43
01:05:01 PM       0      174     1.37     2.35     2.36
01:15:01 PM       0      173     3.06     3.07     2.60
01:25:01 PM       2      173     5.03     6.50     4.50
01:35:01 PM       0      173     4.26     5.61     4.98
01:45:01 PM       8      173     4.61     4.46     4.48
01:55:01 PM       0      173     3.30     3.60     3.92
02:05:01 PM       1      173     2.51     2.62     3.15
sar # cpu
             CPU  %user  %nice %system %iowait  %steal  %idle
12:05:01 PM  all  31.31   0.60    2.18    0.02    0.00  65.89
12:15:01 PM  all  27.51   0.60    2.07    0.02    0.00  69.79
12:25:01 PM  all  28.09   0.61    1.90    0.03    0.00  69.36
12:35:01 PM  all  32.04   0.67    2.26    0.02    0.00  65.02
12:45:01 PM  all  33.44   0.69    2.61    0.02    0.00  63.24
12:55:01 PM  all  30.62   0.63    2.14    0.02    0.00  66.59
01:05:01 PM  all  29.42   0.61    2.07    0.03    0.00  67.87
01:15:01 PM  all  31.93   0.62    2.39    0.02    0.00  65.05
01:25:01 PM  all  41.60   0.82    3.65    0.03    0.00  53.90
01:35:01 PM  all  43.14   0.88    3.68    0.03    0.00  52.28
01:45:01 PM  all  38.38   0.79    3.43    0.02    0.00  57.39
01:55:01 PM  all  30.65   0.61    2.23    0.03    0.00  66.49
02:05:01 PM  all  29.17   0.58    2.10    0.03    0.00  68.12
sar -d # disk
               DEV   tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz  await  svctm  %util
12:05:01 PM dev8-0  1.37     0.00    35.94    26.14     0.00   3.09   1.98   0.27
12:15:01 PM dev8-0  1.65     0.00    39.89    24.23     0.00   2.96   1.98   0.33
12:25:01 PM dev8-0  1.26     0.00    33.39    26.57     0.00   2.89   1.79   0.22
12:35:01 PM dev8-0  1.33     0.00    35.23    26.52     0.00   3.15   1.82   0.24
12:45:01 PM dev8-0  1.68     0.00    42.31    25.23     0.00   2.95   1.89   0.32
12:55:01 PM dev8-0  1.44     0.00    35.76    24.86     0.00   3.20   1.88   0.27
01:05:01 PM dev8-0  1.43     0.00    35.57    24.93     0.00   2.17   1.46   0.21
01:15:01 PM dev8-0  1.74     0.00    43.13    24.74     0.01   3.88   2.15   0.37
01:25:01 PM dev8-0  1.39     0.00    35.36    25.44     0.01   3.65   2.42   0.34
01:35:01 PM dev8-0  1.32     0.00    33.74    25.65     0.00   3.39   2.09   0.28
01:45:01 PM dev8-0  1.48     0.00    37.20    25.20     0.01   3.92   2.26   0.33
01:55:01 PM dev8-0  1.62     0.00    39.36    24.35     0.01   3.27   1.70   0.27
02:05:01 PM dev8-0  1.42     0.00    34.72    24.51     0.00   3.28   2.13   0.30
sar -n # network
            IFACE  rxpck/s  txpck/s   rxkB/s   txkB/s  rxcmp/s  txcmp/s  rxmcst/s
12:05:01 PM  eth0   365.52   359.86   236.91   227.35     0.00     0.00      0.00
12:15:01 PM  eth0   344.55   337.10   221.20   206.47     0.00     0.00      0.00
12:25:01 PM  eth0   357.81   352.76   229.83   216.22     0.00     0.00      0.00
12:35:01 PM  eth0   372.62   366.34   239.95   227.99     0.00     0.00      0.00
12:45:01 PM  eth0   388.65   378.51   252.11   235.81     0.00     0.00      0.00
12:55:01 PM  eth0   364.50   359.19   233.63   222.82     0.00     0.00      0.00
01:05:01 PM  eth0   361.08   353.88   231.75   218.89     0.00     0.00      0.00
01:15:01 PM  eth0   370.41   363.19   240.53   224.16     0.00     0.00      0.00
01:25:01 PM  eth0   357.67   352.20   230.37   213.57     0.00     0.00      0.00
01:35:01 PM  eth0   354.89   348.58   226.29   214.61     0.00     0.00      0.00
01:45:01 PM  eth0   355.49   344.98   228.41   211.27     0.00     0.00      0.00
01:55:01 PM  eth0   335.96   331.13   213.85   204.26     0.00     0.00      0.00
02:05:01 PM  eth0   323.03   314.49   208.12   194.81     0.00     0.00      0.00
sar -w # context switches
            proc/s  cswch/s
12:05:01 PM   0.97  2382.38
12:15:01 PM   2.58  2415.16
12:25:01 PM   0.84  2406.79
12:35:01 PM   0.84  2371.04
12:45:01 PM   2.70  2414.09
12:55:01 PM   0.84  2385.57
01:05:01 PM   1.20  2419.94
01:15:01 PM   2.57  2387.75
01:25:01 PM   0.85  2164.65
01:35:01 PM   0.84  2156.29
01:45:01 PM   2.53  2251.43
01:55:01 PM   1.01  2331.93
02:05:01 PM   0.96  2323.19
sar -B # paging
            pgpgin/s pgpgout/s  fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
12:05:01 PM     0.00     17.97   549.43     0.00   289.21      0.00      0.00      0.00   0.00
12:15:01 PM     0.00     19.95  1179.08     0.00   405.61      0.00      0.00      0.00   0.00
12:25:01 PM     0.00     16.69   456.71     0.00   217.63      0.00      0.00      0.00   0.00
12:35:01 PM     0.00     17.61   480.42     0.00   240.01      0.00      0.00      0.00   0.00
12:45:01 PM     0.00     21.15  1210.09     0.00   424.96      0.00      0.00      0.00   0.00
12:55:01 PM     0.00     17.88   489.83     0.00   256.39      0.00      0.00      0.00   0.00
01:05:01 PM     0.00     17.79   624.89     0.00   387.26      0.00      0.00      0.00   0.00
01:15:01 PM     0.00     21.57  1168.87     0.00   393.34      0.00      0.00      0.00   0.00
01:25:01 PM     0.00     17.68   466.03     0.00   235.07      0.00      0.00      0.00   0.00
01:35:01 PM     0.00     16.87   435.24     0.00   199.43      0.00      0.00      0.00   0.00
01:45:01 PM     0.00     18.60  1125.69     0.00   432.85      0.00      0.00      0.00   0.00
01:55:01 PM     0.00     19.68   596.62     0.00   272.75      0.00      0.00      0.00   0.00
02:05:01 PM     0.00     17.36   511.80     0.00   243.83      0.00      0.00      0.00   0.00
sar -r # memory
            kbmemfree kbmemused %memused kbbuffers  kbcached kbcommit %commit
12:05:01 PM   1017364   3041608    74.94    225564   1773324  1194728   16.64
12:15:01 PM   1014992   3043980    74.99    225564   1777268  1193688   16.63
12:25:01 PM   1009504   3049468    75.13    225564   1781360  1194504   16.64
12:35:01 PM    999484   3059488    75.38    225564   1785652  1194520   16.64
12:45:01 PM    994764   3064208    75.49    225564   1790136  1194864   16.65
12:55:01 PM    993772   3065200    75.52    225564   1794288  1194296   16.64
01:05:01 PM    993868   3065104    75.51    225564   1798584  1193428   16.63
01:15:01 PM    985016   3073956    75.73    225564   1802708  1194388   16.64
01:25:01 PM    992316   3066656    75.55    225564   1806804  1192996   16.62
01:35:01 PM    971732   3087240    76.06    225564   1810784  1194272   16.64
01:45:01 PM    968816   3090156    76.13    225564   1815036  1194556   16.64
01:55:01 PM    967968   3091004    76.15    225564   1818716  1194924   16.65
02:05:01 PM    966324   3092648    76.19    225564   1822452  1194516   16.64
ps aufx
USER       PID %CPU %MEM    VSZ   RSS TTY   STAT START   TIME COMMAND
root         2  0.0  0.0      0     0 ?     S    Jan28   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?     S    Jan28   0:01  \_ [migration/0]
root         4  0.0  0.0      0     0 ?     S    Jan28   1:01  \_ [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [watchdog/0]
root         6  0.0  0.0      0     0 ?     S    Jan28   0:01  \_ [migration/1]
root         7  0.0  0.0      0     0 ?     S    Jan28   0:27  \_ [ksoftirqd/1]
root         8  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [watchdog/1]
root         9  0.0  0.0      0     0 ?     S    Jan28   0:37  \_ [events/0]
root        10  0.0  0.0      0     0 ?     S    Jan28   0:33  \_ [events/1]
root        11  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [cpuset]
root        12  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [khelper]
root        13  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [async/mgr]
root        14  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [pm]
root        16  0.0  0.0      0     0 ?     S    Jan28   0:02  \_ [sync_supers]
root        17  0.0  0.0      0     0 ?     S    Jan28   0:04  \_ [bdi-default]
root        18  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kintegrityd/0]
root        19  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kintegrityd/1]
root        20  0.0  0.0      0     0 ?     S    Jan28   0:03  \_ [kblockd/0]
root        21  0.0  0.0      0     0 ?     S    Jan28   0:12  \_ [kblockd/1]
root        22  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kacpid]
root        23  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kacpi_notify]
root        24  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kacpi_hotplug]
root        25  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [ata/0]
root        26  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [ata/1]
root        27  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [ata_aux]
root        28  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [ksuspend_usbd]
root        29  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [khubd]
root        30  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kseriod]
root        31  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kmmcd]
root        34  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [khungtaskd]
root        35  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kswapd0]
root        36  0.0  0.0      0     0 ?     SN   Jan28   0:00  \_ [ksmd]
root        37  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [aio/0]
root        38  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [aio/1]
root        39  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [ecryptfs-kthrea]
root        40  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [crypto/0]
root        41  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [crypto/1]
root        44  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [pciehpd]
root        45  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [scsi_eh_0]
root        46  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [scsi_eh_1]
root        47  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kstriped]
root        50  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kmpathd/0]
root        51  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kmpathd/1]
root        52  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kmpath_handlerd]
root        53  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [ksnapd]
root        54  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kondemand/0]
root        55  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kondemand/1]
root        56  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kconservative/0]
root        57  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kconservative/1]
root       213  0.0  0.0      0     0 ?     S    Jan28   0:24  \_ [mpt_poll_0]
root       274  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [mpt/0]
root       295  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [scsi_eh_2]
root       310  0.0  0.0      0     0 ?     S    Jan28   1:41  \_ [jbd2/sda1-8]
root       311  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [ext4-dio-unwrit]
root       312  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [ext4-dio-unwrit]
root       342  0.0  0.0      0     0 ?     S    Jan28   0:54  \_ [flush-8:0]
root       627  0.0  0.0      0     0 ?     S    Jan28   0:00  \_ [kpsmoused]
root     18160  0.0  0.0      0     0 ?     S    Feb14   0:00  \_ [rpciod/0]
root     18161  0.0  0.0      0     0 ?     S    Feb14   0:00  \_ [rpciod/1]
root     18162  0.0  0.0      0     0 ?     S    Feb14   0:00  \_ [nfsiod]
root         1  0.0  0.0  61824  2872 ?     Ss   Jan28   0:11 /sbin/init
root       372  0.0  0.0  16904   860 ?     S    Jan28   0:00 upstart-udev-bridge --daemon
root       375  0.0  0.0  17072  1012 ?     S<s  Jan28   0:00 udevd --daemon
root      1054  0.0  0.0  16860   672 ?     S<   Jan28   0:00  \_ udevd --daemon
root     18163  0.0  0.0  17068   832 ?     S<   Feb14   0:00  \_ udevd --daemon
daemon     654  0.0  0.0   8256   644 ?     Ss   Jan28   0:00 portmap
root       788  0.0  0.0  49260  2592 ?     Ss   Jan28   0:00 /usr/sbin/sshd -D
root      8095  0.0  0.1 100888  4068 ?     Ss   16:03   0:00  \_ sshd: root@pts/0
root      8157  0.0  0.0  11212  2084 pts/0 Ss   16:03   0:00      \_ -bash
root     15777  0.0  0.0   7172  1084 pts/0 R+   17:28   0:00          \_ ps aufx
statd      808  0.0  0.0  10392   844 ?     Ss   Jan28   0:00 rpc.statd -L
root       829  0.0  0.0    140    32 ?     Ss   Jan28   0:16 runsvdir -P /etc/service log: .....................................................................................................
root       834  0.0  0.0    116    32 ?     Ss   Jan28   0:00  \_ runsv chef-client
root       838  0.0  0.0    136    48 ?     S    Jan28   0:00      \_ svlogd -tt ./main
root     30898  0.2  1.8 192296 75736 ?     S    01:57   2:25      \_ /usr/bin/ruby1.8 /usr/bin/chef-client -i 1800 -s 60 -L /var/log/chef/client.log
root       832  0.0  0.0   6080   656 tty4  Ss+  Jan28   0:00 /sbin/getty -8 38400 tty4
root       841  0.0  0.0   6080   656 tty5  Ss+  Jan28   0:00 /sbin/getty -8 38400 tty5
root       844  0.0  0.0   6080   656 tty2  Ss+  Jan28   0:00 /sbin/getty -8 38400 tty2
root       845  0.0  0.0   6080   660 tty3  Ss+  Jan28   0:00 /sbin/getty -8 38400 tty3
root       847  0.0  0.0   6080   656 tty6  Ss+  Jan28   0:00 /sbin/getty -8 38400 tty6
root       849  0.0  0.0  21076  1044 ?     Ss   Jan28   0:04 cron
daemon     853  0.0  0.0  18884   468 ?     Ss   Jan28   0:00 atd
root       864  0.0  0.0  11284   640 ?     Ss   Jan28   2:10 /usr/sbin/irqbalance
root       890  0.0  0.0 112412  1908 ?     Ssl  Jan28   5:09 /usr/sbin/automount
root       908  0.0  0.0  28016   976 ?     Ss   Jan28   0:00 nginx: master process /usr/sbin/nginx
www-data   910  0.0  0.0  64532  3064 ?     S    Jan28   0:00  \_ nginx: worker process
root       922  0.0  0.0 169668  2584 ?     Ssl  Jan28   0:34 /usr/sbin/nscd
mail       943  0.0  0.0  11888   648 ?     S    Jan28   0:00 /usr/sbin/nullmailer-send -d
root       971  0.0  1.1 152036 46264 ?     Sl   Jan28  36:07 splunkd -p 8089 start
root       972  0.0  0.0  49180  3512 ?     Ss   Jan28   0:00  \_ splunkd -p 8089 start
root      1160  0.0  0.0  14888  1276 ?     Ss   Jan28  19:31 /usr/lib/vmware-tools/sbin64/vmware-guestd --background /var/run/vmware-guestd.pid
ntp       1214  0.0  0.0  19700  1268 ?     Ss   Jan28   1:21 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -c /var/lib/ntp/ntp.conf.dhcp -u 103:107
root      1231  0.0  0.3  21164 12980 ?     SLs  Jan28   0:00 /usr/sbin/memlockd -u memlockd
scs       1270  1.2  2.3 187788 96228 ?     SNl  Jan28 537:27 /usr/bin/ruby /opt/wp/roles/scs/src/dev/scs/bin/server.rb -p 8843
root      1309  0.0  0.0   6080   656 tty1  Ss+  Jan28   0:00 /sbin/getty -8 38400 tty1
dirsvc   27448  0.1  1.2 177408 50748 ?     Sl   Feb20   8:57 narwhal master --port 8862 -c /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.19/confi
dirsvc   13003  2.5  1.2 180012 49128 ?     Sl   16:57   0:47  \_ narwhal worker[1] --port 8862 -c /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.1
dirsvc   13460  2.5  1.2 180108 49236 ?     Sl   17:05   0:36  \_ narwhal worker[9] --port 8862 -c /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.1
dirsvc   13637  2.4  1.2 180008 49096 ?     Sl   17:08   0:29  \_ narwhal worker[3] --port 8862 -c /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.1
dirsvc   13650  2.9  1.2 180172 49420 ?     Sl   17:08   0:35  \_ narwhal worker[11] --port 8862 -c /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.
dirsvc   13701  3.1  1.2 180172 49188 ?     Sl   17:10   0:35  \_ narwhal worker[13] --port 8862 -c /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.
dirsvc   13731  2.7  1.2 181556 50628 ?     Sl   17:10   0:29  \_ narwhal worker[7] --port 8862 -c /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.1
dirsvc   13770  2.8  1.2 179400 50352 ?     Sl   17:11   0:29  \_ narwhal worker[8] --port 8862 -c /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.1
dirsvc   13778  3.3  1.2 180104 49172 ?     Sl   17:11   0:34  \_ narwhal worker[5] --port 8862 -c /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.1
dirsvc   13826  2.6  1.2 181556 50672 ?     Sl   17:12   0:25  \_ narwhal worker[0] --port 8862 -c /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.1
dirsvc   13939  2.8  1.2 177948 48848 ?     Sl   17:13   0:25  \_ narwhal worker[4] --port 8862 -c /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.1
dirsvc   13971  3.2  1.4 189052 58292 ?     Sl   17:13   0:28  \_ narwhal worker[12] --port 8862 -c /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.
dirsvc   13982  2.5  1.2 177792 48780 ?     Sl   17:14   0:22  \_ narwhal worker[6] --port 8862 -c /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.1
dirsvc   15316  3.0  1.2 180072 49128 ?     Sl   17:20   0:15  \_ narwhal worker[2] --port 8862 -c /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.1
dirsvc   15381  2.0  1.2 179944 48928 ?     Sl   17:21   0:08  \_ narwhal worker[14] --port 8862 -c /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.
dirsvc   15743  3.5  1.1 177624 48596 ?     Sl   17:28   0:00  \_ narwhal worker[10] --port 8862 -c /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.
dirsvc   27461  0.1  1.3 235884 54744 ?     Sl   Feb20   9:20 /opt/ruby-1.9.2/bin/ruby /opt/wp/roles/directory/src/dev/directory/vendor/bundle/ruby/1.9.1/gems/wp-directory-svc-2.1.19/gem-bin/wo
root     11068  0.0  0.0 130480  1720 ?     Sl   04:20   0:00 rsyslogd -c4
zabbix   18062  0.0  0.0   9908   728 ?     SN   11:41   0:00 /usr/sbin/zabbix_agentd
zabbix   18063  0.0  0.0   9908   756 ?     SN   11:41   0:12  \_ /usr/sbin/zabbix_agentd
zabbix   18064  0.0  0.0   9980  1044 ?     SN   11:41   0:03  \_ /usr/sbin/zabbix_agentd
zabbix   18065  0.0  0.0   9980  1044 ?     SN   11:41   0:03  \_ /usr/sbin/zabbix_agentd
zabbix   18066  0.0  0.0   9980  1044 ?     SN   11:41   0:03  \_ /usr/sbin/zabbix_agentd
zabbix   18067  0.0  0.0   9908   660 ?     SN   11:41   0:00  \_ /usr/sbin/zabbix_agentd
EDIT: More info by request:
$ dpkg --get-selections | grep vmware
vmware-open-vm-tools-common                     install
vmware-open-vm-tools-kmod-2.6.32-32-server      install
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU X5660 @ 2.80GHz
stepping        : 2
cpu MHz         : 2800.099
cache size      : 12288 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat
bogomips        : 5600.19
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU X5660 @ 2.80GHz
stepping        : 2
cpu MHz         : 2800.099
cache size      : 12288 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm ida arat
bogomips        : 5600.19
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
-
dbenhur over 11 years
Linux includes processes in uninterruptible sleep in its load calculation. Such processes show with state 'D' in the usual process inspection tools. This state is usually used by device drivers waiting for disk or network IO. That "usual explanation" is true for Linux, but not for most other Unixes.
-
dbenhur over 11 years
Thanks for your response. What constitutes a "fractional time slice"? As I understand the scheduler, a process gets assigned to a CPU and runs on that CPU until the next scheduling interval or until it makes a blocking system call which causes it to yield. If my CPU is idle 70% of the time but my run queue lengths average over 2, that puzzles me: why aren't these ready-to-run processes just getting scheduled to the mostly idle CPUs?
-
dbenhur over 11 years
iotop is boring; everything reports approximately 0.
-
dbenhur over 11 years
I would add that this is a web service, though not a web server. It does have an execution profile akin to a bunch of other similar services we run: receive and deserialize a request, perform some dispatching to upstream services/databases, compute a result based on the answers from the upstreams, serialize a response, scribble a log message. Median request duration ~60ms, 90% 200ms, 99% 500ms+. We have a bunch of other services with similar profiles running on comparable VM containers that do not exhibit this disconnect between load and CPU%.
-
dbenhur over 11 years
Any load average above the CPU count means I have more processes waiting to run than CPUs to run them. I see many intervals above 2.0 and several la-5s over 4 and up to 6.5. This means I often have processes stalling for CPU behind other processes, which implies unwanted latency from lack of CPU capacity. I normally expect load average and CPU% to correlate until the system starts approaching 100% CPU saturation; after that, load average is the better signal because it indicates how over-committed the system is, not just that it's 100% busy.
-
Matt over 11 years
Linux is only going as far as scheduling to a virtual CPU though, which ESX then schedules via its own algorithms to a real CPU. How similar are the comparable VMs? Very similar CPU for different load? Same memory usage?
-
dbenhur over 11 years
@mindthemonkey There's at least a couple dozen different services in the VMs. Some have substantially different profiles, but the majority look pretty similar to this service: 4GB memory, 2 virtual CPUs, modest IO (mostly network and basic logging), running 30-60% CPU utilization through the daily curve. IO- and/or memory-intensive nodes (DBs, SOLR) get dedicated hosts. Most of these other service VMs show the expected correlation between CPU% and load (at least as long as they stay healthily away from 100%).
-
dbenhur over 11 years
@mindthemonkey While the guest scheduler is only controlling the virtual CPU and ESX is scheduling in the larger context, I don't see how this substantially affects the CPU% and load accounting. Both are based on samples taken at some frequency, and to the extent the guest is pre-empted by the hypervisor scheduling, that will affect both the slices where real work gets done and the slices where the guest takes its samples.
-
Matt over 11 years
Actually, scratch that. It appears the kernel doesn't enumerate the stolen value, so you will need to check the "READY" counters in ESX for your VMM. Are these VMs' ready values higher than others? Do they have any entitlements, shares, or limits set lower in ESX?
-
billyw about 8 years
@Matt I'm currently dealing with this same issue in a very similar environment, with nearly identical symptoms to dbenhur's. I followed your advice to check the "READY" counters (stolen cycles). Unfortunately, it doesn't appear to be the culprit: our high-load VM only has 1-3 cycles stolen per second, which is the same as our healthy VMs.
-
dbenhur almost 7 years
Thanks for taking the time to try to address a five-year-old problem. The systems and software in the question belong to a company I no longer work at, and the VM stack and the service in question are no longer in operation there anyway. :) There's a bunch of CPU and IO info already posted in the original question. Brendan's public work and exposition on perf and flame graphs postdate this question by more than a year.
-
dbenhur almost 7 years
A comment earlier referenced some performance work from Brendan Gregg, which reminded me that he also recently did some lovely archaeology on how uninterruptible sleep wound up in Linux's load average.
-
Mircea Vutcovici almost 7 years
No problem. Maybe it will be useful to someone.