Load average is 50 while CPU Utilization is 60%
Please add sar -w 1 output. I suspect the number of context switches per second is killing your performance, because there are many more runnable processes than available processors. I think context switches on a virtual machine are expensive.
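If sysstat is not installed on the instance, a rough stand-in for sar -w (just a sketch) is to diff the kernel's cumulative context-switch counter in /proc/stat:

# Approximate "sar -w 1": the ctxt line in /proc/stat counts
# context switches since boot, so a one-second delta gives cswch/s.
prev=$(awk '/^ctxt/ {print $2}' /proc/stat)
sleep 1
curr=$(awk '/^ctxt/ {print $2}' /proc/stat)
echo "cswch/s: $((curr - prev))"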
If that's the case, there are some kernel tunables that can help you lower the number of context switches:
- Check the value of sysctl kernel.sched_min_granularity_ns. Double it with a command similar to sysctl kernel.sched_min_granularity_ns=2000000. Retest. Double it again. Retest. Repeat. Try to find a value that won't cripple interactivity too much but also won't allow too many context switches, and write it to /etc/sysctl.conf so it will be set at startup.
- Set the apache scheduling policy to SCHED_BATCH: start it with chrt -b 0 apache2 (see the sketch below).
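A minimal sketch of that tune-and-retest loop, assuming root access and a repeatable load test (2000000 is just the doubled example above, not a recommendation):

# Inspect the current value, then try a doubled one:
sysctl kernel.sched_min_granularity_ns
sysctl -w kernel.sched_min_granularity_ns=2000000
# ...rerun the load test, watch cswch/s in "sar -w 1", and repeat
# with 4000000, 8000000, ... until interactivity starts to suffer.

# Persist the winning value across reboots:
echo 'kernel.sched_min_granularity_ns = 2000000' >> /etc/sysctl.conf
sysctl -p   # reload /etc/sysctl.conf without rebooting

# Start Apache under SCHED_BATCH and verify the policy took:
chrt -b 0 apache2
chrt -p "$(pgrep -o apache2)"   # prints the scheduling policy of the oldest worker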
Roman Newaza
Updated on September 18, 2022

Comments
-
Roman Newaza over 1 year
We use EC2 Auto Scaling and recently decided to change the Instance type from m2.2xlarge to c1.xlarge (High Memory to High CPU), because the average amount of RAM used per Instance is 2G. We therefore don't need the 34G provided by m2.2xlarge, and having the extra CPU power of c1.xlarge for the same price seemed like a good idea.
But after switching to c1.xlarge, we have these issues:
- Load average became 50, while CPU Utilization dropped from 70% to 60%.
- Scaling in from 6 Instances to 4 doesn't affect the CPU Utilization CloudWatch metric.
- Response time became very slow, and Instances were constantly being replaced by Auto Scaling because of the ELB Health Check.
- Auto Scaling reduced the number of Instances from 8 to 4 because CPU Utilization dropped.
Can you explain what might be the reason for this behavior, and what I can do about it?
EC2 Instance Types Info:
High-Memory Double Extra Large Instance
34.2 GB of memory
13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
850 GB of instance storage
64-bit platform
I/O Performance: High
API name: m2.2xlarge
High-CPU Extra Large Instance
7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge
EDIT:
$ iostat -x
Linux 2.6.38-13-virtual    02/17/2012    _x86_64_    (8 CPU)

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           1.34   0.00     0.13     0.02    0.29  98.23

Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
xvdap1     0.04    0.09  0.08  0.13   1.50   0.87     22.99      0.01  36.59    23.42    44.75   4.04   0.08
xvdb       0.00    0.00  0.01  0.00   0.03   0.00      9.37      0.00   1.04     0.95    15.00   1.04   0.00

$ iostat
Linux 2.6.38-13-virtual    02/17/2012    _x86_64_    (8 CPU)

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           1.45   0.00     0.14     0.02    0.31  98.08

Device:   tps  kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
xvdap1   0.21       1.50       0.87    93689    54728
xvdb     0.01       0.03       0.00     1575        8

$ top
top - 05:30:08 up 17:20, 3 users, load average: 15.13, 10.24, 9.66
Tasks: 166 total, 20 running, 146 sleeping, 0 stopped, 0 zombie
Cpu(s): 65.3%us, 4.7%sy, 0.0%ni, 13.5%id, 0.0%wa, 0.0%hi, 0.7%si, 15.8%st
Mem: 7130236k total, 463440k used, 6666796k free, 19100k buffers
Swap: 0k total, 0k used, 0k free, 95136k cached

 PID  USER    PR  NI  VIRT  RES  SHR   S  %CPU  %MEM  TIME+    COMMAND
6457  ubuntu  20   0  257m  11m  4820  S    24   0.2  0:16.73  apache2
6416  ubuntu  20   0  257m  11m  4820  R    23   0.2  0:17.36  apache2
6375  ubuntu  20   0  257m  11m  4820  R    22   0.2  0:17.62  apache2
6402  ubuntu  20   0  257m  11m  4820  R    22   0.2  0:16.85  apache2
6472  ubuntu  20   0  257m  11m  4820  S    22   0.2  0:08.95  apache2
6311  ubuntu  20   0  257m  11m  4820  S    21   0.2  0:24.91  apache2
6446  ubuntu  20   0  257m  11m  4820  R    21   0.2  0:16.91  apache2
6372  ubuntu  20   0  257m  11m  4820  R    21   0.2  0:17.89  apache2
6460  ubuntu  20   0  257m  11m  4820  R    21   0.2  0:16.73  apache2
6379  ubuntu  20   0  257m  11m  4820  R    20   0.2  0:16.24  apache2
6380  ubuntu  20   0  257m  11m  4820  S    20   0.2  0:17.20  apache2
6450  ubuntu  20   0  257m  11m  4820  S    20   0.2  0:16.89  apache2
6426  ubuntu  20   0  257m  11m  4820  R    20   0.2  0:16.96  apache2
6432  ubuntu  20   0  257m  11m  4820  S    20   0.2  0:17.78  apache2
6433  ubuntu  20   0  257m  11m  4820  R    20   0.2  0:14.37  apache2
6476  ubuntu  20   0  257m  11m  4816  R    20   0.2  0:02.92  apache2
6386  ubuntu  20   0  257m  11m  4824  S    20   0.2  0:17.94  apache2
6475  ubuntu  20   0  257m  11m  4820  S    19   0.2  0:03.41  apache2
6355  ubuntu  20   0  257m  11m  4820  S    19   0.2  0:24.39  apache2
6417  ubuntu  20   0  257m  11m  4820  R    18   0.2  0:16.66  apache2
6455  ubuntu  20   0  257m  11m  4820  R    18   0.2  0:16.27  apache2
6393  ubuntu  20   0  257m  11m  4820  S    18   0.2  0:16.60  apache2
6325  ubuntu  20   0  257m  11m  4820  R    18   0.2  0:25.66  apache2
6403  ubuntu  20   0  257m  11m  4820  S    18   0.2  0:15.61  apache2
6474  ubuntu  20   0  257m  11m  4812  S    18   0.2  0:04.37  apache2
6477  ubuntu  20   0  257m  11m  4800  S    18   0.2  0:01.43  apache2
6315  ubuntu  20   0  257m  11m  4820  S    17   0.2  0:25.27  apache2
6376  ubuntu  20   0  257m  11m  4820  R    17   0.2  0:17.53  apache2
6478  ubuntu  20   0  257m  11m  4800  S    15   0.2  0:00.45  apache2
6359  ubuntu  20   0  257m  11m  4820  R    15   0.2  0:23.60  apache2

$ df -h
Filesystem                       Size  Used  Avail  Use%  Mounted on
/dev/xvda1                       7.9G  1.4G   6.1G   19%  /
none                             3.4G  112K   3.4G    1%  /dev
none                             3.4G     0   3.4G    0%  /dev/shm
none                             3.4G   72K   3.4G    1%  /var/run
none                             3.4G     0   3.4G    0%  /var/lock
/dev/xvdb                        414G  199M   393G    1%  /mnt
XXXX.compute.internal:/share_0    99G   28G    66G   30%  /data_0
XXXX.compute.internal:/share_17   99G   30G    64G   33%  /data_17
XXXX.compute.internal:/share_13   99G   30G    64G   33%  /data_13
XXXX.compute.internal:/share_18   99G   31G    64G   33%  /data_18
XXXX.compute.internal:/share_15   99G   28G    66G   30%  /data_15
XXXX.compute.internal:/share_10   99G   28G    67G   30%  /data_10
XXXX.compute.internal:/share_16   99G   30G    64G   32%  /data_16
XXXX.internal:/share_3            99G   29G    66G   31%  /data_3
XXXX.compute.internal:/share_11   99G   30G    64G   32%  /data_11
XXXX.compute.internal:/share_7    99G   28G    66G   30%  /data_7
XXXX.compute.internal:/share      99G   58G    37G   62%  /share
XXXX.compute.internal:/share_2    99G   28G    66G   30%  /data_2
XXXX.compute.internal:/share_8    99G   28G    67G   30%  /data_8
XXXX.compute.internal:/share_19   99G   28G    66G   30%  /data_19
XXXX.compute.internal:/share_14   99G   31G    64G   33%  /data_14
XXXX.compute.internal:/share_5    99G   28G    66G   30%  /data_5
XXXX.compute.internal:/share_6    99G   28G    67G   30%  /data_6
XXXX.compute.internal:/share_1    99G   28G    66G   30%  /data_1
XXXX.compute.internal:/share_12   99G   31G    64G   33%  /data_12
XXXX.compute.internal:/share_4    99G   29G    66G   31%  /data_4
XXXX.compute.internal:/share_9    99G   28G    66G   30%  /data_9

$ free -g
             total  used  free  shared  buffers  cached
Mem:             6     0     6       0        0       0
-/+ buffers/cache:      0     6
Swap:            0     0     0

$ sar 1
Linux 2.6.38-13-virtual    02/17/2012    _x86_64_    (8 CPU)

05:33:02 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
05:33:03 AM  all  69.27   0.00     5.90     0.00   13.83  11.00
05:33:04 AM  all  70.88   0.00     7.62     0.00   16.50   5.01
05:33:05 AM  all  64.41   0.00     5.35     0.00   17.90  12.34
05:33:06 AM  all  66.41   0.00     9.16     0.00   13.09  11.34
05:33:07 AM  all  74.55   0.00     7.06     0.00   11.21   7.17
05:33:08 AM  all  62.31   0.00     7.49     0.00   13.38  16.81
05:33:09 AM  all  73.65   0.00     5.61     0.00   16.04   4.70
05:33:10 AM  all  76.79   0.00     8.20     0.00    9.70   5.31
05:33:11 AM  all  70.91   0.00     5.86     0.00   14.21   9.02
05:33:12 AM  all  73.95   0.00     6.37     0.00   12.51   7.17
05:33:13 AM  all  63.50   0.00     6.03     0.00   17.52  12.95
05:33:14 AM  all  61.92   0.00     4.42     0.00   17.66  16.00
05:33:15 AM  all  63.56   0.00     6.42     0.00   15.11  14.91
05:33:16 AM  all  72.63   0.00     7.51     0.00   14.90   4.97
05:33:17 AM  all  60.68   0.00     6.17     0.00   15.09  18.06

$ sar -w 1
Linux 2.6.38-13-virtual    02/17/2012    _x86_64_    (8 CPU)

09:34:23 AM  proc/s  cswch/s
09:34:24 AM    0.00  4795.00
09:34:25 AM    0.00  4174.00
09:34:26 AM    0.00  4194.23
09:34:27 AM    1.00  3645.00
09:34:28 AM    0.00  4564.00
09:34:29 AM    0.00  4473.00
09:34:30 AM    0.00  4225.00
09:34:31 AM    0.00  4064.36
09:34:32 AM    0.00  4740.00
09:34:33 AM    0.00  4589.22
09:34:34 AM    0.00  3887.00
09:34:35 AM    0.00  4579.00
09:34:36 AM    0.00  4408.00
09:34:37 AM    1.00  4390.00
09:34:38 AM    0.00  4628.00
-
thinice about 12 years
How are we supposed to help without you telling us what's taking up your CPU?
-
EEAA about 12 years
20 USD says it's iowait.
-
cyberx86 about 12 years
You need to provide a lot more information for a proper diagnosis - but as a starting point, keep in mind that Load Average is more than just CPU: it includes iowait time. My guess is that the excess RAM you had before allowed for significantly more disk caching, which minimized disk I/O. Check and post the output of iostat -x (the %iowait and await values) and/or top (the %wa value) to prove or disprove this. Also post more detail: EBS volumes and setup (e.g. RAID) or ephemeral; df, iostat, top, free, etc. (some sar data, logs, etc. may be helpful as well).
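For convenience, a hedged sketch that gathers everything asked for above into one file to post (assumes the sysstat package provides iostat and sar):

# Collect the requested diagnostics in a single pass:
{
  echo '== iostat -x ==';  iostat -x
  echo '== top ==';        top -b -n 1 | head -40   # batch mode, first 40 lines
  echo '== df -h ==';      df -h
  echo '== free -m ==';    free -m
  echo '== sar -u 1 5 =='; sar -u 1 5
} > diagnostics.txt
-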
Roman Newaza about 12 years
We don't use RAID with this Group. The Instance type is EBS Boot. I started a test Instance and used ab to simulate load. Please look at my post - it has been edited.
-
Tometzky about 12 years
@ErikA: OK - where can I collect my 20 USD? ;-P
-
Roman Newaza about 12 years
I'd rather buy you a beer ;P. Load Average is only high sometimes.
-
Roman Newaza about 12 years
Basically, when there are 8 Instances in the group, Load Average is ~0.5, but when I scale in to 6 Instances, Load Average can rise to ~20.
-
Roman Newaza about 12 years
Does that mean EC2 is overloaded?
-
Roman Newaza about 12 years
And how do I persist the SCHED_BATCH policy?
-
Tometzky about 12 years
SCHED_BATCH tells the Linux kernel that the process is not interactive, so it is better to give it a longer slice of CPU time less often; the 0 is the priority - the default one. You'd have to add this to the startup script that starts Apache. I don't know how you start your server on your system, so I can't help much there. On CentOS/RedHat servers I'd add HTTPD="chrt -b 0 /usr/sbin/httpd" to /etc/sysconfig/httpd.
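On Ubuntu (where the apache2 in this question lives) there is no /etc/sysconfig/httpd; two sketched alternatives, neither a distribution-supported mechanism:

# Option 1: start Apache through chrt by hand (or wrap the
# corresponding call inside /etc/init.d/apache2):
chrt -b 0 /usr/sbin/apache2ctl start

# Option 2: retrofit SCHED_BATCH onto already-running workers:
for pid in $(pgrep apache2); do
    chrt -b -p 0 "$pid"   # set SCHED_BATCH at priority 0 for this PID
done
-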
enedebe about 12 years
No, your instance is overloaded, and the hypervisor acts as a "policeman", stealing the required CPU cycles.
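That steal hypothesis can be checked against data already posted (the 15.8%st in the top output and the %steal column in sar); to keep watching it:

# Sample CPU accounting once a second for 10 seconds; sustained
# double-digit %steal means the hypervisor is running someone
# else's workload on this vCPU.
sar -u 1 10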