Load average is 50 while CPU utilization is 60%


Please add the output of sar -w 1. I suspect the number of context switches per second is killing your performance, because there are many more processes running than available processors, and I think context switches on a virtual machine are expensive.
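As a rough way to see how oversubscribed the processors are, the load average can be divided by the CPU count. This is only a sketch: load_per_cpu is a hypothetical helper name, and the sample values are the 1-minute load average (15.13) and CPU count (8) from the top output quoted in the question.

```shell
#!/bin/sh
# Hypothetical helper: express a load average as tasks per CPU.
load_per_cpu() {
    # $1 = load average, $2 = number of CPUs
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# Values taken from the top output in the question (15.13 on 8 CPUs).
load_per_cpu 15.13 8

# On a live Linux box the inputs could come from /proc/loadavg and nproc:
# load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

A result well above 1 suggests that, on average, more tasks are runnable (or in uninterruptible sleep) than there are CPUs to run them.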

If context switching does turn out to be the problem, there are some kernel tunables that can help you lower the number of context switches:

  • Check the current value with sysctl kernel.sched_min_granularity_ns. Double it with a command similar to sysctl -w kernel.sched_min_granularity_ns=2000000. Retest. Double it again. Retest. Repeat. Try to find a value that does not cripple interactivity too much but does not allow too many context switches, then write it to /etc/sysctl.conf so it is set at startup.

  • Set Apache's scheduling policy to SCHED_BATCH by starting it with chrt -b 0 apache2.
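The doubling procedure from the first bullet can be sketched as a small script. It only prints the commands to try and applies nothing; granularity_steps is a hypothetical helper name, and the starting value of 1000000 ns is an assumption, so substitute whatever sysctl kernel.sched_min_granularity_ns reports on your system.

```shell
#!/bin/sh
# Hypothetical helper: print the sysctl commands for a doubling sweep.
# Nothing is applied here; run each printed command as root and retest
# before moving on to the next one.
granularity_steps() {
    value=$1   # starting value in nanoseconds (assumed, check your system)
    steps=$2   # how many doublings to print
    i=0
    while [ "$i" -lt "$steps" ]; do
        value=$((value * 2))
        echo "sysctl -w kernel.sched_min_granularity_ns=$value"
        i=$((i + 1))
    done
}

# Example: three doublings starting from an assumed 1000000 ns.
granularity_steps 1000000 3
```

Once a value feels right, a line of the form kernel.sched_min_granularity_ns = VALUE in /etc/sysctl.conf keeps it across reboots.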


Roman Newaza
Author by

Roman Newaza

Updated on September 18, 2022

Comments

  • Roman Newaza
    Roman Newaza over 1 year

    We use EC2 Auto Scaling and recently decided to change the Instance type from m2.2xlarge to c1.xlarge (High Memory to High CPU). The average amount of RAM used per Instance is 2G, so we don't need the 34G provided by m2.2xlarge, and getting the extra CPU power of c1.xlarge for the same price seemed like a good idea.

    But after switching to c1.xlarge, we have the following issues:

    1. Load average rose to 50 while CPU Utilization dropped from 70% to 60%.
    2. Scaling in from 6 Instances to 4 doesn't affect the CPU Utilization CloudWatch metric.
    3. Response time became very slow, and Instances were constantly being replaced by Auto Scaling because of the ELB Health Check.
    4. Auto Scaling reduced the number of Instances from 8 to 4 because CPU Utilization dropped.

    Can you explain what might cause this behavior and what I can do about it?

    EC2 Instance Types Info:

    High-Memory Double Extra Large Instance

    34.2 GB of memory
    13 EC2 Compute Units (4 virtual cores with 3.25 EC2 Compute Units each)
    850 GB of instance storage
    64-bit platform
    I/O Performance: High
    API name: m2.2xlarge

    High-CPU Extra Large Instance

    7 GB of memory
    20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
    1690 GB of instance storage
    64-bit platform
    I/O Performance: High
    API name: c1.xlarge

    EDIT:

    $ iostat -x
    Linux 2.6.38-13-virtual     02/17/2012  _x86_64_    (8 CPU)
    
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               1.34    0.00    0.13    0.02    0.29   98.23
    
    Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    xvdap1            0.04     0.09    0.08    0.13     1.50     0.87    22.99     0.01   36.59   23.42   44.75   4.04   0.08
    xvdb              0.00     0.00    0.01    0.00     0.03     0.00     9.37     0.00    1.04    0.95   15.00   1.04   0.00
    
    
    
    $ iostat
    Linux 2.6.38-13-virtual     02/17/2012  _x86_64_    (8 CPU)
    
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               1.45    0.00    0.14    0.02    0.31   98.08
    
    Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    xvdap1            0.21         1.50         0.87      93689      54728
    xvdb              0.01         0.03         0.00       1575          8
    
    
    
    $ top
    top - 05:30:08 up 17:20,  3 users,  load average: 15.13, 10.24, 9.66
    Tasks: 166 total,  20 running, 146 sleeping,   0 stopped,   0 zombie
    Cpu(s): 65.3%us,  4.7%sy,  0.0%ni, 13.5%id,  0.0%wa,  0.0%hi,  0.7%si, 15.8%st
    Mem:   7130236k total,   463440k used,  6666796k free,    19100k buffers
    Swap:        0k total,        0k used,        0k free,    95136k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                          
     6457 ubuntu    20   0  257m  11m 4820 S   24  0.2   0:16.73 apache2                                                                                                                                                                          
     6416 ubuntu    20   0  257m  11m 4820 R   23  0.2   0:17.36 apache2                                                                                                                                                                          
     6375 ubuntu    20   0  257m  11m 4820 R   22  0.2   0:17.62 apache2                                                                                                                                                                          
     6402 ubuntu    20   0  257m  11m 4820 R   22  0.2   0:16.85 apache2                                                                                                                                                                          
     6472 ubuntu    20   0  257m  11m 4820 S   22  0.2   0:08.95 apache2                                                                                                                                                                          
     6311 ubuntu    20   0  257m  11m 4820 S   21  0.2   0:24.91 apache2                                                                                                                                                                          
     6446 ubuntu    20   0  257m  11m 4820 R   21  0.2   0:16.91 apache2                                                                                                                                                                          
     6372 ubuntu    20   0  257m  11m 4820 R   21  0.2   0:17.89 apache2                                                                                                                                                                          
     6460 ubuntu    20   0  257m  11m 4820 R   21  0.2   0:16.73 apache2                                                                                                                                                                          
     6379 ubuntu    20   0  257m  11m 4820 R   20  0.2   0:16.24 apache2                                                                                                                                                                          
     6380 ubuntu    20   0  257m  11m 4820 S   20  0.2   0:17.20 apache2                                                                                                                                                                          
     6450 ubuntu    20   0  257m  11m 4820 S   20  0.2   0:16.89 apache2                                                                                                                                                                          
     6426 ubuntu    20   0  257m  11m 4820 R   20  0.2   0:16.96 apache2                                                                                                                                                                          
     6432 ubuntu    20   0  257m  11m 4820 S   20  0.2   0:17.78 apache2                                                                                                                                                                          
     6433 ubuntu    20   0  257m  11m 4820 R   20  0.2   0:14.37 apache2                                                                                                                                                                          
     6476 ubuntu    20   0  257m  11m 4816 R   20  0.2   0:02.92 apache2                                                                                                                                                                          
     6386 ubuntu    20   0  257m  11m 4824 S   20  0.2   0:17.94 apache2                                                                                                                                                                          
     6475 ubuntu    20   0  257m  11m 4820 S   19  0.2   0:03.41 apache2                                                                                                                                                                          
     6355 ubuntu    20   0  257m  11m 4820 S   19  0.2   0:24.39 apache2                                                                                                                                                                          
     6417 ubuntu    20   0  257m  11m 4820 R   18  0.2   0:16.66 apache2                                                                                                                                                                          
     6455 ubuntu    20   0  257m  11m 4820 R   18  0.2   0:16.27 apache2                                                                                                                                                                          
     6393 ubuntu    20   0  257m  11m 4820 S   18  0.2   0:16.60 apache2                                                                                                                                                                          
     6325 ubuntu    20   0  257m  11m 4820 R   18  0.2   0:25.66 apache2                                                                                                                                                                          
     6403 ubuntu    20   0  257m  11m 4820 S   18  0.2   0:15.61 apache2                                                                                                                                                                          
     6474 ubuntu    20   0  257m  11m 4812 S   18  0.2   0:04.37 apache2                                                                                                                                                                          
     6477 ubuntu    20   0  257m  11m 4800 S   18  0.2   0:01.43 apache2                                                                                                                                                                          
     6315 ubuntu    20   0  257m  11m 4820 S   17  0.2   0:25.27 apache2                                                                                                                                                                          
     6376 ubuntu    20   0  257m  11m 4820 R   17  0.2   0:17.53 apache2                                                                                                                                                                          
     6478 ubuntu    20   0  257m  11m 4800 S   15  0.2   0:00.45 apache2                                                                                                                                                                          
     6359 ubuntu    20   0  257m  11m 4820 R   15  0.2   0:23.60 apache2   
    
    
    
    $ df -h
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/xvda1            7.9G  1.4G  6.1G  19% /
    none                  3.4G  112K  3.4G   1% /dev
    none                  3.4G     0  3.4G   0% /dev/shm
    none                  3.4G   72K  3.4G   1% /var/run
    none                  3.4G     0  3.4G   0% /var/lock
    /dev/xvdb             414G  199M  393G   1% /mnt
    XXXX.compute.internal:/share_0
                           99G   28G   66G  30% /data_0
    XXXX.compute.internal:/share_17
                           99G   30G   64G  33% /data_17
    XXXX.compute.internal:/share_13
                           99G   30G   64G  33% /data_13
    XXXX.compute.internal:/share_18
                           99G   31G   64G  33% /data_18
    XXXX.compute.internal:/share_15
                           99G   28G   66G  30% /data_15
    XXXX.compute.internal:/share_10
                           99G   28G   67G  30% /data_10
    XXXX.compute.internal:/share_16
                           99G   30G   64G  32% /data_16
    XXXX.internal:/share_3
                           99G   29G   66G  31% /data_3
    XXXX.compute.internal:/share_11
                           99G   30G   64G  32% /data_11
    XXXX.compute.internal:/share_7
                           99G   28G   66G  30% /data_7
    XXXX.compute.internal:/share
                           99G   58G   37G  62% /share
    XXXX.compute.internal:/share_2
                           99G   28G   66G  30% /data_2
    XXXX.compute.internal:/share_8
                           99G   28G   67G  30% /data_8
    XXXX.compute.internal:/share_19
                           99G   28G   66G  30% /data_19
    XXXX.compute.internal:/share_14
                           99G   31G   64G  33% /data_14
    XXXX.compute.internal:/share_5
                           99G   28G   66G  30% /data_5
    XXXX.compute.internal:/share_6
                           99G   28G   67G  30% /data_6
    XXXX.compute.internal:/share_1
                           99G   28G   66G  30% /data_1
    XXXX.compute.internal:/share_12
                           99G   31G   64G  33% /data_12
    XXXX.compute.internal:/share_4
                           99G   29G   66G  31% /data_4
    XXXX.compute.internal:/share_9
                           99G   28G   66G  30% /data_9
    
    
    
    $ free -g
                 total       used       free     shared    buffers     cached
    Mem:             6          0          6          0          0          0
    -/+ buffers/cache:          0          6
    Swap:            0          0          0
    
    
    
    $ sar 1
    Linux 2.6.38-13-virtual     02/17/2012  _x86_64_    (8 CPU)
    
    05:33:02 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
    05:33:03 AM     all     69.27      0.00      5.90      0.00     13.83     11.00
    05:33:04 AM     all     70.88      0.00      7.62      0.00     16.50      5.01
    05:33:05 AM     all     64.41      0.00      5.35      0.00     17.90     12.34
    05:33:06 AM     all     66.41      0.00      9.16      0.00     13.09     11.34
    05:33:07 AM     all     74.55      0.00      7.06      0.00     11.21      7.17
    05:33:08 AM     all     62.31      0.00      7.49      0.00     13.38     16.81
    05:33:09 AM     all     73.65      0.00      5.61      0.00     16.04      4.70
    05:33:10 AM     all     76.79      0.00      8.20      0.00      9.70      5.31
    05:33:11 AM     all     70.91      0.00      5.86      0.00     14.21      9.02
    05:33:12 AM     all     73.95      0.00      6.37      0.00     12.51      7.17
    05:33:13 AM     all     63.50      0.00      6.03      0.00     17.52     12.95
    05:33:14 AM     all     61.92      0.00      4.42      0.00     17.66     16.00
    05:33:15 AM     all     63.56      0.00      6.42      0.00     15.11     14.91
    05:33:16 AM     all     72.63      0.00      7.51      0.00     14.90      4.97
    05:33:17 AM     all     60.68      0.00      6.17      0.00     15.09     18.06
    
    
    
    $ sar -w 1
    Linux 2.6.38-13-virtual     02/17/2012  _x86_64_    (8 CPU)
    
    09:34:23 AM    proc/s   cswch/s
    09:34:24 AM      0.00   4795.00
    09:34:25 AM      0.00   4174.00
    09:34:26 AM      0.00   4194.23
    09:34:27 AM      1.00   3645.00
    09:34:28 AM      0.00   4564.00
    09:34:29 AM      0.00   4473.00
    09:34:30 AM      0.00   4225.00
    09:34:31 AM      0.00   4064.36
    09:34:32 AM      0.00   4740.00
    09:34:33 AM      0.00   4589.22
    09:34:34 AM      0.00   3887.00
    09:34:35 AM      0.00   4579.00
    09:34:36 AM      0.00   4408.00
    09:34:37 AM      1.00   4390.00
    09:34:38 AM      0.00   4628.00
    
    • thinice
      thinice about 12 years
      How are we supposed to help if you don't tell us what's taking up your CPU?
    • EEAA
      EEAA about 12 years
      20 USD says it's iowait.
    • cyberx86
      cyberx86 about 12 years
      You need to provide a lot more information for a proper diagnosis, but as a starting point, keep in mind that Load Average is more than just CPU: it includes iowait time. My guess is that the excess RAM you had before allowed for significantly more disk caching, which minimized disk I/O. Check and post the output of iostat -x (the %iowait and await values) and/or top (the %wa value) to prove or disprove this. Also post more detail: EBS volumes and setup (e.g. RAID) or ephemeral storage; df, iostat, top, free, etc. (some sar data, logs, etc. may be helpful as well).
    • Roman Newaza
      Roman Newaza about 12 years
      We don't use RAID with this Group. The Instance type is EBS Boot. I have started a test Instance and used ab to simulate load. Please look at my post; it has been edited.
    • Tometzky
      Tometzky about 12 years
      @ErikA: OK - where can I collect my 20 USD? ;-P
    • Roman Newaza
      Roman Newaza about 12 years
      I'd rather buy you a beer ;P. Load Average is high sometimes.
    • Roman Newaza
      Roman Newaza about 12 years
      Basically, when there are 8 Instances in the group, Load Average is ~0.5, but when I scale in to 6 Instances, Load Average might rise to ~20.
  • Roman Newaza
    Roman Newaza about 12 years
    Does it mean EC2 is overloaded?
  • Roman Newaza
    Roman Newaza about 12 years
    And how do I persist the SCHED_BATCH policy?
  • Tometzky
    Tometzky about 12 years
    SCHED_BATCH tells the Linux kernel that this process is not interactive, so it is better to give it a longer slice of CPU time less often; the 0 is the priority, which is the default. You'd have to add this to the startup script that starts Apache. I don't know how you start the server on your system, so I can't help much here. On CentOS/RedHat servers I'd add HTTPD="chrt -b 0 /usr/sbin/httpd" to /etc/sysconfig/httpd.
  • enedebe
    enedebe about 12 years
    No, your instance is overloaded and the hypervisor acts as a "policeman", stealing the required CPU cycles.
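
To put a number on the stolen cycles mentioned in the last comment, the %steal column of the sar 1 output can be averaged. This is a minimal sketch: avg_steal is a hypothetical helper name, and the field positions assume the 12-hour (AM/PM) layout shown in the question, so they may need adjusting for other sar versions or locales.

```shell
#!/bin/sh
# Hypothetical helper: average the %steal column of `sar 1` style output
# read from standard input.
avg_steal() {
    # Data rows have "all" in the CPU column. With a 12-hour clock (AM/PM)
    # that is field 3 and %steal is field 8, as in the question's output.
    awk '$3 == "all" { sum += $8; n++ }
         END { if (n) printf "%.2f\n", sum / n; else print "no data" }'
}

# First three samples from the sar output in the question.
avg_steal <<'EOF'
05:33:03 AM     all     69.27      0.00      5.90      0.00     13.83     11.00
05:33:04 AM     all     70.88      0.00      7.62      0.00     16.50      5.01
05:33:05 AM     all     64.41      0.00      5.35      0.00     17.90     12.34
EOF
```

On a live system the same function could read from a pipe, for example sar 1 15 | avg_steal.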