What is causing the unusual high load average?

10,074

Solution 1

From the look of things, the high CPU usage was probably caused by a Wordpress plugin, Google XML Sitemap generator. After disabling that, the CPU averages mostly went down. Still, auditing the plugins to remove any that might use too much CPU.

Solution 2

To see which process is causing high I/O, you can use dstat or iotop.

To see why php is crashing, when starting apache make sure that you run ulimit -c unlimited. Next time when php will crash you will have a core dump. Use bt command inside gdb to have a stack trace of the crash. E.g.

gdb /usr/bin/httpd core
(gdb) bt

See: http://www.network-theory.co.uk/articles/gccdebug.html

Solution 3

I would expect the server has memory problems. This will result in the container having to wait for data to come from disk rather than buffers. If you have access to the server, try running vmstat on it rather than running it in the container. Memory management for virtual servers is dependent on the host server. The first thing I notices was no buffers, cache, or swap in your data. These are all important to performance.

Share:
10,074

Related videos on Youtube

James
Author by

James

Updated on September 18, 2022

Comments

  • James
    James almost 2 years

    I noticed on Tuesday night of last week, the load average went up sharply and it seemed abnormal since the traffic is small. Usually, the numbers usually average around .40 or lower and my server stuff (mysql, php and apache) are optimized. I noticed that the IOWait is unusually high even though the processes is barely using any CPU.

    top - 01:44:39 up 1 day, 21:13,  1 user,  load average: 1.41, 1.09, 0.86
    Tasks:  60 total,   1 running,  59 sleeping,   0 stopped,   0 zombie
    Cpu0  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu2  :  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu6  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Cpu7  :  0.0%us,  0.0%sy,  0.0%ni, 91.5%id,  8.5%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   1048576k total,   331944k used,   716632k free,        0k buffers
    Swap:        0k total,        0k used,        0k free,        0k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
        1 root      15   0  2468 1376 1140 S    0  0.1   0:00.92 init               
     1656 root      15   0 13652 5212  664 S    0  0.5   0:00.00 apache2            
     9323 root      18   0 13652 5212  664 S    0  0.5   0:00.00 apache2            
    10079 root      18   0  3972 1248  972 S    0  0.1   0:00.00 su                 
    10080 root      15   0  4612 1956 1448 S    0  0.2   0:00.01 bash               
    11298 root      15   0 13652 5212  664 S    0  0.5   0:00.00 apache2            
    11778 chikorit  15   0  2344 1092  884 S    0  0.1   0:00.05 top                
    15384 root      18   0 17544  13m 1568 S    0  1.3   0:02.28 miniserv.pl        
    15585 root      15   0  8280 2736 2168 S    0  0.3   0:00.02 sshd               
    15608 chikorit  15   0  8280 1436  860 S    0  0.1   0:00.02 sshd      

    Here is the VMStat

    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
     r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
     1  0      0 768644      0      0    0    0    14    23    0   10  1  0 99  0

    IOStat - Nothing unusal

    Total DISK READ: 67.13 K/s | Total DISK WRITE: 0.00 B/s
      TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND          
    19496 be/4 chikorit   11.85 K/s    0.00 B/s  0.00 %  0.00 % apache2 -k start
    19501 be/4 mysql       3.95 K/s    0.00 B/s  0.00 %  0.00 % mysqld
    19568 be/4 chikorit   11.85 K/s    0.00 B/s  0.00 %  0.00 % apache2 -k start
    19569 be/4 chikorit   11.85 K/s    0.00 B/s  0.00 %  0.00 % apache2 -k start
    19570 be/4 chikorit   11.85 K/s    0.00 B/s  0.00 %  0.00 % apache2 -k start
    19571 be/4 chikorit    7.90 K/s    0.00 B/s  0.00 %  0.00 % apache2 -k start
    19573 be/4 chikorit    7.90 K/s    0.00 B/s  0.00 %  0.00 % apache2 -k start
        1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
    11778 be/4 chikorit    0.00 B/s    0.00 B/s  0.00 %  0.00 % top
    19470 be/4 mysql       0.00 B/s    0.00 B/s  0.00 %  0.00 % mysqld
    

    Load Average Chart - http://i.stack.imgur.com/kYsD0.png

    I want to be sure if this is not a MySQL problem before making sure. Also, this is a Ubuntu 10.04 LTS Server on OpenVZ.

    Edit: This will probably give a good picture on the IO Wait

    top - 22:12:22 up 17:41, 1 user, load average: 1.10, 1.09, 0.93
    Tasks: 33 total, 1 running, 32 sleeping, 0 stopped, 0 zombie
    Cpu(s): 0.6%us, 0.2%sy, 0.0%ni, 89.0%id, 10.1%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 1048576k total, 260708k used, 787868k free, 0k buffers
    Swap: 0k total, 0k used, 0k free, 0k cached
    
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
    1 root 15 0 2468 1376 1140 S 0 0.1 0:00.88 init 
    5849 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 
    8063 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 
    9732 root 16 0 8280 2728 2168 S 0 0.3 0:00.02 sshd 
    9746 chikorit 18 0 8412 1444 864 S 0 0.1 0:01.10 sshd 
    9747 chikorit 18 0 4576 1960 1488 S 0 0.2 0:00.24 bash 
    13706 chikorit 15 0 2344 1088 884 R 0 0.1 0:00.03 top 
    15745 chikorit 15 0 12968 5108 1280 S 0 0.5 0:00.00 apache2 
    15751 chikorit 15 0 72184 25m 18m S 0 2.5 0:00.37 php5-fpm 
    15790 chikorit 18 0 12472 4640 1192 S 0 0.4 0:00.00 apache2 
    15797 chikorit 15 0 72888 23m 16m S 0 2.3 0:00.06 php5-fpm 
    16038 root 15 0 67772 2848 592 D 0 0.3 0:00.00 php5-fpm 
    16309 syslog 18 0 24084 1316 992 S 0 0.1 0:00.07 rsyslogd 
    16316 root 15 0 5472 908 500 S 0 0.1 0:00.00 sshd 
    16326 root 15 0 2304 908 712 S 0 0.1 0:00.02 cron 
    17464 root 15 0 10252 7560 856 D 0 0.7 0:01.88 psad 
    17466 root 18 0 1684 276 208 S 0 0.0 0:00.31 psadwatchd 
    17559 root 18 0 11444 2020 732 S 0 0.2 0:00.47 sendmail-mta 
    17688 root 15 0 10252 5388 1136 S 0 0.5 0:03.81 python 
    17752 teamspea 19 0 44648 7308 4676 S 0 0.7 1:09.70 ts3server_linux 
    18098 root 15 0 12336 6380 3032 S 0 0.6 0:00.47 apache2 
    18099 chikorit 18 0 10368 2536 464 S 0 0.2 0:00.00 apache2 
    18120 ntp 15 0 4336 1316 984 S 0 0.1 0:00.87 ntpd 
    18379 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 
    18387 mysql 15 0 62796 36m 5864 S 0 3.6 1:43.26 mysqld 
    19584 root 15 0 12336 4028 668 S 0 0.4 0:00.02 apache2 
    22498 root 16 0 12336 4028 668 S 0 0.4 0:00.00 apache2 
    24260 root 15 0 67772 3612 1356 S 0 0.3 0:00.22 php5-fpm 
    27712 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 
    27730 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 
    30343 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 
    30366 root 15 0 12336 4028 668 S 0 0.4 0:00.00 apache2 
    

    This is the free ram as of today.

                 total       used       free     shared    buffers     cached
    Mem:          1024        302        721          0          0          0
    -/+ buffers/cache:        302        721
    Swap:            0          0          0
    

    Update: Looking into the logs, particularly the PHP5-FPM, which is causing the CPU spike. I found that its segment faulting for some apparent reason.

    [03-Jun-2012 06:11:20] NOTICE: [pool www] child 14132 started
    [03-Jun-2012 06:11:25] WARNING: [pool www] child 13664 exited on signal 11 (SIGSEGV) after 53.686322 seconds from start
    [03-Jun-2012 06:11:25] NOTICE: [pool www] child 14328 started
    [03-Jun-2012 06:11:25] WARNING: [pool www] child 14132 exited on signal 11 (SIGSEGV) after 4.708681 seconds from start
    [03-Jun-2012 06:11:25] NOTICE: [pool www] child 14329 started
    [03-Jun-2012 06:11:58] WARNING: [pool www] child 14328 exited on signal 11 (SIGSEGV) after 32.981228 seconds from start
    [03-Jun-2012 06:11:58] NOTICE: [pool www] child 15745 started
    [03-Jun-2012 06:12:25] WARNING: [pool www] child 15745 exited on signal 11 (SIGSEGV) after 27.442864 seconds from start
    [03-Jun-2012 06:12:25] NOTICE: [pool www] child 17446 started
    [03-Jun-2012 06:12:25] WARNING: [pool www] child 14329 exited on signal 11 (SIGSEGV) after 60.411278 seconds from start
    [03-Jun-2012 06:12:25] NOTICE: [pool www] child 17447 started
    [03-Jun-2012 06:13:02] WARNING: [pool www] child 17446 exited on signal 11 (SIGSEGV) after 36.746793 seconds from start
    [03-Jun-2012 06:13:02] NOTICE: [pool www] child 18133 started
    [03-Jun-2012 06:13:48] WARNING: [pool www] child 17447 exited on signal 11 (SIGSEGV) after 82.710107 seconds from start

    I'm thinking that this might be causing the problem. If that is the cause, probably switching it off that to fastcgi/fcgid might resolve it... but still, I want to see if something else might be causing it to do this.

    • dmourati
      dmourati about 12 years
      Please state your question.
    • Tom O'Connor
      Tom O'Connor about 12 years
      10% IOWait ain't high.
  • Tom O'Connor
    Tom O'Connor about 12 years
    You've got 1GB of ram, and 12 Apache, 4 php-fpm, and a mysqld in your process tree. (see what I did there?). - You need some more RAM. I bet this is a VPS, and it's really cheap, right?
  • Tom O'Connor
    Tom O'Connor about 12 years
    Oh look. No SWAP. Well there's another problem.
  • Tom O'Connor
    Tom O'Connor about 12 years
    I hope the OP is using PHP APC, otherwise I'll just have to laugh myself to death.
  • James
    James about 12 years
    I'm using Xcache as APC causes problems with WP Super Cache. Also, OpenVZ vps don't come with swap since its not full virtualization. Only Xen and KVM supports it.
  • BillThor
    BillThor about 12 years
    Performance issue indicates that you may have used all available memory. On a virtual server (especially a partially virtualized server) you need to check the global zone as well. Either can cause problems.