Load average is high but resources are not fully used
Solution 1
You might want to enable Apache's mod_status ( http://httpd.apache.org/docs/2.0/mod/mod_status.html ) so you can see exactly what's happening inside your webserver. Specifically, you'll get numbers on per-request CPU consumption.
A few snapshots from vmstat/iostat wouldn't hurt, either.
Also, are you using MyISAM or InnoDB tables? When you get one of these load spikes, what do you get from "SHOW FULL PROCESSLIST\G" in MySQL? I have a feeling you're getting lock/query contention in MySQL which is blowing up the length of your kernel run queue.
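To make the mod_status suggestion above a bit more concrete, here is a minimal Python sketch that summarizes the machine-readable status output (the page served with `?auto` appended to the status URL). The sample text is made up for illustration and is not from the server in the question, and the exact URL depends on how the status handler is configured, so the sketch only parses text that has already been fetched.

```python
from collections import Counter

# Scoreboard key as documented by mod_status.
SCOREBOARD_KEY = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def parse_status(text):
    """Turn 'Key: value' lines from the ?auto status page into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def scoreboard_summary(fields):
    """Count workers per state, based on the Scoreboard string."""
    counts = Counter(fields.get("Scoreboard", ""))
    return {SCOREBOARD_KEY.get(ch, ch): n for ch, n in counts.items()}


# Made-up sample output (assumption, for illustration only).
SAMPLE = """BusyWorkers: 6
IdleWorkers: 2
Scoreboard: WW_K.W_RW..
"""

fields = parse_status(SAMPLE)
summary = scoreboard_summary(fields)
print(fields["BusyWorkers"], "busy,", fields["IdleWorkers"], "idle")
for state, n in sorted(summary.items()):
    print(f"{n:3d}  {state}")
```

If most workers sit in "sending reply" during a spike, the time is probably going into the PHP or database work behind each request rather than into Apache itself.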
Solution 2
I don't have a full solution for you, but I have some guesses.
- Your MySQL server seems to have a pool of only something like 128MB. If the LAMP system makes use of a fair-sized database, this seems to be on the low side, and it would generate a lot of disk I/O. Also, if there are CPU spikes in MySQL, turn on slow-query logging for a bit and see what shows up. A new index or two might be in order.
- For a top replacement that can read most of the per-process counters in a modern kernel, I recommend atop. Among other things, it can show disk access by process. Note that atop has a running daemon as part of its setup, so you may want to uninstall it after you are done.
- Be careful what CPU usage numbers you trust. They are generated using somewhat different methods. In my experience, to show overall CPU usage, vmstat gives the "best" (== closest to perceived load) numbers.
- There are apache processes doing serious work. Perhaps some PHP code optimization is in order?
However, it is not obvious to me from above data that there is much wrong with your setup. While you can probably wring a bit more performance out of the box, you may simply be approaching the limit.
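As a rough illustration of the vmstat point above, the following Python sketch averages a few columns from captured vmstat output: `r` (runnable processes), `b` (processes blocked in uninterruptible sleep), `us` + `sy` (CPU busy) and `wa` (I/O wait). The sample rows are invented for the example and are not measurements from the box in the question.

```python
def parse_vmstat(text):
    """Parse vmstat output into a list of dicts keyed by column name."""
    rows, header = [], None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "r":  # the column-name line
            header = parts
        elif header and parts[0].isdigit() and len(parts) == len(header):
            rows.append(dict(zip(header, map(int, parts))))
    return rows


def summarize(rows):
    """Average the columns most relevant to a high load average."""
    n = len(rows)

    def mean(col):
        return sum(row[col] for row in rows) / n

    return {
        "runnable": mean("r"),
        "blocked": mean("b"),
        "cpu_busy_pct": mean("us") + mean("sy"),
        "iowait_pct": mean("wa"),
    }


# Invented sample (assumption): two rows in the classic vmstat layout.
SAMPLE = """procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
12  0    588 638832  27748 1458672    0    0    12    30  250  400 84  9  7  0
16  1    588 630000  27748 1460000    0    0    40    10  300  500 80 10  8  2
"""

stats = summarize(parse_vmstat(SAMPLE))
print(stats)
```

A large `r` with low `wa` points at CPU contention, while a large `b` with high `wa` points at disk. Something like `vmstat 5 12` during a spike would give a minute of samples to feed in.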
Update:
Clarification re: comment below.
A typical network-oriented TCP server consists of a daemon that has a listening socket and a number of open connections to clients. Each of these sockets has a process waiting on it (one process may wait on numerous sockets). Those processes will be in the sleeping state and will be woken up by the OS when some data arrives. If the server is efficient (say, a static web server) you may never catch a process running, as it takes only some 100 microseconds to wake up, serve some data and go back to sleep.
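The sleep-until-data pattern described above can be sketched in a few lines of Python. This is a toy illustration using a local socket pair rather than a real listening server, and the request and reply bytes are placeholders. The process is asleep inside `select()` and only wakes up when the other end sends something.

```python
import selectors
import socket


def demo():
    """Show a 'server' end sleeping in select() until data arrives."""
    sel = selectors.DefaultSelector()
    client, server_side = socket.socketpair()
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ)

    # Nothing has been sent yet, so select() sleeps until the timeout
    # expires and then reports no ready sockets.
    idle = sel.select(timeout=0.1)

    # Once the client writes, the sleeping waiter is woken up.
    client.sendall(b"GET /image.jpg")
    ready = sel.select(timeout=1.0)
    for key, _events in ready:
        data = key.fileobj.recv(4096)
        key.fileobj.sendall(b"served " + data)
    reply = client.recv(4096)

    sel.unregister(server_side)
    sel.close()
    client.close()
    server_side.close()
    return idle, len(ready), reply


idle, ready_count, reply = demo()
print(idle, ready_count, reply)
```

While the waiter is blocked in `select()`, tools such as top or ps would show it in state S, which is why a busy but efficient server can look mostly asleep.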
Update 2:
A modern OS allocates free memory to new disk buffers until it runs out of memory and then reuses the least used buffers. Thus, memory will always be full. Furthermore, there are several ways in which two processes may both report the same page of memory as part of their size. The upshot of this is that a) a modern OS is always out of memory, and b) it is difficult to tell exactly how memory is used. The best simple indication is to strive for buffer and cached numbers that are a large fraction of physical memory. On this box more than 30% of memory is in cached disk data.
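To put numbers on that, here is a small Python sketch that splits "used" memory into what processes hold and what is disk cache, using the figures from the Mem and Swap lines of the top output posted in the question. Treating buffers plus cached as reclaimable is a simplification, so take the result as an estimate.

```python
def memory_breakdown(total_kb, used_kb, buffers_kb, cached_kb):
    """Split 'used' memory into process memory and reclaimable cache."""
    reclaimable_kb = buffers_kb + cached_kb
    app_used_kb = used_kb - reclaimable_kb
    return {
        "app_used_mb": app_used_kb / 1024,
        "cache_mb": reclaimable_kb / 1024,
        "cache_fraction": reclaimable_kb / total_kb,
    }


# Figures taken from the top output in the question (all in kB).
mem = memory_breakdown(
    total_kb=4194528,
    used_kb=3555696,
    buffers_kb=27748,
    cached_kb=1458672,
)
print(f"used by processes: {mem['app_used_mb']:.0f} MB")
print(f"buffers + cache:   {mem['cache_mb']:.0f} MB")
print(f"cache fraction:    {mem['cache_fraction']:.1%}")
```

By this estimate roughly 2.0GB is held by processes and about 1.4GB is cache, which would explain why one tool reports around 3.5GB used while another reports about 2GB.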
Solution 3
I had this same problem. mytop showed lots of queries in the queue. I added indexes to my tables and the problem went away.
Solution 4
Any process not in state S (sleep) will be counted as an active process. This includes those in the R (running) state and the D (blocked) state. (The latter usually occurs when a process is waiting for I/O from a disk or network device.) You may also have zombie processes hanging around, running up the load average.
To find a list of those specifically, try the following command: ps -efl | cut -c3- | egrep -v "^S"
You don't have a lot of iowait time listed, so it might turn out to be zombies.
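If the listing from that command gets long, a short Python sketch like the one below can tally it by state. It assumes the input is the filtered output of the pipeline above, so the first character of each line is the state letter and there is no header line. The sample lines are copied from the ps output posted in the question. One caveat worth checking: as far as I know, Linux counts only runnable (R) and uninterruptible (D) tasks toward the load average, so stopped (T) or zombie (Z) entries would appear in the listing without raising the load.

```python
from collections import Counter

# States that Linux counts toward the load average (to my knowledge).
LOAD_STATES = ("R", "D")


def count_states(lines):
    """Tally process states; the state letter starts each line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:
            counts[fields[0][0]] += 1
    return counts


def load_contributors(counts):
    """Number of listed processes in a load-relevant state."""
    return sum(counts[state] for state in LOAD_STATES)


# A few lines from the ps output posted in the question.
SAMPLE = [
    "T root 12519 12508 0 75 0 - 612 finish 15:07 pts/1 00:00:00 top",
    "R www-data 18677 2774 1 76 0 - 17130 - 16:23 ? 00:00:04 /usr/sbin/apache2 -k start",
    "R www-data 18965 2774 2 76 0 - 13397 - 16:26 ? 00:00:04 /usr/sbin/apache2 -k start",
    "R www-data 19047 2774 2 76 0 - 11613 - 16:28 ? 00:00:00 /usr/sbin/apache2 -k start",
]

counts = count_states(SAMPLE)
print(dict(counts), "load-relevant:", load_contributors(counts))
```

Running it repeatedly during a spike gives a rough picture of whether the run queue is made up of R processes (CPU bound) or D processes (waiting on I/O).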
The 100% CPU usage from mysqld might also explain your intermittent hangups. (Maybe it only 'sometimes' gets pegged?) The load average might be a red herring, or not the root cause of your problem.
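One way to judge whether the load average is a red herring is to divide it by the number of CPUs. The Python sketch below parses a line in the format of /proc/loadavg. The sample line is assembled from the htop figures in the question (load 18.32 15.02 11.58, 6 running of 355 tasks) with a placeholder last-PID field, and the CPU count of 4 comes from the question's comments.

```python
def parse_loadavg(text):
    """Parse a /proc/loadavg style line into named fields."""
    one, five, fifteen, tasks, _last_pid = text.split()
    running, total = tasks.split("/")
    return {
        "1min": float(one),
        "5min": float(five),
        "15min": float(fifteen),
        "running": int(running),
        "total": int(total),
    }


def per_cpu(load, cpus):
    """Load per CPU; sustained values above 1.0 mean work is queueing."""
    return load / cpus


# Built from the htop numbers in the question; last field is a placeholder.
SAMPLE = "18.32 15.02 11.58 6/355 9350"
CPUS = 4  # as reported in the comments

load = parse_loadavg(SAMPLE)
ratio = per_cpu(load["1min"], CPUS)
print(f"1-minute load per CPU: {ratio:.2f}")
```

On a live system the line could be read from /proc/loadavg and the CPU count taken from `os.cpu_count()`. A ratio around 4.6, as here, suggests that several runnable processes are queued per CPU on average.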
Also, it appears your machine is using 3.5GB out of its 4GB of RAM. free -m can give you a bit better view of what's getting used.
lima
Updated on September 18, 2022
Comments
-
lima over 1 year
As far as I can tell, the load average on my server (Ubuntu Linux 8.04.1) is way too high, and in practice I see it slows down or stops serving during peak hours.
It's a fairly stock LAMP powering a single site (image hosting) that obviously serves a lot of content (images) from disk, but they need to go through PHP to be served. Aside from general advice to use a cache/proxy approach for this, I'm lost as to why it's apparently using less than half of the available resources (4GB RAM, it's a Linode 4096).
I'm quite a noob at Linux, so please ask for whatever might be useful. This is a portion of htop (MySQL shows 98.9% CPU usage but that was marginal, it uses 0.*% almost all the time):

    1 [||||||||||||||||||||||||||||||||||| 69.0%]   Tasks: 355 total, 6 running
    2 [||||||||||||||||||||||| 44.8%]   Load average: 18.32 15.02 11.58
    3 [|||||||||||||||||||||||||||||||||||| 71.9%]   Uptime: 01:10:22
    4 [||||||||||||||||||||||||||||| 57.9%]
    Mem[||||||||||||||||||||||||||||||||||||||2190/4096MB]
    Swp[| 0/127MB]

    PID   USER     PRI NI VIRT  RES   SHR  S CPU% MEM% TIME+   Command
    2345  mysql    18  0  177M  72640 5140 S 98.9 1.7  7:47.58 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql
    9350  www-data 16  0  48940 24304 4376 R 13.7 0.6  0:01.05 /usr/sbin/apache2 -k start
    9301  mysql    15  0  177M  72640 5140 S 10.0 1.7  0:00.17 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql
    9186  mysql    17  0  177M  72640 5140 S 10.0 1.7  0:00.22 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql
    9150  www-data 15  0  58400 33900 4476 S 8.1  0.8  0:02.03 /usr/sbin/apache2 -k start
    9077  mysql    15  0  177M  72640 5140 S 8.1  1.7  0:00.39 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql
    9270  mysql    15  0  177M  72640 5140 S 7.5  1.7  0:00.12 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql
    9037  mysql    16  0  177M  72640 5140 S 7.5  1.7  0:00.45 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql
    9333  www-data 15  0  35724 11260 4560 S 6.2  0.3  0:03.88 /usr/sbin/apache2 -k start
This is the current apache2.conf, though I've tried lots of combinations and asked here in the past:

    Timeout 90
    KeepAlive On
    MaxKeepAliveRequests 150
    KeepAliveTimeout 3

    <IfModule mpm_prefork_module>
        StartServers 1
        MinSpareServers 1
        MaxSpareServers 5
        MaxClients 275
        ServerLimit 275
        MaxRequestsPerChild 1250
    </IfModule>
UPDATE: As asked, this is a portion of top:

    top - 15:07:31 up 1:46, 2 users, load average: 12.83, 10.64, 10.14
    Tasks: 223 total, 17 running, 206 sleeping, 0 stopped, 0 zombie
    Cpu(s): 84.3%us, 8.8%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 1.0%si, 5.9%st
    Mem: 4194528k total, 3555696k used, 638832k free, 27748k buffers
    Swap: 131064k total, 588k used, 130476k free, 1458672k cached

    PID   USER     PR NI VIRT  RES SHR  S %CPU %MEM TIME+    COMMAND
    2345  mysql    17 0  180m  76m 5140 S 55   1.9  13:09.79 mysqld
    12479 www-data 18 0  73224 47m 4552 S 48   1.2  0:03.74  apache2
    12294 www-data 17 0  71788 46m 4472 R 39   1.1  0:05.78  apache2
    12382 www-data 17 0  73744 48m 4460 R 33   1.2  0:03.19  apache2
UPDATE: As suggested (by Christopher Karel, thanks), here are the active processes (output from ps -efl | cut -c3- | egrep -v "^S"). On average, it shows 1-5 apache2 processes. Does this make sense given my current apache2.conf and load average?

    T root     12519 12508 0  75 0 - 612   finish 15:07 pts/1 00:00:00 top
    R www-data 18677 2774  1  76 0 - 17130 -      16:23 ?     00:00:04 /usr/sbin/apache2 -k start
    R www-data 18965 2774  2  76 0 - 13397 -      16:26 ?     00:00:04 /usr/sbin/apache2 -k start
    R www-data 19047 2774  2  76 0 - 11613 -      16:28 ?     00:00:00 /usr/sbin/apache2 -k start
    R www-data 19088 2774  55 76 0 - 10482 -      16:29 ?     00:00:00 /usr/sbin/apache2 -k start
    R www-data 19091 2774  0  81 0 - 8579  -      16:29 ?     00:00:00 /usr/sbin/apache2 -k start
    R www-data 19092 2774  0  81 0 - 8355  -      16:29 ?     00:00:00 /usr/sbin/apache2 -k start
    R www-data 19093 2774  0  82 0 - 8322  -      16:29 ?     00:00:00 /usr/sbin/apache2 -k start
    R root     19094 18557 0  77 0 - 593   -      16:29 pts/2 00:00:00 ps -efl
    R root     19095 18557 0  78 0 - 729   -      16:29 pts/2 00:00:00 -bash
    R root     19096 18557 0  78 0 - 729   -      16:29 pts/2 00:00:00 -bash
-
EEAA almost 13 years
Run top and post the wa% value from the top summary area.
-
HTTP500 almost 13 years
What is your IOWAIT? Check %wa in top or %iowait in iostat. Cheers
-
HTTP500 almost 13 years
What does "but they need to go through PHP to be served" mean? Are you processing the images before serving?
-
lima almost 13 years
@jasondbecker Yes, some need some processing (resizing, watermarking, etc.) and some just need routing (validating the URL against the database, since I'm not using the real path to the image).
-
HTTP500 almost 13 years
@fandelost, Well, it looks like the resizing may be computationally expensive. How many vCPUs do you have? cat /proc/cpuinfo
-
lima almost 13 years
@jasondbecker 4 x Intel(R) Xeon(R) CPU 5130 @ 2.00GHz. Actually, I was aware that dynamically resizing might bring trouble so I was avoiding it, but the load problem goes way back. The resizing thing was rolled out today, before that it only did the routing (MySQL queries are optimized for that). Watermarking is done only once, when the image is uploaded, and kept in a separate dir (sorry for mixing that in the other comment).
-
lima almost 13 years
Thank you! As you can see, I have 0 zombie processes (you can see that in the top output) and most of the processes actually are in state S (I was suspicious this has something to do with the problem). As I've said, MySQL processes don't consume more than 1% CPU most (almost all) of the time, that was just an odd capture from htop. Now that you mention it, free -m also indicates 3.78GB used, which is odd since both htop and Virtualmin say 1.85GB is being used. What's with this? Thanks again!
-
Christopher Karel almost 13 years
Oops, you're right on the zombie count. But you should still fire off that ps output; it will essentially show those 10 processes that ARE running (not in state S). That's a good first step. As for the htop memory indicators, I expect that's a difference between actively used memory and Linux's memory caching. See something like: serverfault.com/questions/85470/….
-
lima almost 13 years
Interesting, ps shows just 4 apache2 processes running (plus top and the other commands). Does that make any sense? Even the MySQL processes are asleep. I could see this happening if there was no traffic, but that's not the case, especially since the load average is still >10 and the site gets slow every now and then.
-
lima almost 13 years
Thank you, I'll look into atop. I did look at the slow query log, added indexes and optimized some queries not so long ago. The thing that puzzles me most is that, if I'm reaching the server limits, how is it that RAM is only half used and most (like 90%) of the processes are (S)leeping? I'm not complaining about your conclusion because that was my belief all along, but the numbers don't make any sense to me now. No matter what I change in apache2.conf (server/child/process limits), the results never show (in practice).
-
Bittrance almost 13 years
Sleeping processes may well be waiting for data from network sockets (e.g. from the user's browser) and so should be asleep. I don't know how htop calculates its mem value (summing up RSS, perhaps?), but I would not trust it. top seems to give you more reasonable memory numbers.
-
sciurus almost 13 years
Seconding atop; it will give you a more comprehensive view of the system than htop. See lwn.net/Articles/387202 for an overview.