How can the average run queue length be 1 when the load average is almost zero?

linux fedora load sar

8,313

Solution 1

This is just a guess, but the effect is easily explained if the run queue length is not an average (and why should it be, given that there are already three average values?) but a point-in-time reading. The one entry sar sees on the run queue is always sar itself, unless you add a process, in which case there are two.

Solution 2

I have concluded that sar's runq-sz is an instantaneous snapshot and not an average because:

a) I compared sar's per-second runq-sz with top's load average on a lightly loaded server. When I average the 60 per-second sar values in a minute, the result is far higher than top's load average. But more simply...

b) It's always an integer. If it were an average (even per-second), it would be fractional like the load averages or the CPU figures.
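
One way to see the distinction on Linux, as a minimal sketch: /proc/loadavg carries the three load averages as decimals, followed by a runnable/total task count that is a whole number taken at the instant the file is read. Whether a given version of sar takes runq-sz from this exact field is an assumption here, and parse_loadavg and read_loadavg are illustrative names, not part of any tool.

```python
from typing import NamedTuple


class LoadSnapshot(NamedTuple):
    ldavg_1: float   # damped average, fractional
    ldavg_5: float
    ldavg_15: float
    runnable: int    # count at the moment of reading, always an integer
    total: int       # number of scheduling entities that currently exist


def parse_loadavg(text: str) -> LoadSnapshot:
    """Parse one line in /proc/loadavg format, e.g. '0.02 0.07 0.57 1/547 12345'."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return LoadSnapshot(
        float(fields[0]), float(fields[1]), float(fields[2]),
        int(runnable), int(total),
    )


def read_loadavg(path: str = "/proc/loadavg") -> LoadSnapshot:
    """Read the live values (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())
```

Calling read_loadavg() repeatedly on an idle machine should show the runnable count as a small integer while the averages drift slowly. The process doing the reading may itself be included in that count.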

user3665821

Updated on September 18, 2022

Comments

user3665821 almost 2 years

(I originally posted this on Stack Overflow and was advised to move it here.)

On Fedora 17, when I run the sar command to view load average activity, it almost always shows a run queue length of 1, even though the system is idle and the load averages are practically zero. My understanding of run queue length and how it relates to the Linux load average suggests that if the run queue length is truly 1 on average over a period of time, then on my quad-core system the load should be ~25%, i.e. in my case the load average should read around the 1.00 mark:

$ sar -q 30 60
Linux 3.9.10-100.fc17.i686 (blah)   22/05/14    _i686_  (4 CPU)

16:29:10      runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
16:29:40            1       547      0.02      0.07      0.57         0
16:30:10            1       548      0.09      0.08      0.56         0
16:30:40            1       547      0.05      0.07      0.54         0
16:31:10            1       547      0.03      0.06      0.52         0
16:31:40            0       547      0.02      0.06      0.51         0
16:32:10            1       547      0.01      0.05      0.49         0
16:32:40            1       547      0.13      0.08      0.49         0
16:33:10            1       547      0.08      0.07      0.47         0
16:33:40            1       547      0.05      0.07      0.46         0
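
The expectation stated before the output can be checked with a small model. This sketch assumes the commonly described Linux scheme in which the load average is an exponentially damped average of the runnable task count, updated about every 5 seconds. It uses floating point rather than the kernel's fixed-point arithmetic, and simulate_load is a hypothetical helper.

```python
import math


def simulate_load(runnable: float, seconds: int, window: float = 60.0,
                  tick: float = 5.0, start: float = 0.0) -> float:
    """Return the modelled load average after `seconds` of a constant runnable count.

    window: averaging period in seconds (60 for ldavg-1, 300 for ldavg-5, ...)
    tick:   assumed interval between updates, in seconds
    """
    decay = math.exp(-tick / window)
    load = start
    for _ in range(int(seconds // tick)):
        # Each update keeps a fraction of the old value and blends in the new sample.
        load = load * decay + runnable * (1.0 - decay)
    return load
```

With one task constantly runnable, the model gives about 0.63 after one minute and approaches 1.00 after several minutes, so a run queue that genuinely averaged 1 could not coexist with a 1-minute load average near 0.02. The load average is not divided by the number of CPUs, which is why 1.00 on a four-CPU machine corresponds to roughly 25% of capacity.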

If I then poll frequently for runnable tasks, I only very occasionally see the odd process pop up (I do this with ps r -A | grep -v 'ps r -A'). I would have expected to see one process pop up regularly, in line with the sar output.
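
A hedged alternative to the ps pipeline, sketched in Python: it reads the state field from /proc/<pid>/stat and counts processes in state R. It looks only at each process's main thread, so it can miss runnable secondary threads, and state_from_stat and count_runnable are illustrative names.

```python
import os


def state_from_stat(stat_line: str) -> str:
    """Extract the one-letter state from a /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so split on the LAST ')' and take the next field.
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_runnable(proc_root: str = "/proc", exclude_self: bool = True) -> int:
    """Count processes whose main thread is in state R at the time of the scan."""
    me = str(os.getpid())
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit() or (exclude_self and entry == me):
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                if state_from_stat(f.read()) == "R":
                    count += 1
        except OSError:
            continue  # the process exited between listdir() and open()
    return count
```

Calling count_runnable() in a tight loop gives a series of instantaneous counts that can be compared with the runq-sz column. The scan is not atomic, so short-lived tasks can still be missed.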

Then, if I kick off a single-threaded process that consumes as much CPU as it can, the run queue size immediately jumps to 2 (which is to be expected in the circumstances), but strangely, after a while, the run queue drops back to 1 again.

Linux 3.9.10-100.fc17.i686 (blah)   22/05/14    _i686_  (4 CPU)

16:32:40      runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
16:33:10            1       547      0.08      0.07      0.47         0
16:33:40            1       547      0.05      0.07      0.46         0

START SCRIPT

16:34:10            2       548      0.11      0.08      0.45         0
16:34:40            2       548      0.51      0.18      0.47         0
16:35:10            2       548      0.70      0.26      0.49         0
16:35:40            2       548      0.82      0.33      0.50         0
16:36:10            2       548      0.89      0.39      0.52         0
16:36:40            2       548      0.93      0.45      0.53         0
16:37:10            2       548      0.96      0.50      0.55         0
16:37:40            2       548      1.04      0.57      0.57         0
16:38:10            2       548      1.02      0.61      0.58         0
16:38:40            2       548      1.01      0.64      0.60         0
16:39:10            2       548      1.09      0.70      0.61         0
16:39:40            2       548      1.05      0.72      0.63         0
16:40:10            3       550      1.11      0.77      0.64         0
16:40:40            3       549      1.06      0.79      0.65         0
16:41:10            3       549      1.04      0.81      0.67         0
16:41:40            3       549      1.02      0.83      0.68         0
16:42:10            2       549      1.01      0.84      0.69         0
16:42:40            2       549      1.01      0.86      0.70         0
16:43:10            1       549      1.07      0.89      0.71         0
16:43:40            1       549      1.04      0.90      0.72         0
16:44:10            1       549      1.03      0.91      0.73         0
16:44:40            1       549      1.02      0.92      0.74         0
16:45:10            1       548      1.01      0.93      0.75         0
16:45:40            1       548      1.01      0.93      0.75         0
16:46:10            1       548      1.00      0.94      0.76         0
16:46:40            1       548      1.00      0.94      0.77         0
16:47:10            1       548      1.00      0.95      0.78         0
16:47:40            1       548      1.00      0.96      0.78         0
16:48:10            1       548      1.00      0.96      0.79         0
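
The load script is not shown in the question. A hypothetical stand-in that keeps one thread busy for a fixed time, useful for repeating the experiment, might look like this (burn_cpu is an assumed name):

```python
import time


def burn_cpu(seconds: float) -> int:
    """Spin on a single thread for roughly `seconds`; return the loop count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1  # pure CPU work, no sleeping or I/O
    return iterations
```

Running burn_cpu(600) in one terminal while sar -q 30 60 runs in another should show ldavg-1 climbing toward 1.00, as in the output above.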

Can anyone explain what is going on here? The only explanation I can think of is that there is some special system task that can use the CPU when nothing else does, but which:

Is not included in the load average calculations, and
Will give up its CPU time if a process does come along that needs it.

Alternatively, sar sees itself when it samples the run queue, but that would not explain why the run queue eventually settles at 1 with the CPU load script running.

Or I misunderstand the load average / run queue concept.

Any suggestions much appreciated!

UPDATE: I tried again on a different machine with the same versions of Fedora, sar, etc. This time I see a consistent run queue length of 0 when the system is idle. I also tried a CentOS 5.7 machine; again, the run queue length is consistently 0 when idle.

So presumably sar doesn't see itself in the run queue when it samples. I am still at a loss to explain why this machine reports ~0 for load average yet consistently measures 1 for run queue length.

Anthon about 10 years

It was not moved here; it was cross-posted here.
derobert about 10 years

Please delete your question on Stack Overflow.
derobert about 10 years

@Anthon I think it makes more sense to close the one on SO; it's off-topic there.
Anthon about 10 years

@derobert The OP should have deleted the question there; closing it over there is IMHO not enough. Moving implies that the original is removed.
user3665821 about 10 years

Hi all. I have just deleted the version on SO. Apologies, I would have done this from the very start, but I didn't see anything obvious in the SO help notes to suggest I was required to.

user3665821 about 10 years

Hi, thanks for responding! It's true that the sar man page doesn't say the run queue length is an average. However, what makes me doubt that it sees itself is this: why, when the load script runs, does it not always show at least 2? Also, what I see is contrary to the experiments of others doing a similar thing, e.g. prutser.wordpress.com/2012/04/23/…