How is the "load average" interpreted in "top" output? Is it the same for all distributions?
Solution 1
The CPU load is the length of the run queue, i.e. the length of the queue of processes waiting to be run.
The uptime command may be used to see the average length of the run queue over the last minute, the last five minutes, and the last 15 minutes, just like what's usually displayed by top.
A high load value means the run queue is long. A low value means that it is short. So, if the one-minute load average is 0.05, it means that on average during that minute, there were 0.05 processes waiting to run in the run queue. It is not a percentage. This is, AFAIK, the same on all Unices (although some Unices may not count processes waiting for I/O, which I think Linux does; OpenBSD, for a while only, also counted kernel threads, so that the load was always 1 or more).
The Linux top utility gets the load values from the kernel, which writes them to /proc/loadavg. Looking at the sources for procps-3.2.8, we see that:
- To display the load averages, the sprint_uptime() function is called in top.c.
- This function lives in proc/whattime.c and calls loadavg() in proc/sysinfo.c.
- That function simply opens LOADAVG_FILE to read the load averages.
- LOADAVG_FILE is defined earlier as "/proc/loadavg".
Solution 2
The load average is typically calculated by the kernel. Applications such as top and uptime may use the getloadavg(3) library call to access this (it's meant to be portable across different Unix versions). On Linux this typically results in a read from /proc/loadavg. On FreeBSD it's a system call.
For example:
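#include <stdio.h>
#include <stdlib.h>   /* getloadavg() */

int main(void)
{
    double ld[3];

    /* getloadavg() returns the number of samples retrieved, or -1 on failure */
    if (getloadavg(ld, 3) != 3) {
        fprintf(stderr, "getloadavg failed\n");
        return 1;
    }
    printf("Load %.2f %.2f %.2f\n", ld[0], ld[1], ld[2]);
    return 0;
}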
uptime and top both make similar calls to get their data.
Now, the 1/5/15 minute load averages are the average number of processes on the run queue over those periods. Different operating systems may calculate this in different ways; the biggest difference normally is whether processes waiting for I/O (e.g. blocked on disk) count as runnable or not. On Linux they do.
So a load average of 3.4 means there was an average of 3.4 processes on the run queue within the sample window (1, 5, 15 minutes).
A high load average doesn't necessarily mean an overloaded server, though. If you have 16 cores then your load average can be 16 without stress. You could also have a lot of application fork() calls which can result in a large number of processes being created/destroyed, leading to a high load average, but without massively impacting server performance. It should only be used as a guide, along with other metrics such as %CPU busy.
Solution 3
The load average is not specific to any particular tool or distribution. It's a measurement provided by the kernel (or, more precisely, the scheduler), so it is distribution independent. The measurement is exposed through the proc filesystem, /proc.
On to its interpretation: the load average metric is not an indication of how hard the CPU is working but of how much work needs to be done. I don't think there is really a need to multiply it by anything, because it's a direct measurement of the number of processes in either a runnable or uninterruptible state.
Try checking out the following two man pages: getloadavg(3) and uptime for more information.
The load average metric can be a difficult concept to understand at first. I think a lot of people take it as an indication of how hard the CPU is working, but that's not really it.
Angelos Asonitis
Updated on September 18, 2022
Comments
-
Angelos Asonitis almost 2 years
I would like to know whether the output on a Red Hat-based Linux should be interpreted differently from that on a Debian-based Linux.
To make the question even more specific, what I am after is understanding how the "load average" from the first line of the top command on a Red Hat system is interpreted, and how to verify this by official documentation or code.
[There are many ways to approach this subject, all of which are acceptable answers to the question]
One potential approach would be to find where this information is officially documented.
Another would be to find the code version that top is built from in the specific distribution and version I am working on.
The command output I am getting is:
top - 13:08:34 up 1:19, 2 users, load average: 0.02, 0.00, 0.00
Tasks: 183 total, 1 running, 182 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.2%sy, 0.0%ni, 96.8%id, 2.7%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 3922520k total, 788956k used, 3133564k free, 120720k buffers
Swap: 2097148k total, 0k used, 2097148k free, 344216k cached
In this case how can I interpret the load average value?
From one documentation source I have learned that the load average refers to the last minute, and from another that it should be interpreted after being multiplied by 100.
So, the question is: Is it 0.02% or 2% loaded?
Documentation sources and versions:
- The first one starts with
TOP(1) Linux User’s Manual TOP(1) NAME top - display Linux tasks
Source: man top in my RedHat distribution
Ubuntu also has the version with "tasks" that does not explain the load average in: http://manpages.ubuntu.com/manpages/precise/man1/top.1.html
- The second one starts with
TOP(1) User Commands TOP(1) NAME top top - display Linux processes
- This one starts with:
TOP(1) NAME top - display and update information about the top cpu processes
Source: http://www.unixtop.org/man.shtml
The first one can be seen by man top in RHEL or in the online Ubuntu documentation, and it does not have any explanation of the output format (nor of the load average, which is what I am interested in).
The second one, contains a brief explanation, pointing out that the load average has to do with the last 1 minute, but nothing about the interpretation of its value!
I quote directly from the second source:
2a. UPTIME and LOAD Averages
This portion consists of a single line containing:
program or window name, depending on display mode
current time and length of time since last boot
total number of users
system load avg over the last 1, 5 and 15 minutes
So, if this explanation is indeed correct, it is just enough to understand that the load average is about the last 1 minute.
But it does not explain the format of the number.
In the third explanation, it says that:
When specifying numbers for load averages, they should be multiplied by 100.
This explanation suggests that 0.02 means 2% and not 0.02%. But is this correct? Additionally, is it correct for all distributions of Linux and potentially different implementations of top?
To find the answer to this question, I tried to go through the code by searching for it online. But I found at least two different versions of top related to RHEL out there: builtin-top.c and the refactored top.c. Both are copyrighted by Red Hat, as the notice at the beginning of the code says, and thus it seems logical that RHEL uses one of these.
http://lxr.free-electrons.com/source/tools/perf/builtin-top.c
http://lxr.free-electrons.com/source/tools/perf/util/top.c
So, before delving into that much code, I wanted an opinion about where to focus to form an accurate understanding on how cpu load is interpreted?
From information given in the answers below, in addition to some personal search, I have found that:
- The top that I am using is contained in the package procps-3.2.8, which can be verified by using top -v.
- In the version of procps-3.2.8 that I have downloaded from the official website, it seems that the uptime tool gets its information from the procfs file /proc/loadavg directly (not utilizing the Linux function getloadavg()).
- The top command also does not use the function getloadavg(). I managed to verify that top does indeed do the same thing as the uptime tool to show the load averages. It actually calls the uptime tool's function, which gets its information from the procfs file /proc/loadavg.
So, everything points to the /proc/loadavg file! Thus, to form an accurate understanding of the load average produced by top, one must read the kernel code to see how the file loadavg is written.
There is also an excellent article pointed out in one of the answers that provides a layman's terms explanation of the three values of loadavg.
So, despite the fact that all answers have been equally useful and helpful, I am going to mark the one that pointed to the article http://www.linuxjournal.com//article/9001 as "the" answer to my question.
Thank you all for your contribution!
Additionally, from the question Understanding top and load average, I have found a link to the source code of the kernel that points to the spot where loadavg is calculated. As it seems, there is a huge comment explaining the way it works; also, this part of the code is in C!
The link to the code is http://lxr.free-electrons.com/source/kernel/sched/loadavg.c
Again, I am not trying to engage in any form of plagiarism; I am just adding this for completeness. So, I am repeating that the link to the kernel code was found from one of the answers in Understanding top and load average.
-
FelixJN almost 8 years
What do the versions tell you? (top -v)
-
Angelos Asonitis almost 8 years
top: procps version 3.2.8
-
BowlOfRed almost 8 years
-
-
Kusalananda almost 8 years
@AgelosAssonitis This LinuxJournal article from 2006 may be interesting. It points to a couple of source files in the Linux Kernel source tree that you may want to look at. I'm no Linux user, so I can't tell if those files are still there in the latest incarnation of the kernel: linuxjournal.com/article/9001
-
Angelos Asonitis almost 8 years
Ok, but are we absolutely certain that the load average described in the getloadavg() function description is the same load average that the top command in RHEL shows? I am asking this because I performed a full text search in the un-tarred contents of the package procps-3.2.8 (which is indicated by the command top -v) and there is not a single mention of the function getloadavg()! So, perhaps top calculates the load average in a different manner....
-
Angelos Asonitis almost 8 years
Interesting article indeed! I am just looking into the procps package to see if indeed the top command takes its value from the /proc/loadavg file and if this is the same as the result of the getloadavg() function.
-
Kusalananda almost 8 years
@AgelosAssonitis The definitive place for the documentation of load averages is your kernel source tree. POSIX does not define what the CPU load is, and doesn't mention the words "load average" anywhere. The top and uptime utilities are not POSIX utilities and the getloadavg() library function is also not defined in POSIX (it has a BSD lineage though).
-
Angelos Asonitis almost 8 years
So, from what I gather, there is no official documentation explaining how these values are formed other than the kernel source itself; is that correct? Before getting there, though, I still need to tie the value presented in top to the procfs file loadavg...
-
Kusalananda almost 8 years
@AgelosAssonitis This is Linux-specific. top on an OpenBSD machine gets the load average through a sysctl() call to the kernel, for example.
-
Angelos Asonitis almost 8 years
@Kusalanda Ok, thanks for this clarification too (mostly for the record) because I am interested in RHEL but it does make the topic more complete. BTW, at the same time I was going through the same functions... lol!
-
leonbloy almost 8 years
I wonder if it's correct to call that a "CPU load". In Linux, at least, the queue counts processes waiting not only for CPU but also (and in practice mostly) for I/O howtogeek.com/194642/…