High load average, when should I be worried?

11,558

Solution 1

Only worry if it actually corresponds to a slow application.

More precisely, load average reflects the number of processes that are running or waiting to run (on Linux it also includes processes in uninterruptible sleep, typically waiting on I/O). It can be much higher than 1 while the system performs just fine. A host with 24 cores and a load average of 21 still has idle CPU, even if each of those processes is using 100% of a core. The advice that 1 is a lot may come from people who have not seen large or busy hosts.
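The comparison above can be sketched in a few lines of Python. This is only an illustration: the helper names `load_per_core` and `report` are made up for this example, and `os.cpu_count()` reports logical CPUs (hardware threads), which may differ from physical cores.

```python
import os


def load_per_core(load_average: float, cores: int) -> float:
    """Return the load average divided by the number of logical CPUs."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores


def report() -> None:
    """Print the 1, 5 and 15 minute load averages, raw and per logical CPU."""
    # os.getloadavg() is available on Unix-like systems only.
    averages = os.getloadavg()
    cores = os.cpu_count() or 1
    for label, value in zip(("1 min", "5 min", "15 min"), averages):
        ratio = load_per_core(value, cores)
        print(f"{label}: load {value:.2f}, per CPU {ratio:.2f}")


if __name__ == "__main__":
    report()
```

A per-CPU value below 1 suggests there is still idle CPU; a value above 1 is not automatically a problem on Linux, because tasks waiting on I/O are counted as well.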

iowait is a delay for the application, but (on modern storage systems) the CPU is effectively free to do other work in the meantime.

Monitor your application's response time. Correlate that with your other monitoring to see what actually indicates things are slow.
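One rough way to do that correlation is to collect paired samples (for example, load average and response time taken at the same moments) and compute a correlation coefficient. The sketch below implements a plain Pearson correlation; the function name `pearson` and the sample values are hypothetical, not measurements from any real host.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical samples taken at the same moments (illustration only).
    load_samples = [18.2, 21.6, 27.0, 29.1, 24.3]
    response_ms = [110.0, 115.0, 112.0, 118.0, 111.0]
    print(f"correlation: {pearson(load_samples, response_ms):.2f}")
```

A coefficient near 0 would suggest load average is a poor predictor of slowness on that host, and that some other metric deserves the alert.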

Solution 2

The rule that a load average above 1 is a problem applies to a single core/thread. As a rule of thumb, a load average equal to your number of cores/threads is OK; anything higher will most likely lead to queued processes and slow things down.

Processes waiting on I/O are also counted in the load average, so a single process doing heavy I/O can push the load average over 1 without using a second core/thread.
While this I/O-heavy process will likely have poor response times, a second process can still be very responsive under high load, depending on the resources it is accessing.
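To see how much of the load comes from I/O waits rather than CPU demand, one option on Linux is to count processes by state: `R` is running or runnable, `D` is uninterruptible sleep (usually I/O). The sketch below is an approximation under stated assumptions: it reads `/proc/<pid>/stat`, which reflects the state of each process's main thread only, whereas the load average counts individual threads. The function names are made up for this example.

```python
import glob
from collections import Counter


def parse_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so take what follows the last ')'.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def count_states() -> Counter:
    """Count processes by state, based on each process's main thread."""
    counts = Counter()
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as handle:
                counts[parse_state(handle.read())] += 1
        except (OSError, ValueError, IndexError):
            # The process exited or the line was unreadable; skip it.
            continue
    return counts


if __name__ == "__main__":
    counts = count_states()
    print("running or runnable (R):", counts["R"])
    print("uninterruptible sleep (D):", counts["D"])
```

If `D` dominates, the load is mostly waiting on I/O and CPU-bound processes may still respond quickly; if `R` is well above the CPU count, processes are queuing for CPU.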

Author: AL-Kateb

Updated on September 18, 2022

Comments

  • AL-Kateb
    AL-Kateb almost 2 years

    I have a server that runs a few hundred processes simultaneously, most of them idle. It is a kind of web crawler that sleeps between requests for various reasons.

    So as a result, my load average is usually something like: 21.64, 27.05, 29.16

    That's very, very high, right? But everything runs smoothly!

    And my CPU consumption is something like (mpstat 60 1 output):

    11:07:06 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
    11:08:06 AM  all   34.82    0.00    4.16   10.70    0.00    0.31    0.00    0.00    0.00   50.01
    Average:     all   34.82    0.00    4.16   10.70    0.00    0.31    0.00    0.00    0.00   50.01
    

    So, since I'm not even running at 100% CPU usage, I feel like I have no reason to be worried. Or am I missing something? There is a slight delay when nginx is serving requests, but that's expected given the large number of queued requests. But I read somewhere that a load average higher than 1 is a cause for alarm, and I honestly don't see why that is.

    So please advise.

    Thanks

  • AL-Kateb
    AL-Kateb almost 7 years
    What I posted above is the average of all CPUs. I checked, and the load is almost the same on all of them, so the system is utilizing all CPUs and cores. I also know EXACTLY what is causing the high load: as I said, it is the crawler I'm running, which sleeps a couple of seconds between requests. My guess is that the iowait is caused by network I/O. I have 2 SSDs in RAID 0, and my processes do not use the disk that much. I also have 79 GB of free RAM, so there's that as well.
  • AL-Kateb
    AL-Kateb almost 7 years
    It has been going on like that for a couple of months now without any problem, but what I would like to know is how much further I can push this server. These numbers are not very clear to me: a load of 25 is considered high, since I have 12 cores, yet CPU usage is around 30%. Does this mean I can still push it, or should the high load average concern me, and should I start trying to tune things? I am running CentOS 7.3 on a server with an Intel(R) Xeon(R) CPU E5-1650 v3 and 96 GB of RAM.
  • Tero Kilkanen
    Tero Kilkanen almost 7 years
    The load could be due to your crawler waiting for responses from the sites it is crawling. It is hard to tell how far you can push it, since it is hard to guess how the overall system behaves. You need to find the limit yourself via the scientific method.
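As a starting point for such an experiment, the mpstat figures quoted in the question can be turned into a rough "CPUs' worth of work" number. The sketch below assumes that %iowait counts as time the CPU was not executing anything, and takes the 12 mentioned in the comments as the number of logical CPUs; both are assumptions, and the function name `busy_cpus` is made up for this example.

```python
def busy_cpus(idle_pct: float, iowait_pct: float, logical_cpus: int) -> float:
    """Estimate how many CPUs' worth of work the host is doing.

    Treats both %idle and %iowait as time when the CPU was not executing.
    """
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    busy_pct = 100.0 - idle_pct - iowait_pct
    return busy_pct / 100.0 * logical_cpus


if __name__ == "__main__":
    # %idle and %iowait from the mpstat output in the question;
    # 12 logical CPUs is an assumption based on the comments.
    used = busy_cpus(idle_pct=50.01, iowait_pct=10.70, logical_cpus=12)
    print(f"about {used:.1f} of 12 CPUs busy on average")
```

With those numbers the estimate is about 4.7 CPUs busy out of 12. That only describes CPU headroom; network, memory, or the crawled sites themselves may become the limit first, which is why increasing the load in steps while watching response times remains the safer approach.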