How to understand memory usage and load average on a Linux server
Solution 1
(1) I see that each of the running processes occupies a very small percentage of memory (%MEM is no more than 0.2%, and mostly just 0.0%), but then how can the total memory be almost fully used, as shown in the fourth line of output ("Mem: 130766620k total, 130161072k used, 605548k free, 919300k buffers")? The %MEM values of all processes seem unlikely to add up to almost 100%, don't they?
To see how much memory you are currently using, run free -m. It will produce output like:

                 total       used       free     shared    buffers     cached
    Mem:          2012       1923         88          0         91        515
    -/+ buffers/cache:       1316        695
    Swap:         3153        256       2896

The 'used' value in the Mem row (1923) will almost always nearly match the 'total' value (2012), because Linux uses any spare memory to cache disk blocks (the 'cached' column, 515).
The key figure to look at is the used value in the -/+ buffers/cache row (1316). This is how much memory your applications are currently using. For best performance, this number should be less than your total memory (2012). To prevent out-of-memory errors, it needs to be less than the total memory plus swap space (2012 + 3153 = 5165).
If you want to quickly see how much memory is free, look at the free value in the -/+ buffers/cache row (695). This is the total memory (2012) minus the memory actually used (1316). (2012 - 1316 = 696, not 695; this is just a rounding issue.)
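A rough sketch of the arithmetic behind the -/+ buffers/cache row, using the Mem-row numbers from the sample free -m output above (the function names are made up for illustration). Because free converts each column from kilobytes and rounds it separately, recomputing from the already-rounded MiB figures gives 1317 and 694 rather than the printed 1316 and 695, which is the same rounding issue.

```python
# Mem-row figures from the sample `free -m` output, in MiB.
MEM_TOTAL = 2012
MEM_USED = 1923
MEM_FREE = 88
BUFFERS = 91
CACHED = 515
SWAP_TOTAL = 3153


def used_by_applications(used: int, buffers: int, cached: int) -> int:
    """'used' minus what the kernel is only borrowing for buffers and page cache."""
    return used - buffers - cached


def free_for_applications(free: int, buffers: int, cached: int) -> int:
    """Unused memory plus buffers and cache, which the kernel can mostly reclaim on demand."""
    return free + buffers + cached


app_used = used_by_applications(MEM_USED, BUFFERS, CACHED)
app_free = free_for_applications(MEM_FREE, BUFFERS, CACHED)

print("used by applications:", app_used, "MiB")
print("free for applications:", app_free, "MiB")
# Rules of thumb from the answer: below RAM for good performance,
# below RAM + swap to avoid out-of-memory errors.
print("fits in RAM:", app_used < MEM_TOTAL)
print("fits in RAM + swap:", app_used < MEM_TOTAL + SWAP_TOTAL)
```

Newer procps-ng versions of free no longer print the -/+ buffers/cache row; they show an "available" column instead, based on the kernel's MemAvailable estimate in /proc/meminfo.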
(2) How should I understand the load average on the first line ("load average: 14.04, 14.02, 14.00")?
This article on load average uses a nice traffic analogy and is the best one I've found so far: "Understanding Linux CPU Load - when should you be worried?". In your case, as people have pointed out:
On a multi-processor system, the load is relative to the number of processor cores available. The "100% utilization" mark is 1.00 on a single-core system, 2.00 on a dual-core, 4.00 on a quad-core, etc.
So, with a load average of 14.00 and 24 cores, your server is far from being overloaded.
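The normalization above amounts to dividing by the core count. A minimal Python sketch (the helper name load_per_core is hypothetical); os.getloadavg() and os.cpu_count() are standard-library calls, but note that os.cpu_count() reports logical CPUs, so hyper-threads count as cores here.

```python
import os


def load_per_core(load_average: float, cores: int) -> float:
    """Scale a load average by core count: 1.0 means every core is busy on average."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores


# Figures from the question: 15-minute load of 14.00 on a 24-core server.
print(f"{load_per_core(14.00, 24):.2f}")  # 0.58, i.e. roughly 58% of capacity

# On a live Unix machine: the (1, 5, 15)-minute averages over the logical CPU count.
if hasattr(os, "getloadavg"):
    cores = os.cpu_count() or 1
    print([round(load_per_core(x, cores), 2) for x in os.getloadavg()])
```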
Solution 2
Unix-like systems, including Linux, are designed to make the most efficient use possible of the available RAM. In very general terms, each MB of RAM can be in one of three states:
- Free
- Used by a Process
- Used for Buffers
The third state is used only as scratch space and is meant to be reassigned whenever necessary; in other words, the total memory available to programs is really Free + Used for Buffers. As a result, you won't see the space allocated to buffers showing up as assigned to any specific process.
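Applying the Free + Used for Buffers idea to the Mem and Swap lines from the question gives a quick sanity check. This is a sketch: parse_kb_fields is a hypothetical helper, and it treats all of "buffers" and "cached" as reclaimable, which slightly overstates what is really available (tmpfs and shared-memory pages, for example, are counted as cached but cannot simply be dropped). In this older version of top, the page-cache size is printed at the end of the Swap line even though it is RAM.

```python
import re


def parse_kb_fields(line: str) -> dict:
    """Turn 'Mem: 130766620k total, ...' into {'total': 130766620, ...} (values in KiB)."""
    return {name: int(value) for value, name in re.findall(r"(\d+)k (\w+)", line)}


# Lines copied from the question's top output.
mem = parse_kb_fields("Mem: 130766620k total, 130161072k used, 605548k free, 919300k buffers")
# The "cached" figure on the Swap line is page cache held in RAM, not swap.
swap = parse_kb_fields("Swap: 63111312k total, 500556k used, 62610756k free, 124437752k cached")

used_by_processes = mem["used"] - mem["buffers"] - swap["cached"]
reclaimable_or_free = mem["free"] + mem["buffers"] + swap["cached"]

print(f"used outside buffers/cache: {used_by_processes / 2**20:.1f} GiB "
      f"({100 * used_by_processes / mem['total']:.1f}% of RAM)")
print(f"free + buffers + cache: {reclaimable_or_free / 2**20:.1f} GiB")
```

With these figures, everything other than buffers and cache accounts for only about 4.6 GiB (about 3.7% of RAM), which is consistent with the tiny %MEM values in the process list, while roughly 120 GiB is either free or held as cache.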
Your load average question is a little more interesting, as it can easily be misinterpreted. For the full story, see this Linux Journal article. The best summary is a direct quote from the article:
The load-average calculation is best thought of as a moving average of processes in Linux's run queue marked running or uninterruptible
In other words, you can think of your load average as (# of running or runnable processes) + (# of processes waiting on I/O). Keeping in mind that at any given time you can have one process executing per core (24 in your case), I would say that your load average of 14 is pretty low.
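The "moving average" in that quote is an exponentially damped one. In the Linux kernel, roughly every 5 seconds the number of running plus uninterruptible tasks n is sampled and each average is updated as load = load * e + n * (1 - e), where e = exp(-5/60), exp(-5/300) and exp(-5/900) for the 1-, 5- and 15-minute figures. The kernel does this in fixed-point arithmetic, so the floating-point sketch below (function names are made up) only approximates it. It shows why the 1-minute figure reacts fastest, and why three nearly equal values such as 14.04, 14.02, 14.00 suggest a load that has been fairly steady for a while.

```python
import math

SAMPLE_INTERVAL_S = 5        # the kernel samples roughly every 5 seconds
WINDOWS_MIN = (1, 5, 15)     # the three averages shown by top and uptime


def decay(window_min: int) -> float:
    """Weight the previous average keeps at each sample: exp(-interval / window)."""
    return math.exp(-SAMPLE_INTERVAL_S / (60 * window_min))


def step(loads, active_tasks: int) -> list:
    """Blend one sample of (running + uninterruptible) tasks into each average."""
    return [load * decay(w) + active_tasks * (1 - decay(w))
            for load, w in zip(loads, WINDOWS_MIN)]


def simulate(active_tasks: int, minutes: float, start=(0.0, 0.0, 0.0)) -> list:
    """Hold the task count constant for `minutes` and return the three averages."""
    loads = list(start)
    for _ in range(int(minutes * 60 / SAMPLE_INTERVAL_S)):
        loads = step(loads, active_tasks)
    return loads


# An idle machine that suddenly keeps 14 tasks busy.
for minutes in (1, 5, 15, 60):
    print(minutes, [round(x, 2) for x in simulate(14, minutes)])
```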
Solution 3
From the sar man page:
The load average is calculated as the average number of runnable or running tasks (R state), and the number of tasks in uninterruptible sleep (D state) over the specified interval.
From the uptime man page:
System load averages is the average number of processes that are either in a runnable or uninterruptable state. A process in a runnable state is either using the CPU or waiting to use the CPU. A process in uninterruptable state is waiting for some I/O access, eg waiting for disk. The averages are taken over the three time intervals. Load averages are not normalized for the number of CPUs in a system, so a load average of 1 means a single CPU system is loaded all the time while on a 4 CPU system it means it was idle 75% of the time.
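To see what the man pages mean in practice, the sketch below counts the tasks (threads) that are currently in R or D state by reading /proc, which is Linux-specific. It is one instantaneous sample, not the load average itself: the kernel smooths such counts over 1, 5 and 15 minutes, and its internal count may differ slightly from what a user-space scan sees. The helper names are hypothetical.

```python
import os


def state_from_stat(stat_line: str) -> str:
    """Return the one-letter state field of a /proc/<pid>/task/<tid>/stat line.

    The command name sits in parentheses and may itself contain spaces or ')',
    so the state is taken from just after the *last* ')'.
    """
    return stat_line[stat_line.rindex(")") + 2]


def count_load_contributors(states) -> int:
    """Count tasks the load average includes: R (runnable) and D (uninterruptible)."""
    return sum(1 for s in states if s in ("R", "D"))


def snapshot(proc: str = "/proc") -> int:
    """One instantaneous R + D count over all threads (Linux only)."""
    states = []
    for pid in filter(str.isdigit, os.listdir(proc)):
        task_dir = os.path.join(proc, pid, "task")
        try:
            for tid in os.listdir(task_dir):
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    states.append(state_from_stat(f.read()))
        except (OSError, ValueError):
            continue  # the process or one of its threads exited mid-scan
    return count_load_contributors(states)


if os.path.isdir("/proc/self/task"):
    print("tasks in R or D state right now:", snapshot())
```

The scanning process itself shows up as one R task, so on an otherwise idle machine the count is at least 1.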
Solution 4
- Linux has for some time managed its memory in a way that makes that line of top basically useless: it generally keeps most of the machine's memory allocated for various uses (such as disk cache) whenever it isn't required by a user process.
- The load average is the average number of processes running or waiting to run. A higher load usually means lower responsiveness (higher latency), so you want it as low as possible. Since each of your CPUs can be running something at any given time, though, you seem to be doing pretty well at 14.
Updated on September 17, 2022
Comments
-
user1698102 over 1 year
I am using a Linux server with 128 GB of memory and 24 cores. I use top to see how heavily it is used; its output is pasted at the end of this post. I have two questions:
(1) I see that each of the running processes occupies a very small percentage of memory (%MEM is no more than 0.2%, and mostly just 0.0%), but then how can the total memory be almost fully used, as shown in the fourth line of output ("Mem: 130766620k total, 130161072k used, 605548k free, 919300k buffers")? The %MEM values of all processes seem unlikely to add up to almost 100%, don't they?
(2) How should I understand the load average on the first line ("load average: 14.04, 14.02, 14.00")?
Thanks and regards!
Edit:
Thanks!
I would also really like to hear some rough thresholds, based on the percentage of memory used, for deciding whether a server is heavily loaded, since I once ended up being the one who crammed the server without understanding its current load.
Is swap regarded as almost the same as memory? For example, when memory and swap are almost the same size, if memory is almost running out but swap is still largely free, can I just treat the used percentage of memory + swap as still low and start new processes?
How do you weigh CPU usage together with memory (or memory + swap) usage? Do you start worrying when either of them gets too high, or only when both do?
Output of top:
$ top
    top - 12:45:33 up 19 days, 23:11, 18 users,  load average: 14.04, 14.02, 14.00
    Tasks: 484 total,  12 running, 472 sleeping,   0 stopped,   0 zombie
    Cpu(s): 36.7%us, 19.7%sy,  0.0%ni, 43.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:  130766620k total, 130161072k used,   605548k free,   919300k buffers
    Swap:  63111312k total,   500556k used, 62610756k free, 124437752k cached

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
     6529 sanchez   18  -2 1075m 219m  13m S  100  0.2  13760:23 MATLAB
    13210 timothy   18  -2 48336  37m 1216 R  100  0.0   3:56.75 absurdity
    13888 timothy   18  -2 48336  37m 1204 R  100  0.0   2:04.89 absurdity
    14542 timothy   18  -2 48336  37m 1196 R  100  0.0   1:08.34 absurdity
    14544 timothy   18  -2  2888 2076  400 R  100  0.0   1:06.14 gatherData
     6183 sanchez   18  -2 1133m 195m  13m S  100  0.2  13676:04 MATLAB
     6795 sanchez   18  -2 1079m 210m  13m S  100  0.2  13734:26 MATLAB
    10178 timothy   18  -2 48336  37m 1204 R  100  0.0  11:33.93 absurdity
    12438 timothy   18  -2 48336  37m 1216 R  100  0.0   5:38.17 absurdity
    13661 timothy   18  -2 48336  37m 1216 R  100  0.0   2:44.13 absurdity
    14098 timothy   18  -2 48336  37m 1204 R  100  0.0   1:58.31 absurdity
    14335 timothy   18  -2 48336  37m 1196 R  100  0.0   1:08.93 absurdity
    14765 timothy   18  -2 48336  37m 1196 R   99  0.0   0:32.57 absurdity
    13445 timothy   18  -2 48336  37m 1216 R   99  0.0   3:01.37 absurdity
    28990 root      20   0     0    0    0 S    2  0.0  65:50.21 pdflush
    12141 tim       18  -2 19380 1660 1024 R    1  0.0   0:04.04 top
     1240 root      15  -5     0    0    0 S    0  0.0  16:07.11 kjournald
     9019 root      20   0  296m 4460 2616 S    0  0.0  82:19.51 kdm_greet
        1 root      20   0  4028  728  592 S    0  0.0   0:03.11 init
        2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
        3 root      RT  -5     0    0    0 S    0  0.0   0:01.01 migration/0
        4 root      15  -5     0    0    0 S    0  0.0   0:08.13 ksoftirqd/0
        5 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/0
        6 root      RT  -5     0    0    0 S    0  0.0  17:27.31 migration/1
        7 root      15  -5     0    0    0 S    0  0.0   0:01.21 ksoftirqd/1
        8 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/1
        9 root      RT  -5     0    0    0 S    0  0.0  10:02.56 migration/2
       10 root      15  -5     0    0    0 S    0  0.0   0:00.34 ksoftirqd/2
       11 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/2
       12 root      RT  -5     0    0    0 S    0  0.0   4:29.53 migration/3
       13 root      15  -5     0    0    0 S    0  0.0   0:00.34 ksoftirqd/3
-
Zoredache over 14 years
See this for a good answer about memory: serverfault.com/questions/38065/#38074
-
Nickolay over 5 years
And this answer for a quick summary of the load average (or this long read for the details).
-
Zoredache over 5 years
-
-
user1698102 over 14 years
Thanks! Regarding 1, do you mean that some processes don't show up in top but are using a lot of memory? Or that the fourth line of output about memory ("Mem: 130766620k total, 130161072k used, 605548k free, 919300k buffers") is misleading, that I should look at the sum of the memory percentages used by all processes shown in top, and that in my case I can safely run some new memory-consuming processes?
-
David Z over 14 years
As other answers have pointed out, the load average should be compared with the number of processors, so 14 isn't that much on a 24-core system. It'd be kind of like 14/24 = 0.58 on a single-core system (well, kind of).
-
user1698102 over 14 years
Thanks! What percentage of memory used (or memory + swap) is regarded as heavily loaded, so that it's better not to run new processes? Do you look at memory alone, or at memory + swap? Is the used swap shown in top the swap actually in use? Regarding the CPU load average, do you measure the actual load as "load average / number of cores"? At what value would you regard the server as heavily loaded? Thanks and regards!
-
chaos over 14 years
@Tim: I mean the latter.
-
Cian over 14 years
Load's just an indicator. As a general rule, a load greater than the number of cores is a bad thing. Generally, a high percentage of memory used is a bad thing. It's not a binary value where you can say 'this much is fine'. If you run out of RAM, you don't have enough to run more processes. If you don't run out, you've got plenty. It's very much dependent on the specifics of your situation.