Log from oom killer
I'm not sure which log you're actually referring to, but there is a logging system, included with most distros, called sar; it typically ships in a package called sysstat.
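If sar isn't installed, the same memory numbers that `sar -r` reports can be pulled from /proc/meminfo directly. A minimal sketch of a parser (the embedded sample stands in for the live file on non-Linux machines; field names match real /proc/meminfo output):

```python
import re

def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:  value kB' lines into a dict of kB values."""
    info = {}
    for line in text.splitlines():
        m = re.match(r'(\w+):\s+(\d+)\s*kB', line)
        if m:
            info[m.group(1)] = int(m.group(2))
    return info

# Sample values are illustrative only (a 512 MB / 1 GB swap machine).
SAMPLE = """\
MemTotal:         524288 kB
MemFree:            5120 kB
SwapTotal:       1048576 kB
SwapFree:         917504 kB
"""

try:
    # On Linux, read the live file; elsewhere fall back to the sample.
    with open('/proc/meminfo') as f:
        info = parse_meminfo(f.read())
except OSError:
    info = parse_meminfo(SAMPLE)

print('free RAM: %d kB, free swap: %d kB' % (info['MemFree'], info['SwapFree']))
```

Logging that line from a cron job every minute gives a crude but sufficient record of how close the box was to OOM at kill time.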
Also, I wrote up this U&L Q&A that covers a variety of methods for logging performance information, titled: "Commands for determining level of usage of server".
Additional sar references
- Easy system monitoring with SAR
- System Activity Reporter (sar)
- sysstat home page
- Wikipedia page on sar
ygram
Developer of the sc2 ranking site www.rankedftw.com. I also work, but that's another story.
Updated on September 18, 2022

Comments
-
ygram over 1 year
I'm running a virtual machine at a provider where you pay for everything you use (CPU, memory, disk IO, bandwidth, etc). Currently my machine has 512MB ram and 1GB swap area.
The kernel version is 3.8.0-21 and the distribution is Ubuntu 13.04, but I suspect the kernel may be custom-made, because kernel updates are held back.
I run a few cron jobs where Python processes are doing things with a PostgreSQL database. Lately the Python processes have been killed by the OOM killer.
I don't need help in solving the actual problem; I can increase the memory, reschedule the cron jobs, turn off overcommitting of memory, tune the PostgreSQL settings, or change how the Python programs work. But first I would like to know exactly why this happens (to make the correct fix). The thing is that I have plenty of swap free at the time of the kill, but no free physical memory.
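For reference, "turn off overcommitting of memory" maps to a pair of sysctls; a sketch, with illustrative values rather than recommendations:

```shell
# Mode 2 = strict accounting: commit limit = swap + overcommit_ratio% of RAM.
# (Mode 0 is the default heuristic, mode 1 always overcommits.)
sysctl vm.overcommit_memory=2
sysctl vm.overcommit_ratio=80

# Persist across reboots.
echo 'vm.overcommit_memory = 2' >> /etc/sysctl.conf
echo 'vm.overcommit_ratio = 80' >> /etc/sysctl.conf
```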
I have the following in the kernel log:
[4003981.650341] select 1 (init), adj 0, size 249, to kill
[4003981.650349] select 609 (rsyslogd), adj 0, size 609, to kill
[4003981.650359] select 17139 (postgres), adj 0, size 635, to kill
[4003981.650361] select 10381 (postgres), adj 0, size 6719, to kill
[4003981.650365] select 14153 (postgres), adj 0, size 7296, to kill
[4003981.650367] select 14159 (postgres), adj 0, size 7300, to kill
[4003981.650370] select 26802 (python3), adj 0, size 70767, to kill
[4003981.650372] send sigkill to 26802 (python3), adj 0, size 70767
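Those "select … size … to kill" lines are easy to mine programmatically. A small sketch that parses the candidates and confirms the victim is the one with the largest size (if the unit is 4 kB pages, which is an assumption here, python3's 70767 would be roughly 276 MB of resident memory):

```python
import re

LOG = """\
[4003981.650341] select 1 (init), adj 0, size 249, to kill
[4003981.650349] select 609 (rsyslogd), adj 0, size 609, to kill
[4003981.650359] select 17139 (postgres), adj 0, size 635, to kill
[4003981.650361] select 10381 (postgres), adj 0, size 6719, to kill
[4003981.650365] select 14153 (postgres), adj 0, size 7296, to kill
[4003981.650367] select 14159 (postgres), adj 0, size 7300, to kill
[4003981.650370] select 26802 (python3), adj 0, size 70767, to kill
[4003981.650372] send sigkill to 26802 (python3), adj 0, size 70767
"""

# One tuple per 'select' line: (pid, comm, adj, size).
pattern = re.compile(r'select (\d+) \((\S+)\), adj (-?\d+), size (\d+)')
candidates = [(int(pid), name, int(size))
              for pid, name, adj, size in pattern.findall(LOG)]

# The killer walks the candidates and keeps the one with the largest size.
pid, name, size = max(candidates, key=lambda c: c[2])
print('victim: pid %d (%s), size %d (~%d kB if sizes are pages)'
      % (pid, name, size, size * 4))
```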
I run vmstat every second around the same time (the Python process was killed right before 12:13:48):
2014-02-01 12:13:43 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
2014-02-01 12:13:43  r  b swpd free inact active si so    bi    bo   in   cs us sy id wa
2014-02-01 12:13:43  1  0   55    5   216    217  0  0   964   396  386  514 71 10  0 18
2014-02-01 12:13:44  1  1   55   87   185    166  0  0 22304 14536  907 1241 53  9  0 38
2014-02-01 12:13:45  1  0   55   60   190    189  0  0 21768 17344 1216 4581 21 26  0 53
2014-02-01 12:13:46  1  1   57    6   218    214  0  0 22264  4836 1031 3696 22 43  0 35
2014-02-01 12:13:47  2  1   59    4   217    218  0  0 28228 29892 1045 6234 22 34  0 44
2014-02-01 12:13:48  1  2   73  272    97     74  0  0 39436 14372  975 3708 10 38  0 52
2014-02-01 12:13:49  1  0   73  185   173     85  0  0 78400   356 1154 1943 23 33  0 44
2014-02-01 12:13:51  1  1   73  247   132     65  0  0  1936     0  165  188  7 13 69 11
Now to the actual question: I have seen other questions here (like this), where there is a log with more details about the memory (DMA, Normal memory, etc.). I can't seem to find anything like it on my system. Is it something that needs to be turned on, or where can I find it?
The information I'm looking for looks something like this:
Free pages: 6524kB (0kB HighMem)
Active:20 inactive:23 dirty:0 writeback:0 unstable:0 free:1631 slab:874 mapped:20 pagetables:90
DMA free:6524kB min:1200kB low:1500kB high:1800kB active:80kB inactive:92kB present:16384kB pages_scanned:41 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 411*4kB 252*8kB 113*16kB 27*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6524kB
Normal: empty
HighMem: empty
Swap cache: add 24282, delete 24269, find 7705/11790, race 0+0
Free swap  = 124768kB
Total swap = 128448kB
Out of Memory: Killed process 453 (smbd).
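For context: that zone breakdown is what the kernel's show_mem() prints. On a mainline kernel the OOM killer emits it itself, and the same dump can be requested on demand via the SysRq 'm' trigger; a sketch of the relevant knobs (root required, and note this applies to the mainline OOM killer, not necessarily to whatever lowmemorykiller variant this kernel runs):

```shell
# Have the OOM killer also dump per-task memory state when it fires.
sysctl vm.oom_dump_tasks=1

# Request the DMA/Normal/HighMem zone dump on demand via SysRq 'm';
# the output lands in the kernel log.
echo 1 > /proc/sys/kernel/sysrq
echo m > /proc/sysrq-trigger
dmesg | tail -n 40
```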
-
Mark Plotnick over 10 years: In every x86 Linux system I've used, the oom-killer logs DMA and Normal memory counts etc. to the kernel log. In the Ubuntu 13.10 Linux kernel source I looked at, the strings in your sample output above, "select.*size.*to kill", occur only in the Android driver, in lowmemorykiller.c. They don't occur at all in mm/oom_kill.c. Could you tell us a bit more about the architecture and hypervisor of the virtual machine you're running in?
-
ygram over 10 years: @MarkPlotnick I do not know which architecture and hypervisor they use, and I don't know how to see that from within the guest instance; maybe I could email their support and ask.
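From inside a Linux guest there are a couple of best-effort hints: the DMI vendor strings under /sys/class/dmi/id often name the hypervisor, and the `hypervisor` CPU flag in /proc/cpuinfo is set when running virtualized. A hedged sketch (all paths are standard Linux, but may be absent on some architectures or containers):

```python
import os
import platform

def guess_virtualization():
    """Best-effort virtualization hints from inside a Linux guest."""
    hints = []
    # DMI vendor strings often name the hypervisor (e.g. 'QEMU', 'Xen', 'VMware, Inc.').
    for key in ('sys_vendor', 'product_name'):
        try:
            with open('/sys/class/dmi/id/' + key) as f:
                hints.append('%s: %s' % (key, f.read().strip()))
        except OSError:
            pass
    # The 'hypervisor' CPU flag is set for guests on x86.
    try:
        with open('/proc/cpuinfo') as f:
            if ' hypervisor' in f.read():
                hints.append('cpuinfo: hypervisor flag set')
    except OSError:
        pass
    return hints or ['no hints found (platform: %s)' % platform.system()]

for hint in guess_virtualization():
    print(hint)
```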
-
kw1c-MuurmansLorentz over 4 years: dstat now provides a feature to find out which process in your running system is the candidate for getting killed by the OOM mechanism: dstat --top-oom. Its output is a kill-score column, e.g. "java 77". stackoverflow.com/a/25622401/4268594
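What dstat --top-oom reports comes from /proc/&lt;pid&gt;/oom_score, which you can read directly. A minimal sketch that ranks processes the same way (names below are this sketch's own, not dstat internals):

```python
import os

def top_oom(n=3):
    """Rank processes by /proc/<pid>/oom_score, highest (most killable) first."""
    if not os.path.isdir('/proc'):
        return []  # not Linux
    scores = []
    for pid in os.listdir('/proc'):
        if not pid.isdigit():
            continue
        try:
            with open('/proc/%s/oom_score' % pid) as f:
                score = int(f.read())
            with open('/proc/%s/comm' % pid) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited, or not readable
        scores.append((score, int(pid), name))
    return sorted(scores, reverse=True)[:n]

for score, pid, name in top_oom():
    print('%5d  %6d  %s' % (score, pid, name))
```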
-
ygram over 10 yearsI added some example of the log I am interested in. I will also look into SAR and see if I can get the information there.
-
slm over 10 years@ygram - look around the OOM project's site: linux-mm.org/OOM