Tomcat process killed by the Linux kernel after running out of swap space; no JVM OutOfMemoryError thrown


Solution 1

Why would this issue happen? When the JVM runs out of memory, why is there no OutOfMemoryError thrown?

It is not the JVM that has run out of memory. It is the Host Operating System that has run out of memory-related resources, and is taking drastic action. The OS has no way of knowing that the process (in this case the JVM) is capable of shutting down in an orderly fashion when told "No" in response to a request for more memory. It HAS to hard-kill something or else there is a serious risk of the entire OS hanging.

Anyway, the reason you are not seeing OOMEs is that this is not an OOME situation. In reality, the JVM has already been given too much memory by the OS, and there is no way to take it back. That's the problem the OS has to deal with by hard-killing processes.
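
This also explains why -XX:+HeapDumpOnOutOfMemoryError produced nothing: that option only fires on a Java-level OutOfMemoryError, and the kernel's OOM killer uses SIGKILL, which a process cannot intercept, so neither shutdown hooks nor any JVM error handling get a chance to run. A minimal sketch (the class name is hypothetical, not from the original post) that makes the difference visible:

    // OomKillDemo.java -- hypothetical demo class.
    // A shutdown hook runs on "kill <pid>" (SIGTERM) or a normal exit,
    // but never on "kill -9 <pid>" (SIGKILL), which is what the OOM killer sends.
    public class OomKillDemo {
        public static void main(String[] args) throws Exception {
            Runtime.getRuntime().addShutdownHook(new Thread() {
                @Override
                public void run() {
                    // Printed for an orderly shutdown only.
                    System.out.println("shutdown hook ran");
                }
            });
            System.out.println("running; compare kill <pid> with kill -9 <pid>");
            Thread.sleep(Long.MAX_VALUE); // keep the process alive
        }
    }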

And why does it go straight to using swap?

It uses swap because the total virtual memory demand of the entire system won't fit in physical memory. This is NORMAL behaviour for a UNIX / Linux operating system.

Why does top show RES of only 5.3G for the java process when much more memory is being consumed?

The RES number can be a little misleading. It only counts the pages of the process that are currently resident in physical memory; anything that has been swapped out (and by that point about 2G had gone to swap) is not included. The VIRT number is more relevant to your problem. It says your JVM is using 10.4g of virtual memory ... which is more than the available physical memory on your system.
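
If you want to see, from inside the process, how much of that footprint is the Java heap versus everything else (permgen, code cache, thread stacks, direct buffers, the JVM's own native allocations), the standard java.lang.management API gives a rough split. A small sketch, assuming nothing beyond the standard library (the class name is made up):

    // MemoryReport.java -- hypothetical sketch using the standard JMX memory beans.
    // Prints heap and non-heap usage; top's VIRT/RES will still be larger,
    // because thread stacks, direct buffers and native JVM allocations are
    // not covered by these two numbers.
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    public class MemoryReport {
        public static void main(String[] args) {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            MemoryUsage heap = mem.getHeapMemoryUsage();
            MemoryUsage nonHeap = mem.getNonHeapMemoryUsage();
            System.out.printf("heap:     used=%dM committed=%dM max=%dM%n",
                    mb(heap.getUsed()), mb(heap.getCommitted()), mb(heap.getMax()));
            System.out.printf("non-heap: used=%dM committed=%dM%n",
                    mb(nonHeap.getUsed()), mb(nonHeap.getCommitted()));
        }

        private static long mb(long bytes) {
            return bytes / (1024 * 1024);
        }
    }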


As the other answer says, the fact that you are expecting an OOME here is itself a concern. Even if you did get one, it would be unwise to do anything with it. An OOME is liable to do collateral damage to your application / container that is hard to detect and harder to recover from. That's why OOME is an Error, not an Exception.


Recommendations:

  • Don't try to use significantly more virtual memory than you have physical memory, especially with Java. When a JVM is running a full garbage collection, it will touch most of its VM pages, multiple times, in random order. If you have over-allocated your memory significantly, this is liable to cause thrashing, which kills performance for the entire system.

  • Do increase your system's swap space. (But that might not help ...)

  • Don't try to recover from OOMEs. (A fail-fast alternative is sketched just after this list.)
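
On the last point, a common fail-fast pattern is to let the error propagate and make sure a thread dying with an OOME takes the whole JVM down, so the container gets restarted cleanly instead of limping along. (JVMs newer than the 1.7 used here also have flags for this, such as -XX:+ExitOnOutOfMemoryError from 8u92 onwards.) A hedged sketch using only the standard uncaught-exception-handler API; the class name is made up:

    // FailFastOnOome.java -- hypothetical sketch, not from the original answers.
    // Installs a default handler that halts the JVM when any thread dies with
    // an OutOfMemoryError, rather than trying to recover from it.
    public class FailFastOnOome {
        public static void installHandler() {
            Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
                @Override
                public void uncaughtException(Thread t, Throwable e) {
                    if (e instanceof OutOfMemoryError) {
                        // Do as little as possible here: further allocation may fail.
                        // halt() skips shutdown hooks; use System.exit() if those must run.
                        Runtime.getRuntime().halt(1);
                    } else {
                        e.printStackTrace(); // keep default-like reporting for other errors
                    }
                }
            });
        }
    }

Note that this only helps for errors that actually reach the top of a thread; code that catches Throwable and carries on defeats it.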

Solution 2

You probably have other processes on the same computer that also use memory. It looks like your java process reaches around 5.3GB before the machine is desperately out of RAM and swap. (The other processes are then probably using 12GB - 5.3GB = 6.7GB.) So the Linux kernel sacrifices your java process to keep the other processes running. The java memory limit is never reached, so you're not getting an OutOfMemoryException.

Consider all the processes you need running on the entire machine, and adjust your Xmx setting accordingly (enough to leave room for all the other processes). Perhaps 5GB?
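
Whatever value you settle on, it is worth verifying what heap ceiling the running JVM actually picked up, since Tomcat's startup scripts can layer several option variables on top of each other. A quick sketch using the standard Runtime API (the class name is made up):

    // HeapLimits.java -- hypothetical sanity check for the effective -Xmx.
    // maxMemory() is (approximately) the configured heap ceiling;
    // totalMemory() is the heap committed so far.
    public class HeapLimits {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long mb = 1024L * 1024L;
            System.out.println("max heap (MB):          " + rt.maxMemory() / mb);
            System.out.println("committed heap (MB):    " + rt.totalMemory() / mb);
            System.out.println("free of committed (MB): " + rt.freeMemory() / mb);
        }
    }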

In any case, counting on OutOfMemoryExceptions being delivered is a pretty bad code smell. If I recall correctly, getting even a single OutOfMemoryException can leave the JVM in an "all-bets-are-off" state, and it should probably be restarted so it does not become unstable.
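
The "all-bets-are-off" feeling usually comes from the failure mode discussed in the comments below: a worker thread dies with an OOME, the rest of the application is never notified, and other threads wait forever for results that will never arrive. A hedged illustration (class name made up, error thrown artificially) of how a fire-and-forget task loses its error:

    // SilentWorkerDeath.java -- hypothetical illustration, simulating the OOME.
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class SilentWorkerDeath {
        public static void main(String[] args) {
            ExecutorService pool = Executors.newSingleThreadExecutor();
            pool.execute(new Runnable() {
                @Override
                public void run() {
                    // With execute() there is no Future to carry this error back;
                    // it only reaches the thread's uncaught-exception handler
                    // (a stack trace on stderr by default), and the work is lost.
                    throw new OutOfMemoryError("simulated");
                }
            });
            // The main thread carries on with no idea the task failed.
            System.out.println("task submitted; main thread continues");
            pool.shutdown();
        }
    }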



Comments

  • baggiowen
    baggiowen almost 2 years

    I was performing load testing against a tomcat server. The server has 10G physical memory and 2G swap space. The heap size (xms and xmx) was set to 3G before, and the server just worked fine. Since I still saw a lot of free memory left and the performance was not good, I increased the heap size to 7G and ran the load testing again. This time I observed that physical memory was eaten up very quickly, and the system started consuming swap space. Later, tomcat crashed after running out of swap space. I included -XX:+HeapDumpOnOutOfMemoryError when starting tomcat, but I didn't get any heap dump. When I checked /var/log/messages, I saw kernel: Out of memory: Kill process 2259 (java) score 634 or sacrifice child.

    To provide more info, here's what I saw from the Linux top command with the heap size set to 3G and to 7G.

    xms&xmx = 3G (which worked fine):

    • Before starting tomcat:

      Mem:  10129972k total,  1135388k used,  8994584k free,    19832k buffers
      Swap:  2097144k total,        0k used,  2097144k free,    56008k cached
      
    • After starting tomcat:

      Mem:  10129972k total,  3468208k used,  6661764k free,    21528k buffers
      Swap:  2097144k total,        0k used,  2097144k free,   143428k cached
      PID  USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
      2257 tomcat    20   0 5991m 1.9g  19m S 352.9 19.2   3:09.64 java
      
    • After starting load for 10 min:

      Mem:  10129972k total,  6354756k used,  3775216k free,    21960k buffers
      Swap:  2097144k total,        0k used,  2097144k free,   144016k cached
      PID  USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
      2257 tomcat    20   0 6549m 3.3g  10m S 332.1 34.6  16:46.87 java
      

    xms&xmx = 7G (which caused tomcat crash):

    • Before starting tomcat:

      Mem:  10129972k total,  1270348k used,  8859624k free,    98504k buffers
      Swap:  2097144k total,        0k used,  2097144k free,    74656k cached
      
    • After starting tomcat:

      Mem:  10129972k total,  6415932k used,  3714040k free,    98816k buffers
      Swap:  2097144k total,        0k used,  2097144k free,   144008k cached
      PID  USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
      2310 tomcat    20   0  9.9g 3.5g  10m S  0.3 36.1   3:01.66 java
      
    • After starting load for 10 min (right before tomcat was killed):

      Mem:  10129972k total,  9960256k used,   169716k free,      164k buffers
      Swap:  2097144k total,  2095056k used,     2088k free,     3284k cached
      PID  USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
      2310 tomcat    20   0 10.4g 5.3g  776 S  9.8 54.6  14:42.56 java
      

    Java and JVM Version:

    Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
    Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
    

    Tomcat Version:

    6.0.36
    

    Linux Server:

    Red Hat Enterprise Linux Server release 6.4 (Santiago)
    

    So my questions are:

    1. Why would this issue happen? When the JVM runs out of memory, why is there no OutOfMemoryError thrown? And why does it go straight to using swap?
    2. Why does top show RES of only 5.3G for the java process when much more memory is being consumed?

    I have been investigating and searching for a while, still cannot find the root cause for this issue. Thanks a lot!

    • AngerClown
      AngerClown
      A better question is why Tomcat is using so much memory. You can still get a heap dump with jmap, or a thread dump by sending the process SIGQUIT (kill -3). Eclipse MAT is probably the easiest way to analyze the dump if most of the memory is coming from one place.
  • baggiowen
    baggiowen almost 11 years
    thanks for your reply! but when I did top -a (sort by memory usage), I didn't see any other process consuming a lot of memory. And if you look at the memory usage before I started tomcat, there's only about 1G of memory being used.
  • brady
    brady almost 11 years
    This is pretty good advice, but the "OutOfMemoryException can leave the JVM in an 'all-bets-are-off' state" part is incorrect. First, it's OutOfMemoryError, but more importantly, this is an orderly process that doesn't create inherent instability. The problem is that your program isn't likely to do anything useful without more memory. But it isn't damaged or unstable in any way.
  • faffaffaff
    faffaffaff almost 11 years
    The biggest problem with OutOfMemoryError is that it can happen at runtime deep within some java.* framework class or another third-party library, which perhaps doesn't have an exception handler ready to do cleanup, for example. At least that's the explanation I got many years ago while I was battling some instability in third-party libraries triggered by such errors.
  • baggiowen
    baggiowen almost 11 years
    thank you! that makes sense. but what I still don't understand is why there's so much difference between the 3G and 7G heap settings. by looking at the memory usage before starting tomcat, I thought the OS should be capable of handling a 7G heap.
  • Stephen C
    Stephen C almost 11 years
    Your JVM is actually using 10.4G. Maybe you've got a lot of off-heap memory usage going on under the covers. Note also there is a similar ~3.5G difference between the requested heap size and the observed VIRT size in the case where you used the smaller heap.
  • Stephen C
    Stephen C almost 11 years
    @faffaffaff - Or the OOME might happen on some worker thread ... which then dies, leaving other parts of the application in limbo waiting for notifications etc that will never arrive.
  • faffaffaff
    faffaffaff almost 11 years
    @StephenC excellent point, come to think of it, that was a big part of the problem way back when: worker threads that exit because nothing catches the unexpected OOME, and things start piling up or getting stuck.
  • baggiowen
    baggiowen almost 11 years
    thanks again. I just realized they both have a 3.5G difference. So is there a way I can find out where that off-heap memory usage is going under the covers? Also, I was reading another post, which indicates that RSS is more relevant than VIRT. I'm confused now... stackoverflow.com/questions/561245/…
  • Stephen C
    Stephen C over 10 years
    @baggiowen - the relevance of RES and VIRT depends on the question you are asking / problem you are trying to solve. For this purpose VIRT is more relevant, but USED is the best measure. But either way, if you are going to draw accurate conclusions from the stats, you need to understand how Linux virtual memory works, and what those numbers actually mean. For the latter, read "man top" ... for example.