Detecting out of memory errors

Solution 1

You could use an out-of-memory warning system; this OutOfMemoryError Warning System can serve as inspiration. Configure a listener that is invoked once memory usage crosses a certain threshold (say, 80%), and use that invocation to start taking corrective measures.

We use something similar: when a component's memory usage reaches 80%, we suspend the component's service and start the clean-up action. The component comes back only when its used memory drops below another configurable threshold.
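A minimal sketch of such a listener, assuming the standard `java.lang.management` API. The class name `UsageThresholdSketch`, the `suspended` flag, and the fractions passed in are illustrative assumptions, not the setup described above. The JVM emits no notification when usage falls back below a threshold, so the resume check in this sketch polls heap usage instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

final class UsageThresholdSketch {
    // Hypothetical flag that the component would check before accepting work.
    static final AtomicBoolean suspended = new AtomicBoolean(false);

    /** Threshold in bytes for a given fraction of a pool's maximum size. */
    static long thresholdBytes(long maxBytes, double fraction) {
        return (long) (maxBytes * fraction);
    }

    /** True when usage is below the (lower) resume fraction; false if max is undefined. */
    static boolean shouldResume(long usedBytes, long maxBytes, double resumeFraction) {
        return maxBytes > 0 && usedBytes < thresholdBytes(maxBytes, resumeFraction);
    }

    /** Registers the listener and sets a usage threshold on each heap pool that supports one. */
    static int install(double fraction) {
        NotificationListener listener = (notification, handback) -> {
            if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(notification.getType())) {
                suspended.set(true); // corrective measures would start here
            }
        };
        // The platform MemoryMXBean is documented to be a NotificationEmitter.
        ((NotificationEmitter) ManagementFactory.getMemoryMXBean())
                .addNotificationListener(listener, null, null);

        int configured = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isUsageThresholdSupported()) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage != null && usage.getMax() > 0) {
                pool.setUsageThreshold(thresholdBytes(usage.getMax(), fraction));
                configured++;
            }
        }
        return configured;
    }

    /** Intended to be called periodically, for example from a scheduled task. */
    static void resumeIfRecovered(double resumeFraction) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        if (shouldResume(heap.getUsed(), heap.getMax(), resumeFraction)) {
            suspended.set(false);
        }
    }
}
```

Calling `UsageThresholdSketch.install(0.8)` at startup and `resumeIfRecovered(0.6)` on a timer would give the suspend/resume behaviour with two separate thresholds; both values are placeholders.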

Solution 2

There is an article based on the post that Scorpion has already given a link to.

The technique is again based on using MemoryPoolMXBean and subscribing to the "memory threshold exceeded" event, but it differs slightly from what was described in the original post.

The author states that when you subscribe to the plain "memory threshold exceeded" event, there is a possibility of a "false alarm". Imagine that memory consumption is above the threshold, but a garbage collection is about to run and will free a lot of that memory. In fact, that situation is quite common in real-world applications.

Fortunately, there is another threshold, "collection usage threshold", and a corresponding event, which is fired based on memory consumption right after garbage collection. When you receive that event, you can be much more confident you're running out of memory.
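As a rough illustration of the second threshold, the sketch below sets a collection usage threshold on each heap pool that supports one and reacts only to the corresponding notification type. It assumes the standard `java.lang.management` API; the class name `CollectionThresholdSketch`, the `lowAfterGc` flag, and the fraction are hypothetical and are not taken from the article.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

final class CollectionThresholdSketch {
    // Hypothetical flag: set once usage is still above the threshold right after a GC.
    static final AtomicBoolean lowAfterGc = new AtomicBoolean(false);

    /** True only for the "collection usage threshold exceeded" notification type. */
    static boolean isCollectionThresholdEvent(String notificationType) {
        return MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED.equals(notificationType);
    }

    /** Returns the number of heap pools on which a collection usage threshold was set. */
    static int install(double fraction) {
        NotificationListener listener = (notification, handback) -> {
            if (isCollectionThresholdEvent(notification.getType())) {
                lowAfterGc.set(true);
            }
        };
        ((NotificationEmitter) ManagementFactory.getMemoryMXBean())
                .addNotificationListener(listener, null, null);

        int configured = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isCollectionUsageThresholdSupported()) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage != null && usage.getMax() > 0) {
                // Compared against usage measured after the JVM's GC effort on this pool.
                pool.setCollectionUsageThreshold((long) (usage.getMax() * fraction));
                configured++;
            }
        }
        return configured;
    }
}
```

Not every pool supports this threshold, which is why the sketch checks `isCollectionUsageThresholdSupported()` and reports how many pools were actually configured.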

Author by

mindas

bean in the Java world

Updated on June 06, 2022

Comments

  • mindas
    mindas about 2 years

    I would like to give my system a way of detecting whether an OutOfMemoryError has occurred. The aim of this exercise is to expose a flag through JMX and act on it (e.g. by configuring a relevant alert in the monitoring system), as otherwise these errors sit unnoticed for days.
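For illustration only, exposing such a flag could look roughly like the standard MBean below. The names `OomJmxSketch`, `OomStatus`, `OutOfMemoryDetected`, and the object name `example.monitoring:type=OomStatus` are all hypothetical; the sketch only shows the JMX side and says nothing about how the flag would get set.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class OomJmxSketch {
    /** Management interface; standard MBean rules expect the "MBean" suffix and a public interface. */
    public interface OomStatusMBean {
        boolean isOutOfMemoryDetected();
    }

    public static class OomStatus implements OomStatusMBean {
        private final AtomicBoolean detected = new AtomicBoolean(false);

        /** Would be called by whatever mechanism detects the error. */
        public void markDetected() {
            detected.set(true);
        }

        @Override
        public boolean isOutOfMemoryDetected() {
            return detected.get();
        }
    }

    // Hypothetical object name; a real system would pick its own domain and keys.
    static final String NAME = "example.monitoring:type=OomStatus";

    /** Registers a fresh status bean on the platform MBean server and returns it. */
    static OomStatus register() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(NAME);
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
            OomStatus status = new OomStatus();
            server.registerMBean(status, name);
            return status;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the attribute the way a monitoring client would, through the MBean server. */
    static boolean readViaJmx() {
        try {
            Object value = ManagementFactory.getPlatformMBeanServer()
                    .getAttribute(new ObjectName(NAME), "OutOfMemoryDetected");
            return (Boolean) value;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A monitoring system could then poll the `OutOfMemoryDetected` attribute and alert when it becomes `true`.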

    A naive approach would be to set an uncaught exception handler for every thread, check whether the raised exception is an instance of OutOfMemoryError, and set a relevant flag. However, this approach isn't realistic for the following reasons:

    • The exception can occur anywhere, including 3rd party libraries. There is nothing I can do to prevent them catching Throwable and keeping it for themselves.
    • Libraries can spawn their own threads and I have no way of enforcing uncaught exception handlers for these threads.
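For reference, the naive approach could be sketched as below, using a JVM-wide default handler rather than one handler per thread. The class and field names are hypothetical. The handler is only consulted for threads that have no handler of their own, and it never sees an error that some code has caught and swallowed, which is the limitation described in the first bullet.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class OomFlagSketch {
    // Hypothetical flag that could later be exposed through JMX.
    static final AtomicBoolean oomSeen = new AtomicBoolean(false);

    /** Checks the throwable and a bounded number of its causes for an OutOfMemoryError. */
    static boolean isOutOfMemory(Throwable error) {
        Throwable current = error;
        for (int depth = 0; current != null && depth < 16; depth++) {
            if (current instanceof OutOfMemoryError) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }

    static final Thread.UncaughtExceptionHandler HANDLER = (thread, error) -> {
        if (isOutOfMemory(error)) {
            oomSeen.set(true);
        }
    };

    /** Installs the handler as the default for threads without their own handler. */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(HANDLER);
    }
}
```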

    One possible approach I see is bytecode manipulation (e.g. attaching some sort of aspect to OutOfMemoryError); however, I am not sure whether that's the right approach or whether it is doable at all.

    We have -XX:+HeapDumpOnOutOfMemoryError enabled, but I don't see this as a solution for this problem as it was designed for something else - and it provides no Java callback when this happens.

    Has anyone done this? How would you solve it or suggest solving it? Any ideas are welcome.

  • mindas
    mindas almost 12 years
    I have explicitly mentioned that this solution is inappropriate: we are running multiple instances of our software on different machines and have a single shared NFS folder for all dumps, as they tend to be quite large (4G+). I'd rather know there is no proper solution than invest my time in building something that is crippled.
  • Amir Afghani
    Amir Afghani almost 12 years
    Why would you have a single NFS point for all the dumps? It sounds like you need to rethink your setup. Tell me, what is a proper solution? What could you do, given that you just ran out of memory?
  • mindas
    mindas almost 12 years
    If you think such a setup is bad, it would be nice of you to explain why. Nevertheless, I'll explain my point. We don't want a spurious memory dump to overflow other, business-critical partitions and interrupt business flow. This is pretty much the same argument for separating /var/log from /home in Unix. On top of that, we are running dozens of virtual machines, and maintaining a single mount point is far more convenient than looking for files across multiple servers.