G1 garbage collector: Perm Gen fills up indefinitely until a Full GC is performed


Solution 1

Causes of growing Perm Gen

  • Lots of classes, especially JSPs.
  • Lots of static variables.
  • There is a classloader leak.

For those that don't know, here is a simple way to think about how the PermGen fills up. The Young Gen doesn't get enough time to let things expire, so they get moved up to the Old Gen. The Perm Gen holds the classes for the objects in the Young and Old Gen. When the objects in the Young or Old Gen get collected and a class is no longer referenced, it gets 'unloaded' from the Perm Gen. If the Young and Old Gen don't get GC'd then neither does the Perm Gen, and once it fills up it needs a Full stop-the-world GC. For more info see Presenting the Permanent Generation.
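
To make the classloader-leak case concrete, here is a minimal, hypothetical Java sketch (my own illustration, not from the question): the same class is loaded through many separate classloaders, and because every loaded class stays reachable, none of that class metadata can ever be unloaded from the Perm Gen.

    // Hypothetical sketch: repeatedly load the same class through fresh classloaders.
    // Each loader defines its own copy of the class metadata in the Perm Gen, and the
    // 'pinned' list keeps it all reachable, so nothing can be unloaded.
    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.ArrayList;
    import java.util.List;

    public class PermGenPressure {
        public static void main(String[] args) throws Exception {
            List<Class<?>> pinned = new ArrayList<>();
            URL[] classpath = { new URL("file:./myapp.jar") };          // hypothetical jar
            for (int i = 0; i < 100_000; i++) {
                URLClassLoader loader = new URLClassLoader(classpath, null);
                pinned.add(loader.loadClass("com.example.SomeClass"));  // hypothetical class
            }
            System.out.println("Loaded " + pinned.size() + " copies of the class metadata");
        }
    }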


Switching to CMS

I know you are using G1, but if you do switch to the Concurrent Mark Sweep (CMS) low pause collector (-XX:+UseConcMarkSweepGC), try enabling class unloading and permanent generation collections by adding -XX:+CMSClassUnloadingEnabled.
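
For example, appended to the JBoss JAVA_OPTS in the same Windows batch style used further down in this answer (a sketch, not a drop-in recommendation):

    set "JAVA_OPTS=%JAVA_OPTS% -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled"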


The Hidden Gotcha

If you are using JBoss, RMI/DGC has the gcInterval set to 1 min. The RMI subsystem forces a full garbage collection once per minute. This in turn forces promotion instead of letting objects get collected in the Young Generation.

You should change this to at least 1 hr, if not 24 hrs, in order for the GC to do proper collections.

-Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000
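
The values are in milliseconds, so 3600000 is one hour. A sketch of the 24-hour variant mentioned above (24 h = 86,400,000 ms):

    -Dsun.rmi.dgc.client.gcInterval=86400000 -Dsun.rmi.dgc.server.gcInterval=86400000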

List of every JVM option

To see all the options, run this from the cmd line.

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version
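
The full list is long; if you only care about the perm-gen-related flags you can filter it (assuming a Unix-like shell; on Windows cmd, findstr /i perm does the same job):

    java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version | grep -i perm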

If you want to see what JBoss is using, then you need to add the following to your JBoss startup configuration (e.g. standalone.conf.bat on Windows). You will get a list of every JVM option and what it is set to. NOTE: it must be run in the JVM that you want to look at. If you run it externally, you won't see what is happening in the JVM that JBoss is running in.

set "JAVA_OPTS= -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal %JAVA_OPTS%"

There is a shortcut to use when we are only interested in the modified flags.

-XX:+PrintCommandLineFlags
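
For example, as a minimal stand-alone invocation (the exact output depends on your JVM and its defaults):

    java -XX:+PrintCommandLineFlags -version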

Diagnostics

Use jmap to determine which classes are consuming permanent generation space. The output will show:

  • class loader
  • # of classes
  • bytes
  • parent loader
  • alive/dead
  • type
  • totals

    jmap -permstat JBOSS_PID  >& permstat.out
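
To spot the biggest consumers quickly, you can sort the captured output by the bytes column (a sketch; it assumes bytes is the third whitespace-separated field in your jmap version's output):

    sort -k3 -n -r permstat.out | head -20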
    

JVM Options

These settings worked for me but depending how your system is set up and what your application is doing will determine if they are right for you.

  • -XX:SurvivorRatio=8 – Sets the survivor space ratio to 1:8, resulting in larger survivor spaces (the smaller the ratio, the larger the space). The SurvivorRatio is the size of the Eden space relative to one survivor space. Larger survivor spaces give short-lived objects a longer time period to die in the young generation.

  • -XX:TargetSurvivorRatio=90 – Allows 90% of the survivor spaces to be occupied instead of the default 50%, allowing better utilization of the survivor space memory.

  • -XX:MaxTenuringThreshold=31 – Prevents premature promotion from the young to the old generation. Allows short-lived objects a longer time period to die in the young generation (and hence avoid promotion). A consequence of this setting is that minor GC times can increase due to additional objects to copy. This value and the survivor space sizes may need to be adjusted to balance the overhead of copying between survivor spaces against tenuring objects that are going to live for a long time. The default settings for CMS are SurvivorRatio=1024 and MaxTenuringThreshold=0, which cause all survivors of a scavenge to be promoted. This can place a lot of pressure on the single concurrent thread collecting the tenured generation. Note: when used with -XX:+UseBiasedLocking, this setting should be 15.

  • -XX:NewSize=768m – Sets the initial young generation size.

  • -XX:MaxNewSize=768m – Sets the maximum young generation size.

Here is a more extensive JVM options list.
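
Putting the options above together, here is a sketch of what a CMS-based command line might look like (heap and perm sizes are purely illustrative, not recommendations; MaxTenuringThreshold is shown as 15 per the biased-locking note above):

    -Xms5g -Xmx5g -XX:PermSize=512m -XX:MaxPermSize=512m
    -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
    -XX:NewSize=768m -XX:MaxNewSize=768m
    -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=15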

Solution 2

Is this the expected behaviour with G1?

I don't find it surprising. The base assumption is that stuff put into permgen almost never becomes garbage. So you'd expect that permgen GC would be a "last resort"; i.e. something the JVM would only do if it was forced into a full GC. (OK, this argument is nowhere near a proof ... but it's consistent with the following.)

I've seen lots of evidence that other collectors have the same behaviour.

I found another post on the web of someone questioning something very similar and saying that G1 should perform incremental collections on the Perm Gen, but there was no answer...

I think I found the same post. But someone's opinion that it ought to be possible is not really instructive.

Is there something I can improve/correct in our startup parameters?

I doubt it. My understanding is that this is inherent in the permgen GC strategy.

I suggest that you either track down and fix what is using so much permgen in the first place ... or switch to Java 8 in which there isn't a permgen heap anymore: see PermGen elimination in JDK 8
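
If you do move to Java 8, class metadata goes into native Metaspace instead. It has no fixed cap by default, but a classloader leak will still grow it, and it can be bounded if you want a hard limit (the sizes below are illustrative only):

    -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m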

While a permgen leak is one possible explanation, there are others; e.g.

  • overuse of String.intern(),
  • application code that is doing a lot of dynamic class generation, e.g. using DynamicProxy (see the sketch after this list),
  • a huge codebase ... though that wouldn't cause permgen churn as you seem to be observing.
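
As a concrete (hypothetical) illustration of the dynamic class generation point: java.lang.reflect.Proxy defines a new class at runtime, and on pre-Java-8 JVMs that class metadata lives in the permgen. Code that keeps generating proxies against fresh classloaders or ever-changing interface combinations keeps adding to it.

    // Hypothetical sketch: a dynamic proxy class is generated at runtime for the
    // Service interface. One proxy class per (classloader, interface list) pair is
    // cached and reused; churn appears when those pairs keep changing.
    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;

    public class ProxyChurn {
        interface Service { String call(); }

        public static void main(String[] args) {
            InvocationHandler handler = new InvocationHandler() {
                @Override
                public Object invoke(Object proxy, Method method, Object[] methodArgs) {
                    return "stubbed";
                }
            };
            Service s = (Service) Proxy.newProxyInstance(
                    Service.class.getClassLoader(),
                    new Class<?>[] { Service.class },
                    handler);
            System.out.println(s.call()); // prints "stubbed"
        }
    }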

Solution 3

I agree with the answer above in that you should really try to find out what is actually filling your permgen, and I'd strongly suspect it's a classloader leak whose root cause you want to find.

There's this thread in the JBoss forums that goes through a couple of such diagnosed cases and how they were fixed. This answer and this article discuss the issue in general as well. In that article there's a mention of possibly the easiest test you can do:

Symptom

This will happen only if you redeploy your application without restarting the application server. The JBoss 4.0.x series suffered from just such a classloader leak. As a result I could not redeploy our application more than twice before the JVM would run out of PermGen memory and crash.

Solution

To identify such a leak, un-deploy your application and then trigger a full heap dump (make sure to trigger a GC before that). Then check if you can find any of your application objects in the dump. If so, follow their references to their root, and you will find the cause of your classloader leak. In the case of JBoss 4.0 the only solution was to restart for every redeploy.
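
One way to take such a dump with HotSpot's jmap; the live option dumps only reachable objects and forces a collection first, so anything of yours left in the dump is genuinely still referenced (JBOSS_PID is a placeholder, as above):

    jmap -dump:live,format=b,file=after-undeploy.hprof JBOSS_PID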

This is what I'd try first, IF you think that redeployment might be related. This blog post is an earlier one doing the same thing, but it discusses the details as well. Based on the posting, though, it might be that you're not actually redeploying anything and the permgen is just filling up by itself. In that case, examining the classes plus anything else added to the permgen might be the way to go (as already mentioned in the previous answer).

If that doesn't give more insight, my next step would be trying out the Plumbr tool. They offer a sort of guarantee on finding the leak for you as well.

Solution 4

I would first try to find the root cause for the PermGen getting larger before randomly trying JVM options.

  • You could enable classloading logging (-verbose:class, -XX:+TraceClassLoading -XX:+TraceClassUnloading, ...) and check the output.
  • In your test environment, you could try monitoring (over JMX) when classes get loaded (java.lang:type=ClassLoading LoadedClassCount). This might help you find out which part of your application is responsible; see the sketch after this list.
  • You could also try listing all the classes using the JVM tools (sorry, but I still mostly use JRockit, and there you would do it with jrcmd. I hope Oracle has migrated those helpful features to HotSpot...)
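
A minimal sketch of the JMX idea above (my own illustration; it polls the local ClassLoadingMXBean, which backs the same java.lang:type=ClassLoading attributes you would read remotely):

    // Poll the class-loading counters every 10 seconds; a count that climbs without
    // ever dropping is a hint that classes keep being generated/loaded but never unloaded.
    import java.lang.management.ClassLoadingMXBean;
    import java.lang.management.ManagementFactory;

    public class ClassCountWatcher {
        public static void main(String[] args) throws InterruptedException {
            ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
            while (true) {
                System.out.printf("loaded=%d unloaded=%d total=%d%n",
                        bean.getLoadedClassCount(),
                        bean.getUnloadedClassCount(),
                        bean.getTotalLoadedClassCount());
                Thread.sleep(10_000);
            }
        }
    }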

In summary, find out what generates so many classes and then think how to reduce that / tune the gc.

Cheers, Dimo


Comments

  • Jose Otavio
    Jose Otavio almost 2 years

    We have a fairly big application running on a JBoss 7 application server. In the past we were using ParallelGC, but it was giving us trouble on some servers where the heap was large (5 GB or more) and usually nearly filled up; we would get very long GC pauses frequently.

    Recently, we made improvements to our application's memory usage and in a few cases added more RAM to some of the servers where the application runs, but we also started switching to G1 in the hopes of making these pauses less frequent and/or shorter. Things seem to have improved but we are seeing a strange behaviour which did not happen before (with ParallelGC): the Perm Gen seems to fill up pretty quickly and once it reaches the max value a Full GC is triggered, which usually causes a long pause in the application threads (in some cases, over 1 minute).

    We have been using 512 MB of max perm size for a few months and during our analysis the perm size would usually stop growing at around 390 MB with ParallelGC. After we switched to G1, however, the behaviour above started happening. I tried increasing the max perm size to 1 GB and even 1.5 GB, but still the Full GCs are happening (they are just less frequent).

    In this link you can see some screenshots of the profiling tool we are using (YourKit Java Profiler). Notice how when the Full GC is triggered the Eden and the Old Gen have a lot of free space, but the Perm size is at the maximum. The Perm size and the number of loaded classes decrease drastically after the Full GC, but they start rising again and the cycle is repeated. The code cache is fine, never rises above 38 MB (it's 35 MB in this case).

    Here is a segment of the GC log:

    2013-11-28T11:15:57.774-0300: 64445.415: [Full GC 2126M->670M(5120M), 23.6325510 secs] [Eden: 4096.0K(234.0M)->0.0B(256.0M) Survivors: 22.0M->0.0B Heap: 2126.1M(5120.0M)->670.6M(5120.0M)] [Times: user=10.16 sys=0.59, real=23.64 secs]

    You can see the full log here (from the moment we started up the server, up to a few minutes after the full GC).

    Here's some environment info:

    java version "1.7.0_45"

    Java(TM) SE Runtime Environment (build 1.7.0_45-b18)

    Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)

    Startup options: -Xms5g -Xmx5g -Xss256k -XX:PermSize=1500M -XX:MaxPermSize=1500M -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -Xloggc:gc.log

    So here are my questions:

    • Is this the expected behaviour with G1? I found another post on the web of someone questioning something very similar and saying that G1 should perform incremental collections on the Perm Gen, but there was no answer...

    • Is there something I can improve/correct in our startup parameters? The server has 8 GB of RAM, but it doesn't seem we are lacking hardware; performance of the application is fine until a full GC is triggered, that's when users experience big lags and start complaining.

    • Jose Otavio
      Jose Otavio over 10 years
      Here's the link of someone else asking for help on a very similar issue: mail.openjdk.java.net/pipermail/hotspot-gc-use/2013-October/…
    • Elliott Frisch
      Elliott Frisch over 10 years
      I would try adding -verbose:gc to see more detail, also I might consider trying Chronon DVR.
    • Jose Otavio
      Jose Otavio over 10 years
      I'll add that option and leave the server running for a while with it to see if I get more information, but it's very clear to me what is causing the full GCs to run, I just don't understand if this is the correct behaviour of G1...
    • Jose Otavio
      Jose Otavio over 10 years
      By the way, I checked out Chronon DVR, looks interesting but I have to play with it a little more. However, I'm not sure it will help us in this case...
    • Jose Otavio
      Jose Otavio over 10 years
      Adding -verbose:gc didn't really help, unfortunately I got no additional information from the logs...
    • Jose Otavio
      Jose Otavio over 10 years
      To make it worse, it seems that the full collections are taking longer and longer to execute. These are parts of the GC log showing the moments where a full GC was executed: 2013-12-02T10:42:05.434-0300: 255248.909: [Full GC 1631M->489M(5120M), 62.9773920 secs] 2013-12-02T14:09:25.598-0300: 267689.073: [Full GC 1674M->567M(5120M), 69.2846050 secs] 2013-12-02T16:45:17.780-0300: 277041.255: [Full GC 1776M->524M(5120M), 81.1241990 secs] 2013-12-03T10:52:23.600-0300: 342267.075: [Full GC 1562M->531M(5120M), 172.7293720 secs]
    • Elliott Frisch
      Elliott Frisch over 10 years
      Have you tried -XX:+UseConcMarkSweepGC or -XX:+UseParallelGC?
    • smeaggie
      smeaggie over 10 years
      I think the hint is right here in your post: "the number of loaded classes decrease drastically". Try to find out what's responsible for generating almost 400,000 classes which can be unloaded. This doesn't sound right. Are you generating a lot of proxy classes somewhere? Frequent hot-deployments without server shutdown can trigger this as well. The number of loaded classes should be fairly stable after deployment (at least in my experience)
    • Jose Otavio
      Jose Otavio over 10 years
      @ElliottFrisch: we did use ParallelGC, we had different kinds of problems with it, that is why we decided to try G1. But that actually is part of my question: is G1 right for us? Maybe Concurrent Mark Sweep or Parallel GC might work better if we use the right parameters, but still I think this behaviour with G1 is very strange, I wanted to know if anybody has seen this and if this is the normal behaviour...
    • Jose Otavio
      Jose Otavio over 10 years
      @smeaggie: yes, we do use a lot of proxy classes, JBoss generates a lot of those in some cases. We are making improvements in our code to try to fix this, but still, when we used ParallelGC, the perm size would stabilize at around 390 MB, this behaviour started when we switched to G1. Do the other collectors perform some kind of incremental collection on the perm?
    • Erik
      Erik over 10 years
      This might not be applicable but I'll throw it out there. By setting -Xms == -Xmx you disable gc ergonomics. You'll gain faster start-up time but the gc can't adapt your memory layout. This might be bad. Apart from that, you reclaim about the same amount of memory each time but it takes longer and longer. This could mean that you have more objects in old gen each time and/or that you have lots of references between old gen and young gen. Your object creation rate and count could be interesting.
    • smeaggie
      smeaggie over 10 years
      very interesting blog post: mechanical-sympathy.blogspot.nl/2013/07/…, too long in total but the last paragraph sums it nicely: "If latency spikes are due to GC then invest in tuning CMS or G1 to see if your latency targets can be met. Sometimes this may not be possible because of high allocation and promotion rates combined with low-latency requirements. GC tuning can become a highly skilled exercise that often requires application changes to reduce object allocation rates or object lifetimes."
    • smeaggie
      smeaggie over 10 years
      emphasis on "Sometimes this may not be possible because of high allocation and promotion rates". You probably have very high allocation and promotion rates.
    • Jose Otavio
      Jose Otavio over 10 years
      @Erik: actually this doesn't seem to be the case, when we analyze the GC with the profiler it is clear that the Eden Space and Old Generation are shrinking and expanding as necessary. As I said, when the Full GC occurs none of these regions are full (not even close), the Full GC is clearly triggered because the Perm Space is filled up.
    • Jose Otavio
      Jose Otavio over 10 years
      @smeaggie: very interesting article indeed! We are experimenting with different GC parameters and even considering using CMS or ParallelOldGC. Maybe G1 is not the best option in our case, but what I really wanted to know is if this the expected behaviour with G1 or if it can be avoided. As I said, we used ParallelGC before and didn't see this type of behaviour (FullGCs being triggered by a full Perm Gen).
    • smeaggie
      smeaggie over 10 years
      @Jose: I found the option "-XX:InitiatingHeapOccupancyPercent=0" to 'do constant GC cycles'. See oracle.com/webfolder/technetwork/tutorials/obe/java/… Apparently G1 triggers when a specific amount of total heap is used, regardless of generation usage. Maybe it just kicks in too late (default at 45%)?
    • Elliott Frisch
      Elliott Frisch over 10 years
      Can you try changing "-Xms5g -Xmx5g" to either a min or a max heap? I don't think you want to do both (and I really don't think you want them the same size with this GC).
    • Jose Otavio
      Jose Otavio over 10 years
      @smeaggie: yes, G1 triggers minor collections when the heap occupancy is above a certain threshold (45% by default). It doesn't seem, in our case, we need to change this configuration, since the Perm Gen is not collected by minor collections, only by the full collections (which are being triggered when the Perm Gen is full).
    • Jose Otavio
      Jose Otavio over 10 years
      @ElliottFrisch: I ran some tests using "-Xms1G -Xmx5G", but we still see the same behaviour: the Eden and Old are growing and shrinking as necessary, but the Perm Gen fills up indefinitely until a full GC is triggered. It's pretty clear to me now that our problem is with the number of classes being loaded, and that's one thing we are going to address now, but I still think G1 could be a little smarter about it...
    • Elliott Frisch
      Elliott Frisch over 10 years
      Have you looked with VisualVM?
    • Joshua Wilson
      Joshua Wilson over 10 years
      @ElliottFrisch There are several places in the JBoss docs and Knowledge pages that suggest you set the min and max heap to the same size (Xms=Xmx). It avoids the major (full) garbage collections the JVM has to do to resize the heap or permanent generation space.
    • Elliott Frisch
      Elliott Frisch over 10 years
      @JoshuaWilson I just wanted to validate that it still applied with the G1 collector (I remember BEA recommended that Xms=Xmx at least as far back as Java 1.2 and WebLogic 5.1).
    • smeaggie
      smeaggie over 10 years
      Joshua Wilson's post captures some nice things about the G1GC vs CMS, but the answer to the question about why this happens may be in an older email conversation, the interesting part actually starting here: mail.openjdk.java.net/pipermail/hotspot-gc-use/2010-July/…. They discuss the possibility that certain regions may never be collected, thus forcing full GCs. There are some very interesting pointers throughout the whole discussion, but unfortunately I couldn't find a definitive answer there. Maybe you find some pointers based on your experience with your own code.
    • Jose Otavio
      Jose Otavio over 10 years
      Yes, there's definitely not a single right answer, but this e-mail conversation does give us some pointers. We'll just have to tackle the problem of having so many classes being loaded and then try to tune our GC configuration in the best way possible for our case.
    • Joshua Wilson
      Joshua Wilson over 10 years
      Can you tell if code cache is full when the Full GC happens? Does it go down after the full GC?
    • Jose Otavio
      Jose Otavio over 10 years
      It's not, it never rises above 38 MB, but it's usually below that (the maximum size is 48 MB).
    • Joshua Wilson
      Joshua Wilson over 10 years
      ok, there was a problem in an earlier version of Java 7. I just wanted to make sure it hadn't come back in some form or another. Also, I will not be notified of a comment if you don't @ me. ;)
    • Jose Otavio
      Jose Otavio over 10 years
      @JoshuaWilson: Ok, sorry about that :)
  • Jose Otavio
    Jose Otavio over 10 years
    We do use the RMI GC Interval setting, I even tried removing it but it makes no difference. The FullGCs are clearly being triggered when the Perm Gen fills up. That being said, I'm not sure changing the Eden size or the tenuring threshold will help, in the GC log I see no promotion failures (or "to-space overflow") at all, the Old Gen is not even half full when the Full GC is triggered. I'll keep experimenting with different parameters, but I think this behaviour with G1 is very weird. It might be a problem with our application, but ParallelGC didn't have this problem (but had other problems)
  • Jose Otavio
    Jose Otavio over 10 years
    I think you're right, maybe it's our best alternative right now. I'm beginning to think that maybe this didn't happen with ParallelGC before because the major collections were more frequent, which prevented the Perm Gen from growing too much.
  • Joshua Wilson
    Joshua Wilson over 10 years
    We actually did try G1 but then went back to UseConcMarkSweepGC and made these changes. To be clear, these aren't random JVM settings, these are what we used. The NOTE at the bottom is meant to make it clear that you use one option or the other, not both, because they do the same thing. Also, the problem we had was not promotion failure but the fact that promotions were happening too often. That was pushing things into the Perm space when they shouldn't have ever gotten there.
  • Joshua Wilson
    Joshua Wilson over 10 years
    I updated the post, please let me know if this helps or if you have more questions. cc @ElliottFrisch
  • Jose Otavio
    Jose Otavio over 10 years
    The additional information you provided helps. After much reading here and in other places, it really seems we were looking at the issue from the wrong angle: G1 was not the cause of the problem, it only helped expose it. We'll first tackle the problem of having so many classes being loaded (one of the causes being the fact that our application has many calls to remote EJBs which don't need to be remote at all), and at the same time experiment with different collectors and different parameters to see what works best.
  • Jose Otavio
    Jose Otavio over 10 years
    I'll try the configuration you suggested, since the profiling tool we are using (which is very good, BTW) shows what you mentioned: promotions are happening too often and sometimes prematurely, which makes the Old Gen and Perm Gen fill up quicker, which results in more full GCs.
  • Jose Otavio
    Jose Otavio over 10 years
    Thanks for the info. This is really the way we are thinking of going right now: find out what is making so many classes get loaded, then tune our GC configuration. As I said in another comment, G1 is definitely not the cause of the problem, but it helped expose it.
  • Jose Otavio
    Jose Otavio over 10 years
    We don't usually redeploy our application without restarting JBoss, I really think now that the fact that the Perm Gen is getting filled up so quickly is because of how our application is implemented, G1 actually only helped to expose this problem to us.
  • Jose Otavio
    Jose Otavio over 10 years
    We were looking at the problem from the wrong angle, as G1 was not the cause, it only helped expose it. We are investigating the problem and already found that one of the main causes is the fact that our application has a lot of remote EJBs being called, so we are tackling that first. At the same time, we'll just have to experiment with different GC configurations until we find what works best for us.
  • Jose Otavio
    Jose Otavio over 10 years
    I'm going to run some tests with CMS, but when I start up the server with -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled I get the following message in the log: "Please use CMSClassUnloadingEnabled in place of CMSPermGenSweepingEnabled in the future". Maybe I should only use the first option then?
  • Joshua Wilson
    Joshua Wilson over 10 years
    Yes, just the first. The JVM must have been updated since those directions were given.
  • Prashant
    Prashant over 10 years
    You don't need -verbose:gc if you use -Xloggc and -XX:+PrintGCDetails (it is a legacy synonym).
  • Prashant
    Prashant over 10 years
    Survivor and NewGen are not really related to the PermGen (they do affect the retained classes but only for a short time).
  • Joshua Wilson
    Joshua Wilson over 10 years
    @eckes - So you are saying that objects don't move from New to Old to Perm?
  • Prashant
    Prashant over 10 years
    @Joshua Yes there is no promotion from old to perm. Objects in the permgen are always created there directly based on their type (classes and classloaders).
  • Joshua Wilson
    Joshua Wilson over 10 years
    @eckes Yeah, new and old gen hold objects and permgen holds classes and some other things. The permgen does get garbage collected though. That GC unloads the classes for the objects that are no longer being used. So... I was not completely right in saying that new fills old and old fills perm, but I was not completely wrong either. When an object is created its class is added to the permgen if not already there. So ... you can fill up the permgen via the old in that manner. Also see this article.
  • Prashant
    Prashant over 10 years
    @Joshua yes, that's why I wrote that they affect the retained classes. However no object can survive very long in Eden, so it is typically not the reason for a leak in permgen (only the objects in "old" generation can hold classes alive for longer).
  • Joshua Wilson
    Joshua Wilson over 10 years
    @eckes That is exactly my point. The old gen is filling up causing the permgen to fill up. Or are we still talking about different things. I also re-worded the beginning of the answer to more clearly reflect this conversation.
  • Joshua Wilson
    Joshua Wilson over 9 years
    @Tim I see your point, thanks for bringing it up. I have removed the line.