Does Java garbage collection always have to "Stop-the-World"?

java algorithm garbage-collection

14,427

Solution 1

The key reason compaction leads to an STW pause is that the JVM needs to move objects and update the references to them. If an object is moved before its references are updated, a running application thread that accesses it through an old reference is in trouble. If the references are updated first and the object is moved afterwards, the updated references are wrong until the move completes, and any access in that window causes the same kind of problem.
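The two orderings described above can be replayed in a small single-threaded sketch. This is a toy model and not JVM code: the "heap" is an `int` array, a "reference" is an index into it, and the names `MoveHazardDemo`, `move` and `EMPTY` are made up for illustration. Each method plays out one interleaving by hand instead of using real threads, so the result is deterministic.

```java
/** Toy model: the "heap" is an int array and a "reference" is an index into it. */
public class MoveHazardDemo {
    static final int EMPTY = 0; // hypothetical marker for a slot that holds no object

    /** Copies the object to its new slot and clears the old one, as a compactor would. */
    static void move(int[] heap, int from, int to) {
        heap[to] = heap[from];
        heap[from] = EMPTY;
    }

    /** Ordering 1: the collector moves first; the application reads before the reference is fixed. */
    public static int readAfterMoveBeforeUpdate() {
        int[] heap = new int[8];
        int ref = 5;            // the application's reference
        heap[ref] = 42;         // the live object
        move(heap, ref, 1);     // collector relocates the object
        return heap[ref];       // application still uses the old address
    }

    /** Ordering 2: the collector fixes the reference first; the application reads before the move. */
    public static int readAfterUpdateBeforeMove() {
        int[] heap = new int[8];
        int ref = 5;
        heap[ref] = 42;
        ref = 1;                // collector updates the reference
        int seen = heap[ref];   // application reads: the object has not arrived yet
        move(heap, 5, 1);       // the move happens too late for that read
        return seen;
    }

    /** Stop-the-world: both steps finish before the application is allowed to read. */
    public static int readAfterBothSteps() {
        int[] heap = new int[8];
        int ref = 5;
        heap[ref] = 42;
        move(heap, ref, 1);
        ref = 1;
        return heap[ref];
    }

    public static void main(String[] args) {
        System.out.println("move, then read:        " + readAfterMoveBeforeUpdate()); // 0 (stale slot)
        System.out.println("update, then read:      " + readAfterUpdateBeforeMove()); // 0 (empty slot)
        System.out.println("both steps, then read:  " + readAfterBothSteps());        // 42
    }
}
```

Only the third variant sees the object, which is why the simplest safe design is to pause the application until both steps are done.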

For both the CMS and Parallel collectors, the young generation collection algorithm is similar and it is stop-the-world, i.e. the application is stopped while the collection is happening. During that pause the JVM marks all objects reachable from the root set, moves the live objects from Eden to a survivor space, and moves objects that have survived more collections than the tenuring threshold to the old generation. Of course, the JVM also has to update all references to the objects that have moved.
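As a rough illustration of those steps, here is a minimal sketch of a copying young collection. It is a simplification under stated assumptions, not HotSpot's implementation: reachability is a fixed flag on each object instead of a real trace from the root set, the two survivor spaces are collapsed into one list, and an object is promoted once its age exceeds the threshold. All names (`YoungCollectionSketch`, `Obj`, `minorCollection`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class YoungCollectionSketch {
    /** Toy object: a name, a fixed reachability flag, and the number of collections survived. */
    static final class Obj {
        final String name;
        final boolean reachable;
        int age;

        Obj(String name, boolean reachable) {
            this.name = name;
            this.reachable = reachable;
        }
    }

    private final List<Obj> eden = new ArrayList<>();
    private List<Obj> survivor = new ArrayList<>();
    private final List<Obj> old = new ArrayList<>();
    private final int tenuringThreshold;

    public YoungCollectionSketch(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    /** New objects always start in Eden. */
    public void allocate(String name, boolean reachable) {
        eden.add(new Obj(name, reachable));
    }

    /** One stop-the-world young collection: copy live objects out, drop the rest. */
    public void minorCollection() {
        List<Obj> toSpace = new ArrayList<>();
        for (List<Obj> space : Arrays.asList(eden, survivor)) {
            for (Obj o : space) {
                if (!o.reachable) {
                    continue;               // garbage is never copied, so it is reclaimed for free
                }
                o.age++;
                if (o.age > tenuringThreshold) {
                    old.add(o);             // survived long enough: promote to the old generation
                } else {
                    toSpace.add(o);         // otherwise copy into the survivor space
                }
            }
        }
        eden.clear();                       // Eden is empty after every young collection
        survivor = toSpace;
    }

    public int edenCount() { return eden.size(); }
    public int survivorCount() { return survivor.size(); }
    public int oldCount() { return old.size(); }

    public static void main(String[] args) {
        YoungCollectionSketch heap = new YoungCollectionSketch(1);
        heap.allocate("a", true);
        heap.allocate("b", false);
        heap.minorCollection();
        System.out.println("survivor=" + heap.survivorCount() + " old=" + heap.oldCount());
        heap.minorCollection();
        System.out.println("survivor=" + heap.survivorCount() + " old=" + heap.oldCount());
    }
}
```

The sketch leaves out the reference updates entirely. In a real collector every copied object gets a new address, and fixing up the pointers to it is part of the same pause.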

For the old generation, the Parallel collector does all marking, compaction and reference updates in a single stop-the-world (STW) phase, which leads to pauses of seconds for heaps of several GB. This was painful for applications with strict response time requirements. To date, the Parallel collector is still the best collector (among Oracle's Java collectors) for throughput or batch processing. In fact, we have seen that for the same scenario, even when more time is spent in pauses with the Parallel collector than with CMS, we still get higher throughput. I think this has to do with better spatial locality due to compaction.

CMS solved the problem of long pauses in major collections by doing the marking concurrently. There are two STW parts: the initial mark (getting the references from the root set) and the remark pause (a small STW pause at the end of marking to deal with changes made to the object graph while marking and the application were running concurrently). Both of these pauses are in the range of 100-200 milliseconds for heap sizes of a few GB and a reasonable number of application threads (remember: more active threads means more roots).

G1GC is planned as a replacement for CMS and accepts pause time goals. It takes care of fragmentation by incrementally compacting the heap. Because the work is incremental you can get smaller pauses, but that may come at the cost of more frequent pauses.

None of the above can compact the heap while the application is running (CMS does not compact at all). Azul's GPGC collector can compact without stopping the application and also handles the reference updates. So if you want to go deep into how GCs work, it is worth reading up on the GPGC algorithm. Azul markets it as a pauseless collector.

Solution 2

All freely available GCs in OpenJDK have some stop-the-world events. And not just the GCs: other things such as deoptimizations can trigger safepoints too.

But not all pauses are equal. CMS and G1 do not need to scale their pause times with the live data set in the old generation because they only scan a subset of the objects during the pauses and do a major fraction of their work concurrently, unlike the Serial and Throughput collectors.

ZGC (available since OpenJDK 11) and Shenandoah (since OpenJDK 12) are collectors that further decouple pause times from the live data set size and scale their pauses with only the root set size instead.

Additionally, other GC implementations exist which avoid global pauses (they may still experience per-thread pauses) or make the pause durations O(1), i.e. independent of live data set size. A commonly cited example is Azul's C4 collector.

So the second question is: why does compaction need an STW pause?

Compacting means moving objects. Moving objects means pointers need to be updated. This is very difficult or costly to achieve safely while application threads are still running.

Concurrent algorithms generally pay some cost in throughput and complexity in exchange for their lower pause times. Not doing compaction makes CMS relatively(!) simple for a concurrent collector.
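To give a feel for where that cost comes from, here is a minimal sketch of one way to let the application keep running while an object is relocated: every object carries a forwarding pointer, and every access goes through a small check (a "barrier") that follows it. This is only loosely modeled on the forwarding-pointer idea and is not how any particular collector is implemented; `ForwardingSketch`, `Cell`, `resolve` and `relocate` are hypothetical names, and the cell is immutable so that concurrent writes do not have to be handled.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ForwardingSketch {
    /** Toy object with an immutable payload and a forwarding pointer that is null until relocation. */
    public static final class Cell {
        public final int value;
        final AtomicReference<Cell> forwardee = new AtomicReference<>();

        public Cell(int value) {
            this.value = value;
        }
    }

    /** The "barrier": every access pays for this extra load and branch. */
    public static Cell resolve(Cell c) {
        Cell moved = c.forwardee.get();
        return moved == null ? c : moved;
    }

    /**
     * Collector side: copy the cell, then publish the copy with a single compare-and-set.
     * If another thread already relocated the cell, its copy wins and ours is discarded.
     */
    public static Cell relocate(Cell c) {
        Cell copy = new Cell(c.value);
        return c.forwardee.compareAndSet(null, copy) ? copy : c.forwardee.get();
    }

    public static void main(String[] args) {
        Cell original = new Cell(7);
        Cell holderOfOldReference = original;       // the application still holds the old address
        Cell copy = relocate(original);             // the collector moves the object

        // The stale reference still works because the access goes through resolve().
        System.out.println(resolve(holderOfOldReference) == copy);
        System.out.println(resolve(holderOfOldReference).value);
    }
}
```

The extra check on every access is the throughput cost, and handling mutable fields, racing writers and the eventual cleanup of old copies is the complexity cost. A collector that never moves objects needs none of this.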

Solution 3

Here is a link that gives some good information about the different collectors in java 8: https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/collectors.html#sthref27

All strategies will stop-the-world. But, your performance requirements can drive you to choose differing GC strategies to improve performance or response times.
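If you want to compare strategies on your own workload, one low-effort starting point is to ask the running JVM which collectors are active and how much collection work they have reported. The sketch below uses the standard `java.lang.management` API; the class name `GcReport` and the throwaway allocation loop are just for illustration. Note that `getCollectionTime()` is an approximate accumulated time as reported by each collector bean, so for mostly concurrent collectors it should not be read as pure pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcReport {
    /** Returns one line per collector bean registered in this JVM. */
    public static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", approxTimeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Create some short-lived garbage so that a young collection is likely to have run.
        long checksum = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[1024];
            checksum += junk.length;
        }
        System.out.println("allocated bytes: " + checksum);
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

Running the same class with different collector flags, for example `java -XX:+UseParallelGC GcReport` and `java -XX:+UseG1GC GcReport`, shows different bean names and counts. For actual pause durations the GC log is the better source.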

Solution 4

Stop-the-world will occur no matter which GC algorithm you choose. Stop-the-world means that the JVM is stopping the application from running to execute a GC. When stop-the-world occurs, every thread except for the threads needed for the GC will stop their tasks.


Guifan Li

Updated on October 20, 2022

Comments

Guifan Li over 1 year

I am trying to understand Java's garbage collection more deeply.

In HotSpot JVM generational collection, in the heap, there are three areas (Young generation, Old generation and permanent generation). Also, there are two kinds of algorithms:

1) Mark Sweep Compact.

2) Concurrent Mark and Sweep.

Is it true that whether GC needs "Stop-the-world" depends on the algorithm it uses rather than on which generation it operates on? In other words, if I use 1) as the GC algorithm on all three areas, will STW always happen?

Also, I understand the difference is that the second GC algorithm doesn't require compaction, which will eventually result in fragmentation. So the second question is: why does compaction need an STW pause?
KriptSkitty over 7 years

There are incremental and concurrent garbage collection algorithms, you know.
Ganesh Sahu over 7 years

Yes, incremental GC also stops the application threads. Please have a look at oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#icms
michaelok almost 7 years

Azul Systems has a GC algorithm for its Zing VM that does not use stop-the-world, but I think the OP was talking about the Oracle VM, and those collectors do fall back to STW. Of course, the GC algorithms have evolved over the years and keep getting better; take a look at Shenandoah: "Shenandoah is an ultra-low pause time garbage collector that reduces GC pause times by performing more garbage collection work concurrently with the running Java program." openjdk.java.net/projects/shenandoah
michaelok almost 7 years

Fantastic summary of the often bewildering complexity surrounding GCs, I don't think I've read such a neat, concise description of GCs before. One more GC to add is Shenandoah which is built for Java 9 but there is a backport for Java 8. Throughput isn't quite as good as G1, but if your goal is ultra-low pause, it is pushing the limits. rkennke.wordpress.com/2016/02/08/shenandoah-performance
Eugene over 3 years

@michaelok the beginning of this answer is just plain wrong. Moving the object and updating the references can be (and are) done concurrently, by Shenandoah 2.0 for example. So yeah, the reasons for STW are different.
Eugene over 3 years

You mention ZGC and Shenandoah, and moving objects and pointers as the reason for STW, but they both move objects concurrently. Did I misread your answer, maybe?
the8472 over 3 years

@Eugene no, I did not write that moving objects is THE reason for STW. In fact I did not even write that compacting must be STW, merely that it is difficult to do concurrently with application threads. And those statements are in different paragraphs that do not refer to each other. So in no way did I intend to imply that ZGC or Shenandoah need to STW for compacting.