Tracking excessive garbage collection in Hotspot JVM

Quite frequently due to memory leaks or other memory problems applications freeze leaving only the garbage collector (GC) process running unsuccessfully trying to free some space. This happens until watchdog (or frustrated administrator) restarts the application and the problem is never solved.
The goal of this article is to show how to identify excessive GC and to get a heap dump whenever it occurs. It assumes Hotspot JVM version 1.6 or higher.
Relevant JVM flags
Using following flags we can direct Hotspot JVM to throw a heap dump when an application becomes GC driven.
First of all -XX:+HeapDumpOnOutOfMemoryError flag should be added.
Our goal is to have an OutOfMemory error generated when throughput of an application drops because of non stopping GC. There are two JVM flags that will help:
  • -XX:GCTimeLimit = 98 – defines a limit of proportion of time spent in GC before an OutOfMemory error is thrown
  • -XX:GCHeapFreeLimit = 2 – defines minimum percentage of free space after a full GC before an OutOfMemoryError is thrown
Both flags are enabled by default, but the out of memory error is not triggered frequently. Hence, deeper understanding is needed in order to decide when each flag is useful and what should be the defaults.
Oracle documentation
Excessive GC Time and OutOfMemoryError.
The concurrent collector will throw an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line.
The policy is the same as that in the parallel collector, except that time spent performing concurrent collections is not counted toward the 98% time limit. In other words, only collections performed while the application is stopped count toward excessive GC time. Such collections are typically due to a concurrent mode failure or an explicit collection request (e.g., a call to System.gc()).
As the second passage states, GCTimeLimit does not work well with parallel collector as time spent in GC is a less important factor in that case.
Choosing defaults for GCHeapFreeLimit 
The defaults should be chosen in such a way that when an application becomes unresponsive (or driven by GC) out of memory will be thrown. Unresponsiveness is a subjective definition that may vary from application to application, therefore the resulting value may be (slightly) different for different applications.
First a simple test should be created. Class that runs inside an application and allocates huge amounts of objects. Part of the objects are collected fast and part live for a long time.
GC logging flags should be added before running the test. For my application following flags were added:
-Xmx960m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -Xloggc:/root/gc.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/root/dumps
The output of GC log given below. First full GC cycle – application is responsive:
[Full GC [PSYoungGen: 290112K->33213K(308672K)] [PSOldGen: 655359K->655359K(655360K)] 945471K->688573K(964032K)]
After few minutes – application in unresponsive:
[Full GC [PSYoungGen: 290112K->213269K(308672K)] [PSOldGen: 655359K->655359K(655360K)] 945471K->868629K(964032K)]
For 950 Mb heap the distribution of memory spaces is: old gen = 650Mb, young = 300Mb. When the application is responsive, old generation becomes full with strong referenced data, while most of the young generation is garbage.
Therefore one estimate for GCHeapFreeLimit is (all young is cleared and nothing is cleared from old) 300/950 ~ 32%.
However when the time passes, objects from the young space can not be promoted to the old one, because it’s full. As it says in the Oracle documentation: The youngest generation collection does not require a guarantee of full promotion of all live objects. (-XX:+HandlePromotionFailure flag).
Because of failed promotions there are much more referenced objects than expected at the young generation. In order to be on the safe side (less false positives) I assume that when more than 1/3 of the young space is garbage and old space is full the system is driven by GC.
Therefore the ratio I advice for application similar to mine is 200/950 ~ 20%.
-XX: GCHeapFreeLimit=20
Experiments show that OOM happens after 1-2 minutes of excessive GC above the 20% limit with 30-35 full GCs happening during that time.
Identifying excessive garbage collection is a difficult task in Java with no silver bullet. Hotspot JVM developers provided methods that will help frequently with identification of root cause. GCHeapFreeLimit and GCTimeLimit flags can shed the light on the problem when used properly: with appropriate values.
Related Whitepaper:

Bulletproof Java Code: A Practical Strategy for Developing Functional, Reliable, and Secure Java Code

Use Java? If you do, you know that Java software can be used to drive application logic of Web services or Web applications. Perhaps you use it for desktop applications? Or, embedded devices? Whatever your use of Java code, functional errors are the enemy!

To combat this enemy, your team might already perform functional testing. Even so, you're taking significant risks if you have not yet implemented a comprehensive team-wide quality management strategy. Such a strategy alleviates reliability, security, and performance problems to ensure that your code is free of functionality errors.Read this article to learn about this simple four-step strategy that is proven to make Java code more reliable, more secure, and easier to maintain.

Get it Now!  

Leave a Reply

3 + = nine

Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.

Sign up for our Newsletter

20,709 insiders are already enjoying weekly updates and complimentary whitepapers! Join them now to gain exclusive access to the latest news in the Java world, as well as insights about Android, Scala, Groovy and other related technologies.

As an extra bonus, by joining you will get our brand new e-books, published by Java Code Geeks and their JCG partners for your reading pleasure! Enter your info and stay on top of things,

  • Fresh trends
  • Cases and examples
  • Research and insights
  • Two complimentary e-books