When troubleshooting performance issues, memory optimization calls for a deep analysis of what each system stores in memory, how long it is kept there, and how it is accessed. This post records the background information and the points worth noting in such an effort. It is specific to Java-based implementations, since a deep understanding of JVM behavior is very beneficial in the process.
The Java language offers developers much convenience by taking care of memory management to a great extent, letting them focus on the rest of the logic. Still, a good understanding of how Java does this underneath explains several best practices we follow in Java implementations, helps us design programs better, and makes us think seriously about aspects that can later lead to memory leaks and hurt system stability in the long run. The Java Garbage Collector plays a big role here, being responsible for freeing up memory by removing garbage.
This information is widely available, yet I am summarizing here for reference in one place. :)
The JVM enables Java code to run in a hardware- and OS-independent manner. It operates on the memory allocated to its own process by the OS, acting as another abstraction of a physical machine.
JVMs can be implemented based on the open standard published in the Java SE specifications (linked at the end of this post). Widely known implementations are the Oracle HotSpot JVM, its almost identical open-source counterpart OpenJDK, IBM J9, JRockit and, with some deviations, the Dalvik VM used in Android OS.
In brief, the JVM loads and executes compiled Java byte code using the resources allocated to it by the platform it runs on.
The class loader loads the byte code into the JVM memory in three phases: load, link (verify, prepare, resolve; if this fails, a NoClassDefFoundError is issued) and initialize. There are three kinds of class loaders: Bootstrap, Extension and Application class loaders.
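To see the class loader hierarchy on a running JVM, a minimal sketch like the one below can help. The class name `ClassLoaderDemo` and the helper `describe` are made up for illustration. Note that from Java 9 onwards the loader in the middle of the chain is called the platform class loader rather than the extension class loader, so the printed names depend on the Java version.

```java
// Prints the class loader chain for an application class and for a core class.
class ClassLoaderDemo {

    /** Returns a readable name for a loader; null stands for the bootstrap loader. */
    static String describe(ClassLoader loader) {
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        // Core classes such as String are loaded by the bootstrap loader,
        // which the Java API represents as null.
        System.out.println("String is loaded by: " + describe(String.class.getClassLoader()));

        // Walk up the parent chain, starting from the loader of this class.
        ClassLoader loader = ClassLoaderDemo.class.getClassLoader();
        while (loader != null) {
            System.out.println(describe(loader));
            loader = loader.getParent();
        }
        System.out.println("bootstrap");
    }
}
```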
Memory and runtime data area
This captures a few important sections below, though it is not comprehensive.
- Native method stack – The stack used for Java native library calls. It is platform-dependent, and the native libraries are mostly written in C.
- JVM stack – Keeps the stack frames of the currently executing methods, per thread. Recursive method calls can fill the stack and overflow it (java.lang.StackOverflowError) if no proper termination condition is set. The -Xss JVM option allows configuring the stack size.
- PC register – The program counter, which points to the next instruction to be executed, per thread.
- Method area – Stores class data. Up to Java 7 this lives in the PermGen space, whose size is governed by -XX:MaxPermSize (64MB default). If it is to serve a huge server app loading millions of classes, we can increase it to avoid ‘java.lang.OutOfMemoryError: PermGen space’ issues. From Java 8 onwards PermGen is replaced by Metaspace, which has no limit by default, though it can be fine-tuned and limited (-XX:MaxMetaspaceSize).
- Heap – Stores the object instances. Its initial and maximum sizes are set with -Xms and -Xmx.
- Run time constant pool
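The stack overflow mentioned for the JVM stack is easy to reproduce. The sketch below (the names `StackDepthDemo` and `measureDepth` are made up for illustration) recurses without a termination condition and counts the frames pushed before the thread stack runs out. The exact number depends on the JVM, the platform and the -Xss setting, so treat it only as a relative indicator.

```java
class StackDepthDemo {
    private static int depth = 0;

    /** Recurses without a base case, so the thread stack eventually fills up. */
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Returns how many calls were made before the stack overflowed. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Expected: the JVM stack of this thread is exhausted.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Calls before overflow: " + measureDepth());
    }
}
```

Running it once with the default settings and once with a larger stack (for example `-Xss4m`) should show a noticeably higher count in the second run.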
The execution engine executes the bytecode that the class loader has placed in the runtime data areas. It makes use of the interpreter, the garbage collector, the HotSpot profiler and the JIT compiler for optimized execution of the program.
Refer to the JVM specification (linked at the end of this post) for more details on the JVM architecture.
Now we know where the Garbage Collector sits in the JVM architecture. Let’s go deep into the internals.
Garbage collection is Java’s automatic memory management process, which removes the objects that are no longer used. Then comes the question: how does it decide whether an object is still in use?
It defines two categories of objects:
live objects – reachable objects that are referenced from another object. Ultimately the chain of references leads back to a root, such as a local variable on a running thread’s stack or a static field, from which the whole object graph hangs.
dead objects – unreachable objects that are not referenced by any live object and are just lying in the heap.
This categorization, and the garbage collection built on it, relies on the two observations below.
1. Most objects become unreachable soon after creation. These are mostly short-lived objects that live only within a method context.
2. Old objects rarely refer to young objects. For example, a long-lived cache would hardly ever refer to a newer object.
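The live/dead distinction can be observed with a `WeakReference`, which does not keep its target alive on its own. In this small sketch (the class `ReachabilityDemo` and its members are made up for illustration) a static field plays the role of the root. `System.gc()` is only a request, so the second printout may differ between runs and JVMs.

```java
import java.lang.ref.WeakReference;

class ReachabilityDemo {
    // A static field is one kind of GC root: whatever it points to stays live.
    static Object root;

    /** Creates an object held by the static root and returns a weak handle to it. */
    static WeakReference<Object> holdAndTrack() {
        root = new Object();
        return new WeakReference<>(root);
    }

    public static void main(String[] args) {
        WeakReference<Object> tracker = holdAndTrack();
        System.gc();
        // Still reachable through the static field, so it cannot be collected.
        System.out.println("reachable, tracker cleared? " + (tracker.get() == null));

        root = null;  // drop the only strong reference: the object is now dead
        System.gc();  // only a request; the JVM may or may not collect right away
        System.out.println("unreachable, tracker cleared? " + (tracker.get() == null));
    }
}
```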
Garbage Collection Steps
Newly created object instances reside in the Java heap, which is divided into different generations as described below. Garbage collection is done by a daemon thread called ‘Garbage Collector’, which moves the objects through the different spaces within the heap.
Garbage Collection is done in 3 steps.
1. Mark – Starting from the roots, traverse the object graph and mark the reachable objects as alive.
2. Sweep – Delete the unmarked objects.
3. Compact – Defragment the memory, making the allocations of the live objects contiguous. It is considered the most time-consuming step.
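The three steps can be sketched on a toy object graph. This is only an illustration of the idea, not how a real JVM collector is implemented: the `Node` class, the list standing in for the heap, and all method names are made up. Copying the survivors into a fresh list stands in for compaction.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ToyMarkSweep {
    /** A toy heap object that only knows its name and its outgoing references. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Mark: collect every node reachable from the roots. */
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (marked.add(node)) {          // first visit only
                pending.addAll(node.refs);
            }
        }
        return marked;
    }

    /** Sweep and compact: keep only marked nodes, packed into a new list. */
    static List<Node> sweepAndCompact(List<Node> heap, Set<Node> marked) {
        List<Node> survivors = new ArrayList<>();
        for (Node node : heap) {
            if (marked.contains(node)) {
                survivors.add(node);
            }
        }
        return survivors;
    }

    /** Builds a small graph, runs one collection and returns the surviving names. */
    static List<String> demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);   // a -> b, and a is the root
        c.refs.add(d);   // c <-> d refer to each other but nothing reaches them
        d.refs.add(c);
        List<Node> heap = Arrays.asList(a, b, c, d);
        List<String> names = new ArrayList<>();
        for (Node node : sweepAndCompact(heap, mark(Arrays.asList(a)))) {
            names.add(node.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, b]
    }
}
```

Note that `c` and `d` are removed even though they reference each other: what matters is reachability from a root, not whether some reference to the object exists.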
The Heap Area is divided as below.
Old (tenured) generation – Objects that have survived for a long time stay here until they are marked unreachable and cleaned up in a major garbage collection, which runs through the whole heap.
Young generation – This is further divided into three: the Eden space and two Survivor spaces.
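The heap spaces of a running JVM can be listed through the standard `java.lang.management` API. The sketch below uses a made-up class name, `HeapPoolsDemo`. The pool names it prints depend on the JVM and on the garbage collector in use (names containing "Eden", "Survivor" and "Old Gen" are typical on HotSpot), so do not rely on exact strings.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class HeapPoolsDemo {

    /** Returns the names of the memory pools that belong to the heap. */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is not valid
            long usedKb = usage == null ? 0 : usage.getUsed() / 1024;
            System.out.println(pool.getName() + ": " + usedKb + " KB used");
        }
    }
}
```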
Garbage collection happens at two stages, ‘Minor’ and ‘Major’. Both are stop-the-world operations that stop every other memory access. A minor GC might not be felt by the application, though, as it only scans through the young generation space, which is small in size.
The memory life cycle, as shown in the above animation, goes as follows.
1. Newly created objects reside in the Eden space (just as humans started from the Garden of Eden :) ). New objects keep getting added there until the Eden space is full.
2. When the Eden space is full, a minor GC runs, marks the live objects, moves those live objects to the ‘Survivor from’ space and sweeps the Eden space, which becomes free.
3. The Eden space then keeps filling up with new objects as the program runs. The next time it is full, we also have the previously moved objects in the ‘Survivor from’ space. The minor GC marks the objects in both these spaces and moves all the remaining live objects to the other survivor space. Wondering why the live objects from Eden are not simply copied into the remaining room of ‘Survivor from’? Moving everything to the empty space has proven more efficient: the objects land there contiguously, so there is no need to compact an area that still has objects in it.
4. This cycle repeats, moving objects between the survivor spaces, until a configured threshold (-XX:MaxTenuringThreshold) is met. (The GC keeps track of how many GC cycles each object has survived.) When the threshold is met, those objects are moved to the tenured space.
5. As time passes, if the tenured space also fills up, a major GC kicks in and traverses the whole heap, performing the GC steps. This pause can be felt in human interactions and is not desired.
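The life cycle above can be mimicked with a very simplified model. The sketch below is a toy simulation, not the JVM's actual algorithm: the class `ToyGenerations`, its fields and the way liveness is passed in as a set of names are all assumptions made for illustration, and the real promotion rules have more conditions than a single age check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ToyGenerations {
    final int tenuringThreshold;
    final List<String> eden = new ArrayList<>();
    List<String> survivorFrom = new ArrayList<>();
    List<String> survivorTo = new ArrayList<>();
    final List<String> tenured = new ArrayList<>();
    final Map<String, Integer> age = new HashMap<>(); // minor GCs survived per object

    ToyGenerations(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    /** New objects always start in Eden with age 0. */
    void allocate(String name) {
        eden.add(name);
        age.put(name, 0);
    }

    /** Minor GC: copy live objects from Eden and 'from' into 'to', or promote them. */
    void minorGc(Set<String> live) {
        List<String> scanned = new ArrayList<>(eden);
        scanned.addAll(survivorFrom);
        for (String obj : scanned) {
            if (!live.contains(obj)) {
                age.remove(obj);              // dead: simply not copied
                continue;
            }
            int survived = age.merge(obj, 1, Integer::sum);
            if (survived >= tenuringThreshold) {
                tenured.add(obj);             // old enough: promote
            } else {
                survivorTo.add(obj);
            }
        }
        eden.clear();
        survivorFrom.clear();
        List<String> emptied = survivorFrom;  // swap the roles of the two spaces
        survivorFrom = survivorTo;
        survivorTo = emptied;
    }

    /** Runs two minor GCs with one long-lived object and two short-lived ones. */
    static ToyGenerations demo() {
        ToyGenerations heap = new ToyGenerations(2);
        Set<String> live = new HashSet<>(Arrays.asList("cache"));
        heap.allocate("cache");
        heap.allocate("temp1");
        heap.minorGc(live);   // cache moves to a survivor space, temp1 is dropped
        heap.allocate("temp2");
        heap.minorGc(live);   // cache has survived twice and is promoted
        return heap;
    }

    public static void main(String[] args) {
        ToyGenerations heap = demo();
        System.out.println("tenured:  " + heap.tenured);      // prints tenured:  [cache]
        System.out.println("survivor: " + heap.survivorFrom); // prints survivor: []
    }
}
```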
When there is a memory leak, or when huge caches reside in memory for a long time, the tenured space fills up over time. Such objects are still referenced, so they are never detected as dead. This results in major GCs running frequently, because the tenured space is detected as full, yet each run fails to free enough memory as nothing can be swept out.
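One common way to keep a long-lived cache from filling the tenured space is to bound its size. The sketch below is one possible approach, not a recommendation for every case: it builds on the standard `LinkedHashMap.removeEldestEntry` hook, while the class names `BoundedCache` and `BoundedCacheDemo` and the limit of 100 entries are arbitrary choices for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A small LRU cache that evicts the least recently used entry once it is full. */
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, which gives LRU behavior
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every insertion; returning true drops the eldest entry,
        // which makes its value unreachable and therefore collectable.
        return size() > maxEntries;
    }
}

class BoundedCacheDemo {
    public static void main(String[] args) {
        BoundedCache<Integer, byte[]> cache = new BoundedCache<>(100);
        for (int i = 0; i < 1000; i++) {
            cache.put(i, new byte[1024]);
        }
        System.out.println("entries kept: " + cache.size()); // prints entries kept: 100
    }
}
```

An unbounded `HashMap` in the same loop would keep all 1000 arrays reachable, which is exactly the pattern that slowly fills the tenured space.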
The error ‘java.lang.OutOfMemoryError’ in the logs is a clear hint that memory is not enough. Also, frequent CPU spikes together with high memory usage can be a symptom of frequent GC runs caused by some memory handling issue that needs attention.
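To check from inside the application whether GC runs are frequent, the standard `GarbageCollectorMXBean` exposes collection counts and accumulated times. In this sketch the class name `GcStatsDemo` and the helper `totalCollections` are made up, and the collector names printed depend on the JVM and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStatsDemo {

    /** Sums the collection counts of all collectors, skipping undefined (-1) values. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("all collections so far: " + totalCollections());
    }
}
```

Sampling these numbers periodically and comparing the deltas gives a rough picture of how often collections run and how much time they take.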
When fine-tuning the JVM for memory utilization, the major deciding factor is which is more critical: responsiveness/latency or throughput. If throughput is of utmost importance, as in batch processing, we can accept some pauses for major GCs to run if that helps the overall throughput, because the application occasionally becoming less responsive might not be an issue there.
On the other hand, if responsiveness is of utmost importance, as in a UI-based application, we should try to avoid major GCs. Doing this naively would not help, though. For example, we can delay a major GC by increasing the space for the young generation, but then the minor GC would start to take much longer, as it now needs to traverse and compact a huge space. Hence the heap size and the ratio between the young and old generations need to be chosen carefully. Sometimes this can even go into the application design details, to fine-tune memory usage through the object creation patterns and caching locations. Analyzing heap dumps and flame graphs to decide on the best things to cache will be a topic for another post.
Because garbage collection has this much impact on the performance of an application, engineers have put a great deal of effort into improving it. As a result, we have a choice of garbage collectors to pick from as per the requirements. Below is a non-comprehensive list of options.
1. Serial Collector
Runs in a single thread. Only suitable for basic applications.
2. Concurrent Collector (CMS – Concurrent Mark and Sweep)
The garbage collection work runs concurrently with the application threads. It only stops the world in the initial mark and re-mark phases. The rest of the work is done while the application is running, and it does not wait for the old generation to be full. This is a good choice when the memory space is large, there is a high number of CPUs to cater for concurrent execution, and the application demands the shortest pauses, with responsiveness being the critical factor. It has been the most favored collector for web applications in the past. Note that CMS was deprecated in Java 9 and removed in Java 14.
3. Parallel Collector
This collector makes use of multiple CPUs. It waits for the old generation to be full or nearly full, and when it runs it stops the world. Multiple threads do the mark, sweep and compact steps, making the garbage collection much faster. When the memory is not very large and the number of CPUs is limited, this is a good option for throughput-oriented workloads that can withstand pauses.
4. G1 (Garbage First) collector (1.7 upwards)
This option makes garbage collection more predictable by allowing configurations such as a target pause time for GC runs (-XX:MaxGCPauseMillis). It is said to offer the best of both worlds of parallelism and concurrency. It divides the memory into regions, and each region serves as either an Eden, a survivor or a tenured space. The regions with the most unreachable objects are garbage collected first.
Default Garbage Collector in Versions
- Java 7 – Parallel GC
- Java 8 – Parallel GC
- Java 9 – G1 GC
- Java 10 – G1 GC
- Java 11 – G1 GC (ZGC provided as an experimental feature along with Epsilon)
- Java 12 – G1 GC (Shenandoah GC introduced. OpenJDK only.)
Tune-up parameters for the garbage collector
The rule of thumb for tuning the JVM is not to do so unless there is an issue to address with the default settings, or the change has been decided after a lot of deliberation and has proven effects under long-running, production-level load patterns. This is because Java ergonomics has advanced a lot and can perform many optimizations on its own most of the time, as long as the application is not misbehaving. A comprehensive list of options, including the sizes of the heap spaces, thresholds, the type of garbage collector to use, etc., can be found in the JVM options reference listed at the end of this post.
The configurations below are useful for diagnosing memory issues with the help of GC behavior, in addition to heap dumps.
-XX:+PrintGCDetails – Print details of garbage collection.
-Xloggc:<file-name> – Print GC logging details to a given file.
-XX:+UseGCLogFileRotation – Enable GC log file rotation when the above configuration is done.
-XX:+HeapDumpOnOutOfMemoryError – Dump the heap content for further analysis if an OOM error occurs.
-XX:OnOutOfMemoryError="<cmd args>;<cmd args>" – Set of commands to be run if an OOM error occurs. Allows executing any custom task when facing the error.
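When diagnosing an issue it helps to confirm which of these options the running JVM was actually started with. A minimal sketch using the standard `RuntimeMXBean` is shown below. The class name `JvmFlagsDemo` and the simple rule of keeping arguments that start with `-X` are choices made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

class JvmFlagsDemo {

    /** Keeps only the -X and -XX style options from a list of JVM arguments. */
    static List<String> tuningArgs(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith("-X"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The arguments passed to the JVM itself, not to the main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Tuning options in effect: " + tuningArgs(jvmArgs));
    }
}
```

Started with, for example, `java -Xmx512m -XX:+HeapDumpOnOutOfMemoryError JvmFlagsDemo.java`, it should list both options.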
We will go into the details of diagnosing and analyzing in another post.
 – https://docs.oracle.com/javase/specs/index.html
 – https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-2.html#jvms-2.5.6
 – Oracle Garbage Collection tuning guide –
 – New java garbage collectors –
 – Available collectors –
 – JVM options –