GC impact on throughput and latency

One type of the problems each and every Java application out there has to wrestle with is related to garbage collection. When the garbage collector works, it represents a wonderful invention. When it does not – or when the way GC is doing its housekeeping becomes unpredictable – then you have a friend who has turned into a foe.

This post is about garbage collection pause times. Or more precisely – why should you care about the pauses.

Some posts ago, I explained throughput and latency via mr Apple CEO Tim Cook planning for the iPad demand and building factories. I will stick to the same illustrative story:

  • We have a factory line producing one iPad per second. Each second, every second. So the throughput of the line is 86,400 iPads/day
  • It takes four hours to complete an iPad from the start where the casing is molded to the finish when the acceptance tests on the iPad have been concluded. So the latency of the line is four hours.

The system above and the calculations are based on the assumption that the factory line is operational 24 hours a day, each day, every day. But all factory lines tend to need maintenance which is equivalent to garbage collection running inside the JVM.

As an example – lets take small maintenance tasks, which can be handled without much interruptions. Examples could involve adding oil to the machinery or picking up excess trash from the floor next to the molding equipment. Those operations are similar to minor GC’s within the JVM – it is maintenance you have to deal with, but the implementation is so clever that the performance of the system is not affected.

But in the very same factory mr. Tim Cook is going to face long-lasting maintenance tasks as well. Those tasks involve stopping the whole production line and are equivalent to the Full GC runs, where the JVM needs to stop servicing the threads in order to do some important housekeeping tasks.

Now, lets assume that after months of uninterrupted service, our hypothetical factory line gets jammed and the tech team takes four hours to resolve the issue. During this period the line is stopped. How do we measure the effect? As always, the impact can be measured by two different means:

  • Impact on throughput. The four-hour stop means we have 14,400 seconds during which no iPads are completed. Throughput-wise it means we have reduced the system’s capacity in this particular day from 86,400 to 72,000. Which means approximately ~16.5% loss in throughput.
  • Impact on latency. Now, if we took an iPad which was still on the line when the interruption occurred, it took not four but eight hours to complete. This represents a 100% increase in worst-case latency.

If you recall then mr. Cook did not care about latency. What was important for him was the overall throughput during a longer period, so mr. Cook would decide to optimize his processes in a way that the impact on throughput would be minimized.

Similar decisions need to be made in software development as well. If you have a Java EE application responsible for order processing, then a GC pause spanning four seconds would definitely reduce the throughput of your system. But for most of us it will not be a major issue. On the other hand, the users who were trying to accomplish things during the four-second stop-the-world-i-have-cleaning-to-do pause would get a perception that our systems are sluggish. And operating a service which is perceived by users as sluggish is a darn good way to go out of business.

The morale of the story? Pick your goals wisely and make sure you do not confuse throughput with latency. Then make sure you understand how GC can affect either of those by monitoring your GC logs, looking for unexpected Full GCs and tuning your application and/or GC to minimize their impact.
 

Reference: GC impact on throughput and latency from our JCG partner Nikita Salnikov Tarnovski at the Plumbr Blog blog.
Related Whitepaper:

Bulletproof Java Code: A Practical Strategy for Developing Functional, Reliable, and Secure Java Code

Use Java? If you do, you know that Java software can be used to drive application logic of Web services or Web applications. Perhaps you use it for desktop applications? Or, embedded devices? Whatever your use of Java code, functional errors are the enemy!

To combat this enemy, your team might already perform functional testing. Even so, you're taking significant risks if you have not yet implemented a comprehensive team-wide quality management strategy. Such a strategy alleviates reliability, security, and performance problems to ensure that your code is free of functionality errors.Read this article to learn about this simple four-step strategy that is proven to make Java code more reliable, more secure, and easier to maintain.

Get it Now!  

One Response to "GC impact on throughput and latency"

  1. Zoltan Juhasz says:

    This post can couse misunderstandings about he GC.
    You say the GC means performance loss in the throughput, but compared to what?
    - Not having garbage collection requires practically infinite amount of memory, which is not an option I think.
    - Switching back to some GC-less language, like C++ requires manual garbage handling, – memory allocation, deallocation of the memory, – which is much less efficient than the Java GC.
    (- garbage free applications has serious drawbacks for everyday use)

    So overally, tha Java GC means performance increase in throughput and average latency. Of course GC tuning is still required to make it even more performant.

    BUT, as you said (so the point of the post is valid and important), it causes huge worst-case latency. So if latency matters (under 0.1 second latency requirements, like in high requency trading), it MUST be handled some way.

Leave a Reply


8 − = one



Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.

Sign up for our Newsletter

20,709 insiders are already enjoying weekly updates and complimentary whitepapers! Join them now to gain exclusive access to the latest news in the Java world, as well as insights about Android, Scala, Groovy and other related technologies.

As an extra bonus, by joining you will get our brand new e-books, published by Java Code Geeks and their JCG partners for your reading pleasure! Enter your info and stay on top of things,

  • Fresh trends
  • Cases and examples
  • Research and insights
  • Two complimentary e-books