This story goes back at least a decade, when I was first approached by a PHB with a question “How big servers are we going to need to buy for our production deployment”. The new and shiny system we have been building was nine months from production rollout and apparently the company had promised to deliver the whole solution, including hardware.
Oh boy, was I in trouble. With just a few years of experience down my belt, I could have pretty much just tossed a dice. Even though I am sure my complete lack of confidence was clearly visible, I still had to come up with the answer. Four hours of googling later I recall sitting there with the same question still hovering in front of my bedazzled face:
“How to estimate the need for computing power?”
In this post I start to open up the subject by giving you rough guidelines on how to estimate memory requirements for your brand new Java application. For the impatient ones – the answer will be to start with the memory equal to approximately 5 x [amount of memory consumed by Live Data] and start the fine-tuning from there. For the ones more curious about the logic behind, stay with me and I will walk you through the reasoning.
First and foremost, I can only recommend to avoid answering a question phrased like this without detailed information being available. Your answer has to be based upon the performance requirements, so do not even start without clarifying those first. And I do not mean way-too-ambiguous “The system needs to support 700 concurrent users”, but a lot more specific ones about latency and throughput, taking into account the amount of data, usage patterns. One should also not forget about the budget also – we all can all dream about sub-millisecond latencies, but those without HFT banking backbone budgets – unfortunately it will only remain a dream.
For now, lets assume you have those requirements in place. Next stop would be to create the load test scripts emulating user behaviour. If you are now able to launch those scripts concurrently you have built a foundation to the answer. As you might also have guessed, the next step involves our usual advice of measuring not guessing. But with a caveat.
Live Data Size
Namely, our quest for the optimal memory configuration requires capturing the Live Data Size. Having captured this, we have the baseline configuration in place for the fine-tuning.
How does one define live data size? Charlie Hunt and Binu John in their “Java Performance” book have given it the following definition:
Live data size is the heap size consumed by the set of long-lived objects required to run the application in its steady state.
Equipped with the definition, we are ready to run your load tests against the application with the GC logging turned on (-XX:+PrintGCTimeStamps -Xloggc:/tmp/gc.log -XX:+PrintGCDetails) and visualize the logs (with the help of gcviewer for example) to determine the moment when the application has reached to the steady state. What you are after looks similar to the following:
We can see the GC doing its job both with minor and Full GC runs in a familiar double-saw-toothed graphic. This particular application seems to have achieved a steady state already after the first full GC run on 21st second. In most cases however, it takes 10-20 Full GC runs to spot the change in trends. After four full GC runs we can estimate that the Live Data Size is equal to approximately 100MB.
The aforementioned Java Performance book is now indicating that there is a strong correlation between the Live Data Size and the optimal memory configuration parameters in a typical Java EE application. The evidence from the field is also backing up their recommendations:
Set the maximum heap size to 3-4 x [Live Data Size]
So, for our application at hand, we should set the -Xmx to be in between 300m and 400m for the initial performance tests and take it from there.
We have mixed feelings about other recommendations given in the book, recommending to set the maximum permanent generation size to 1.2-1.5 x [Live Data Size of the Permanent Generation] and the -XX:NewRatio being set to 1-1.5 x of the [Live Data Size]. We are currently gathering more data to determine whether the positive correlation exists, but until then I recommend to base your survival and eden configuration decisions on monitoring your allocation rate instead.
Why should one bother you might now ask. Indeed, two reasons for not caring immediately surface:
- 8G memory chip is in sub $100 territory at the time of writing this article
- Virtualization, especially when using large providers such as Amazon AWS make adjusting the capacity easy
Both of the reasons are partially valid and have definitely reduced the need for provisioning to be precise. But both of them are still putting you in the danger zone
- When tossing in huge amounts of memory “just in case” you are most likely going to significantly affect the latency – going into heaps above 8G it is darn easy to introduce Full GC pauses spanning over tens of seconds.
- When over-provisioning with the mindset of “lets tune it later”, the “later” part has a tendency of never arriving. I have faced numerous applications running on vastly over provisioned environments just because of this. For example the aforementioned application I discovered running on Amazon EC2 m1.xlarge instance was costing the company $4,200 per instance / year. Converting it to m1.small reduced the bill to just $520 for the instance. 8-fold cost reduction will be visible from your operations budget if your deployments are large, trust me on this.
Unfortunately I still see way too many decisions made exactly like I was forced to do a decade ago. This leads to the under- and over planning of capacity, both of which can be equally poor choices, especially if you cannot enjoy the benefits of virtualization.
I got lucky with mine, but you might not get away with your guestimate, so I can only recommend to actually plan ahead using the simple framework described in this post.