Performance Anxiety – on Performance Unpredictability, Its Measurement and Benchmarking

Jakub Holy | February 26th, 2011 | Last Updated: October 24th, 2012

Joshua Bloch gave a great talk called Performance Anxiety at Devoxx 2010 (30 min, via Parleys; slides are also available). The main messages, as I read them, were:

Nowadays, performance is completely non-predictable. You have to measure it and employ proper statistics to get some meaningful results.
Microbenchmarking is very, very hard to do correctly. No, you misunderstand me, I mean even harder than that!
From the resources: Profilers and result evaluation methods may be very misleading unless used correctly.

Another blog has already covered the talk, but I’d like to record more detailed remarks here.

Today we can’t estimate performance; we must measure it, because the systems (JVM, OS, processor, …) are very complex, with many different heuristics on various levels, and thus the performance is highly unpredictable. This applies not only to Java but also to C, C++, and even assembly code.

Example: Results during a single JVM run may be consistent (warm-up, then faster) but can vary between JVM executions even by 20%. One of the causes may be Compilation Planning (what’s inlined, …) – it’s done in a background thread and thus is inherently non-deterministic.
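To see the warm-up effect within a single JVM run, a naive timing loop like the one below is enough. It is only a sketch: the workload, the class name WarmUpDemo and the round counts are made up for illustration, and a loop like this is exactly the kind of microbenchmark the talk warns about, so treat the numbers as a demonstration of variability, not as a measurement.

```java
final class WarmUpDemo {
    // Hypothetical workload, chosen only for illustration.
    static long workload(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += Integer.bitCount(i) * 31L + (i % 7);
        }
        return sum;
    }

    // Times `rounds` consecutive calls of the workload; returns elapsed nanoseconds per round.
    static long[] timeRounds(int rounds, int n) {
        long[] nanos = new long[rounds];
        long sink = 0; // accumulate results to make it harder for the JIT to discard the work
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink += workload(n);
            nanos[r] = System.nanoTime() - start;
        }
        if (sink == 42) {
            System.out.println("unlikely"); // uses sink so it is not dead code
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] nanos = timeRounds(20, 1_000_000);
        for (int r = 0; r < nanos.length; r++) {
            System.out.printf("round %2d: %d ns%n", r, nanos[r]);
        }
    }
}
```

Typically the first rounds are slower than the later ones, and running the program several times gives a different series each time.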

Therefore, don’t estimate but measure, and not only that: also process the data statistically (how often different values appear and what they are: mean, median, standard deviation, etc.).
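A minimal sketch of such statistical processing follows. The class name SampleStats and the timings in main are hypothetical, not taken from the talk. It shows how a single outlier pulls the mean away from the median and inflates the standard deviation.

```java
import java.util.Arrays;

final class SampleStats {
    // Arithmetic mean of the samples.
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Median; works on a sorted copy so the caller's array is left untouched.
    static double median(double[] xs) {
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Sample standard deviation (divides by n - 1); needs at least two samples.
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    public static void main(String[] args) {
        // Made-up timings in milliseconds with one outlier.
        double[] millis = {102, 98, 101, 97, 150};
        System.out.printf("mean=%.1f median=%.1f stddev=%.1f%n",
                mean(millis), median(millis), sampleStdDev(millis));
    }
}
```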

“Profilers don’t help much; in fact, they can mislead” – Mytkowicz, Diwan et al., “Evaluating the Accuracy of Java Profilers”, PLDI ’10. In their experiment, each of 4 leading profilers identified a different hotspot. I’d really recommend reading the related StackOverflow discussion “If profiler is not the answer, what other choices do we have?” (the answer is: profilers have their value, but use the correct ones and use them correctly). The conclusion of the original paper:

Our results are disturbing because they indicate that profiler incorrectness is pervasive – occurring in most of our seven benchmarks and in two production JVM – and significant – all four of the state-of-the-art profilers produce incorrect profiles. Incorrect profiles can easily cause a performance analyst to spend time optimizing cold methods that will have minimal effect on performance. We show that a proof-of-concept profiler that does not use yield points for sampling does not suffer from the above problems.

“Benchmarking is really, really hard!” and “Most benchmarks are seriously broken”. Broken means that either the measurement error is larger than the value being measured or the results obtained are unrelated to what was intended to be measured. It seems really hard to find a (micro)benchmark that isn’t broken. Joshua recommends Cliff Click’s JavaOne 2009 presentation The Art of (Java) Benchmarking (see also an interesting related interview with Cliff), which I believe I have seen and which points out the various traps. Joshua also mentions that some frameworks, such as Google Caliper, may help you avoid the pitfalls, though I’m quite sure they can’t protect you from all of them.

Joshua mentions a couple of interesting papers; check the slides for them. One that sounds really interesting to me is by Georges, Buytaert and Eeckhout: Statistically Rigorous Java Performance Evaluation, OOPSLA ’07 (20 pages). They mention that you need to run the VM 30 times to get meaningful data. From the abstract:

This paper shows that prevalent methodologies can be misleading, and can even lead to incorrect conclusions. The reason is that the data analysis is not statistically rigorous. In this paper, we present a survey of existing Java performance evaluation methodologies and discuss the importance of statistically rigorous data analysis for dealing with non-determinism. We advocate approaches to quantify startup as well as steady-state performance, and, in addition, we provide the JavaStats software to automatically obtain performance numbers in a rigorous manner. Although this paper focuses on Java performance evaluation, many of the issues addressed in this paper also apply to other programming languages and systems that build on a managed runtime system.
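One common form of statistically rigorous analysis is to compare confidence intervals instead of single numbers. The sketch below is my own illustration, not the paper’s JavaStats tool: the class ConfidenceInterval, its methods and the run times are hypothetical. It assumes one measurement per VM run and uses the normal approximation (z = 1.96 for 95%), which is usually considered acceptable from around 30 samples; with fewer runs a Student’s t quantile would be the safer choice.

```java
final class ConfidenceInterval {
    // Two-sided 95% quantile of the standard normal distribution.
    static final double Z_95 = 1.96;

    // Returns {lower, upper} bounds of an approximate 95% confidence interval for the mean.
    static double[] meanCi95(double[] runs) {
        int n = runs.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two runs");
        }
        double sum = 0;
        for (double r : runs) {
            sum += r;
        }
        double mean = sum / n;
        double squares = 0;
        for (double r : runs) {
            squares += (r - mean) * (r - mean);
        }
        // Standard error of the mean: sqrt(sample variance / n).
        double stdError = Math.sqrt(squares / (n - 1) / n);
        double half = Z_95 * stdError;
        return new double[] {mean - half, mean + half};
    }

    // If two intervals overlap, the data does not support claiming a difference.
    static boolean overlap(double[] a, double[] b) {
        return a[0] <= b[1] && b[0] <= a[1];
    }

    public static void main(String[] args) {
        // Made-up results of 30 VM runs before and after a hypothetical optimization.
        double[] before = new double[30];
        double[] after = new double[30];
        for (int i = 0; i < 30; i++) {
            before[i] = 100 + (i % 5);
            after[i] = 103 + (i % 5);
        }
        double[] ciBefore = meanCi95(before);
        double[] ciAfter = meanCi95(after);
        System.out.printf("before: [%.2f, %.2f]%n", ciBefore[0], ciBefore[1]);
        System.out.printf("after:  [%.2f, %.2f]%n", ciAfter[0], ciAfter[1]);
        System.out.println("intervals overlap: " + overlap(ciBefore, ciAfter));
    }
}
```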

Personal touch

I find this subject very interesting because for over a year I have been involved in the performance optimization of one of our data feeds, which used to run for a couple of days (latest result: half an hour, with a bit of cheating). My experience completely supports what Joshua says: don’t guess but measure, profilers may be misleading, performance is unpredictable. Though, as a colleague mentioned, in the domain of enterprise Java our performance problems are usually caused by the database and the communication with it (which applies 100% to that feed too).

I’ve already blogged about some experiences, e.g. in The power of batching or speeding JDBC by 100 (inspired by JDBC performance tuning with fetch size); check also the performance tag for interesting links. I also appreciated and applied the knowledge from Accurately computing running variance (I often wish I had slept less and paid more attention during the university math lectures).
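For the running variance, one well-known numerically stable approach is Welford’s method, which updates the mean and the sum of squared deviations one sample at a time, so no samples need to be stored. The sketch below is my own illustration (the class name RunningStats is hypothetical) and not necessarily the exact formulation from the linked article.

```java
final class RunningStats {
    private long count;
    private double mean;
    private double m2; // running sum of squared deviations from the current mean

    // Welford's update: adjust the mean first, then accumulate using old and new deviation.
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    long count() {
        return count;
    }

    double mean() {
        return mean;
    }

    // Sample variance (divides by n - 1); NaN until at least two samples have been added.
    double sampleVariance() {
        return count > 1 ? m2 / (count - 1) : Double.NaN;
    }

    double sampleStdDev() {
        return Math.sqrt(sampleVariance());
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        // Made-up measurements in milliseconds.
        for (double millis : new double[] {102, 98, 101, 97, 150}) {
            stats.add(millis);
        }
        System.out.printf("n=%d mean=%.1f stddev=%.1f%n",
                stats.count(), stats.mean(), stats.sampleStdDev());
    }
}
```

Compared with the textbook formula based on the sum of squares minus the squared sum, this avoids subtracting two large, nearly equal numbers, which is where precision is usually lost.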

Conclusion

The higher the complexity, the higher the unpredictability.

As an application programmer, use high-level, declarative constructs where possible to push the responsibility for performance one level down, to library and JVM authors, who should know better.
Measure repeatedly and process the results with proper statistics. Don’t forget to repeat the measurements over time; the platform evolves with every release.

Once again, microbenchmarking is hard! If you have to play with it, use something like Caliper and be aware that your results are most likely wrong anyway.

Disclaimer

Personally, I would like to thank our JCG partner Jakub for sharing quality information with the community; this article was originally posted on his web log “The Holy Java”.

Furthermore, I strongly suggest you try out the open source Java benchmarking framework from Brent Boyer, which you can download from his site here. You should also read the “Robust Java benchmarking” series of articles (Part 1: Issues and Part 2: Statistics and solutions) that Brent has published at IBM developerWorks.

Code is here … code is there … code is everywhere ;-) Do not forget to share!

Byron
