Examples of caching backfiring on performance

Nikita Salnikov Tarnovski, August 18th, 2015 (Last Updated: August 13th, 2015)


In 2015 it should not surprise anyone that caching frequently used data is likely to improve the performance of the application. Caching certain data structures locally inside the JVM instead of requesting them via remote calls from external storage is a widely used technique.

Introducing such caches is likely to improve the application’s performance in terms of both latency and throughput. But as always, there are no free lunches in this world. In this post we will look into the two most common mistakes made while introducing such caches to your applications.

The first of these potential problems is related to another aspect of performance optimization, namely capacity. Introducing caching is going to require more memory for the same application. Increased infrastructure cost aside, this can have other negative side effects. First and foremost, the Garbage Collection process will have to do more work to traverse the data structures in memory.

As a result, the application might show better performance in terms of average or median latency, but the worst-case latency of certain operations might become even worse than before. Just think of the additional gigabytes of cache data structures the GC algorithms will now have to traverse to detect unused objects.

To reduce the impact of this problem, you should start by monitoring the utilization of your cache. Utilization is measured as the percentage of the objects in the cache that are actually being used. As one might guess, a high utilization rate (in the high nineties) is a sign of a well-designed cache. Low utilization, on the other hand, is a sign that you are loading data into memory that never gets used.
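One way to get at such a utilization number is to remember which cache entries have been read at least once. The following is a minimal sketch under stated assumptions, not a production design: UtilizationTrackingCache is a hypothetical class made up for this illustration, it is not thread-safe, and it never evicts anything.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical wrapper (not a real library class): a map-backed cache that
// remembers which keys have been read at least once.
final class UtilizationTrackingCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final Set<K> keysRead = new HashSet<>();

    void put(K key, V value) {
        entries.put(key, value);
    }

    V get(K key) {
        V value = entries.get(key);
        if (value != null) {
            keysRead.add(key); // count only reads that actually found an entry
        }
        return value;
    }

    // Fraction of cached entries that have been read at least once (0.0 to 1.0).
    double utilization() {
        return entries.isEmpty() ? 0.0 : (double) keysRead.size() / entries.size();
    }

    public static void main(String[] args) {
        UtilizationTrackingCache<Integer, String> cache = new UtilizationTrackingCache<>();
        for (int id = 0; id < 1_000; id++) {
            cache.put(id, "person-" + id); // eager load of every record
        }
        for (int id = 0; id < 10; id++) {
            cache.get(id); // only a small subset is ever requested
        }
        System.out.println("utilization = " + cache.utilization());
    }
}
```

In the demo only 10 of the 1,000 eagerly loaded entries are ever read, so the reported utilization is very low. The extra set of keys costs memory of its own, so in a real deployment one would more likely sample or rely on the statistics a caching library already exposes.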

To understand the concept, let’s check a hypothetical situation where information about Persons, such as their name, social security number and place of birth, is frequently used. Such Persons are likely to be close to immutable if you take into account how often one will actually change their name or date of birth in the real world. For such objects in your application you can and probably should introduce a cache to make sure such Persons are loaded from external storage only once.

One way to populate such a cache is to eagerly load all the Persons that the external storage has into the local cache. When the external storage contains the entire population of the United States, you will end up with a cache containing over 300M Persons. If, on the other hand, your application is designed to be used only by the residents of Utah, with a population of just under 3M, you would quickly notice an extremely low cache utilization, where just under 1% of your cache entries are being used.

Having this kind of transparency into your cache utilization ratio helps you spot over-provisioning. Such insight allows you to start downsizing caches, reducing both memory consumption and the burden on your GC algorithms.

The second, and even more dangerous, situation that caching solutions tend to introduce arises when deploying under-provisioned caches with certain caching algorithms. In such cases your application is exposed to a problem called quick turnover rate or thrashing. Regardless of the name, the problem rears its ugly head when the content of the cache is constantly being evicted and read back again.

The concept of quick turnover rate is again best explained with an example. Let’s consider the very same situation of a Person cache being built into the application. Instead of the Utah-based deployment, let us roll our solution out to the entire population of the United States, consisting of 300M+ Persons. However, let’s do this with the same cache size that worked so well for Utah and leave the maximum cache size at just 3M elements. In addition, let’s have the cache be lazily loaded and use LRU to handle the eviction-load policy on cache misses.

Launching the application will start filling the cache as different Persons are being used by the application. Monitoring the cache reveals that the first 3M entries are quick to fill up. But as our application is going to require the details of 300M+ Persons, we continue to receive requests for Persons that are still not cached, triggering cache misses. These cache misses then trigger the external resource to load yet more Persons who also wish to have their place in the already-filled cache. Now the LRU policy kicks in and ejects some objects from the cache to make room for the new objects to be loaded.
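The lazily loaded, LRU-evicting cache described above can be sketched with the JDK alone. The class below is an assumption-laden illustration rather than the cache any particular product uses: LazyLruCache and its loader function are hypothetical names, the code is not thread-safe, and it relies on LinkedHashMap in access order together with removeEldestEntry to drop the least recently used entry once the size limit is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical lazily loaded cache with LRU eviction, built on LinkedHashMap.
final class LazyLruCache<K, V> {
    private final Function<K, V> loader; // stand-in for the call to external storage
    private final Map<K, V> entries;

    LazyLruCache(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry first
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict once the limit is exceeded
            }
        };
    }

    V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key); // cache miss: load from external storage
            entries.put(key, value);   // may evict the least recently used entry
        }
        return value;
    }

    boolean isCached(K key) {
        return entries.containsKey(key); // does not count as an access
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        LazyLruCache<Integer, String> cache = new LazyLruCache<>(2, id -> "person-" + id);
        cache.get(1);
        cache.get(2);
        cache.get(1); // touching 1 makes 2 the least recently used entry
        cache.get(3); // cache is full, so 2 is evicted
        System.out.println("2 still cached? " + cache.isCached(2));
    }
}
```

With a limit of 2 entries the demo evicts Person 2 as soon as Person 3 is loaded. Scale the same mechanics up to a 3M-entry limit facing 300M+ distinct Persons and the eviction never stops.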

Unfortunately this process never stops: the cache is just way too small to be beneficial to the application. New business transactions keep finding that in ~99% of the cases the Person requested is not present in the cache, triggering a cache miss and yet another eviction-load cycle. In such a case the cache is potentially doing more harm than good. If the cache turns over faster than the existing entries get reused, the application is in fact slower with such a cache. One of the reasons is again the additional pressure on Garbage Collection, which in this case has to deal with a constant stream of newly loaded and freshly evicted objects.

So in addition to monitoring cache utilization, you should also keep an eye on the cache hit ratio and the turnover ratio potentially triggered by cache misses. A high level of cache misses, combined with constant evictions, is a telltale sign of thrashing.
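To make these numbers concrete, here is a small simulation that counts hits, misses and evictions on an undersized LRU cache. It is a sketch under assumptions that are not in the scenario above: MonitoredLruCache is a hypothetical class, the 3M-versus-300M ratio is scaled down to 30 entries versus 3,000 distinct ids, the external storage is faked with a string, and requests are assumed to be uniformly random.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical LRU cache that counts hits, misses and evictions.
final class MonitoredLruCache {
    private final Map<Integer, String> entries;
    long hits;
    long misses;
    long evictions;

    MonitoredLruCache(int maxEntries) {
        entries = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                boolean evict = size() > maxEntries;
                if (evict) {
                    evictions++; // every eviction is one unit of cache turnover
                }
                return evict;
            }
        };
    }

    String get(int id) {
        String person = entries.get(id);
        if (person != null) {
            hits++;
            return person;
        }
        misses++;
        person = "person-" + id; // stand-in for loading from external storage
        entries.put(id, person);
        return person;
    }

    double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        MonitoredLruCache cache = new MonitoredLruCache(30); // scaled-down "3M"
        Random random = new Random(42);
        for (int i = 0; i < 100_000; i++) {
            cache.get(random.nextInt(3_000)); // scaled-down "300M" distinct Persons
        }
        System.out.printf("hit ratio=%.3f, evictions=%d%n", cache.hitRatio(), cache.evictions);
    }
}
```

Under the uniform-access assumption the hit ratio should settle at roughly 1%, with nearly every request causing an eviction. Real traffic is rarely uniform, so the point of the sketch is the counters themselves: a hit ratio close to zero paired with an eviction count close to the request count is what thrashing looks like in monitoring data.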

Hopefully some of the readers will now be able to escape traps like this in the future. Plumbr Agents are not yet capable of monitoring the JVM with regard to cache use, but we have a lot of interesting research going on in the field. Stay tuned for the news!

Reference:

Examples of caching backfiring on performance from our JCG partner Nikita Salnikov Tarnovski at the Plumbr blog.