Pitfalls of the Hibernate Second-Level / Query Caches

This post goes through how to set up the Hibernate second-level and query caches, how they work, and what their most common pitfalls are.

The Hibernate second level cache is an application level cache for storing entity data. The query cache is a separate cache that stores query results only.

The two caches really go together, as there are not many cases where we would want to use one without the other. When used well, these caches improve performance transparently by reducing the number of SQL statements that hit the database.
 

How does the second-level cache work?

The second level cache stores the entity data, but NOT the entities themselves. The data is stored in a ‘dehydrated’ format which looks like a hash map where the key is the entity Id, and the value is a list of primitive values.

Here is an example of what the contents of the second-level cache look like:

*-----------------------------------------*
|          Person Data Cache              |
|-----------------------------------------|
| 1 -> [ "John" , "Q" , "Public" , null ] |
| 2 -> [ "Joey" , "D" , "Public" ,  1   ] |
| 3 -> [ "Sara" , "N" , "Public" ,  1   ] |
*-----------------------------------------*

The second level cache gets populated when an object is loaded by Id from the database, for example using entityManager.find(), or when traversing lazily initialized relations.
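
To make the 'dehydrated' format more concrete, here is a minimal sketch of the idea in plain Java. It is a simplified model, not Hibernate's actual implementation: the class names DehydratedCacheSketch and Person are hypothetical, and the meaning of the fourth column (a reference to another Person by Id) is an assumption based on the figure above.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of one cache region; this is not Hibernate's actual implementation.
final class DehydratedCacheSketch {

    // Hypothetical entity, used only for this illustration.
    static final class Person {
        final long id;
        final String firstName;
        final String middleInitial;
        final String lastName;
        final Long parentId; // assumption: the last column references another Person by Id

        Person(long id, String firstName, String middleInitial, String lastName, Long parentId) {
            this.id = id;
            this.firstName = firstName;
            this.middleInitial = middleInitial;
            this.lastName = lastName;
            this.parentId = parentId;
        }
    }

    // Region contents: entity Id -> flat array of values, never the entity object itself.
    private final Map<Long, Object[]> region = new HashMap<>();

    // "Dehydrate": store only the field values under the entity Id.
    void put(Person p) {
        region.put(p.id, new Object[] { p.firstName, p.middleInitial, p.lastName, p.parentId });
    }

    // "Re-hydrate": build a new Person from the stored values, or return null on a cache miss.
    Person get(long id) {
        Object[] state = region.get(id);
        if (state == null) {
            return null;
        }
        return new Person(id, (String) state[0], (String) state[1], (String) state[2], (Long) state[3]);
    }

    public static void main(String[] args) {
        DehydratedCacheSketch cache = new DehydratedCacheSketch();
        Person joey = new Person(2L, "Joey", "D", "Public", 1L);
        cache.put(joey);

        Person copy = cache.get(2L);
        // Same data, but a different object instance than the one that was cached.
        System.out.println(copy.firstName + ", same instance: " + (copy == joey));
    }
}
```

The point of the sketch is that a cache hit yields a freshly built object with the same field values, which is why the cache can be shared safely between sessions.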

How does the query cache work?

The query cache looks conceptually like a hash map where the key is composed of the query text and the parameter values, and the value is a list of the entity Ids that match the query:

*----------------------------------------------------------*
|                       Query Cache                        |                     
|----------------------------------------------------------|
| ["from Person where firstName=?", ["Joey"] ] -> [1, 2]   |
*----------------------------------------------------------*

Some queries don’t return entities; instead they return only primitive values. In those cases the values themselves are stored in the query cache. The query cache gets populated when a cacheable JPQL/HQL query is executed.
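
The key structure described above can be sketched in plain Java as follows. This is only a conceptual model under stated assumptions: QueryCacheSketch and its methods are hypothetical names, and the real query cache also tracks things such as timestamps that are left out here. The Ids are copied from the figure above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only; not Hibernate's actual query cache.
final class QueryCacheSketch {

    // Key: [query text, [parameter values]]; value: Ids of the matching entities.
    private final Map<List<Object>, List<Long>> results = new HashMap<>();

    // Lists compare by content, so equal text plus equal parameters gives an equal key.
    private static List<Object> key(String queryText, Object... params) {
        return Arrays.<Object>asList(queryText, Arrays.asList(params));
    }

    void put(List<Long> ids, String queryText, Object... params) {
        results.put(key(queryText, params), ids);
    }

    // Returns the cached Ids, or null when this query/parameter combination is not cached.
    List<Long> get(String queryText, Object... params) {
        return results.get(key(queryText, params));
    }

    public static void main(String[] args) {
        QueryCacheSketch cache = new QueryCacheSketch();
        cache.put(Arrays.asList(1L, 2L), "from Person where firstName=?", "Joey");

        System.out.println(cache.get("from Person where firstName=?", "Joey")); // cached Ids
        System.out.println(cache.get("from Person where firstName=?", "Sara")); // null: different parameter
    }
}
```

Because the parameter values are part of the key, the same query text with a different parameter is a separate cache entry.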

What is the relation between the two caches?

If a query under execution has previously cached results, then no SQL statement is sent to the database. Instead the query results are retrieved from the query cache, and then the cached entity identifiers are used to access the second level cache.

If the second level cache contains data for a given Id, it re-hydrates the entity and returns it. If the second level cache does not contain the results for that particular Id, then an SQL query is issued to load the entity from the database.
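
The lookup flow described in the two paragraphs above can be modelled roughly like this. It is a sketch under simplifying assumptions: CacheLookupSketch is a hypothetical class, entity data is reduced to a single String, and an in-memory map stands in for the database so that the number of simulated SQL statements can be counted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Rough model of how the two caches cooperate; not Hibernate's actual code.
final class CacheLookupSketch {

    private final Map<Long, String> database = new LinkedHashMap<>(); // stand-in for a table
    final Map<String, List<Long>> queryCache = new HashMap<>();       // query key -> entity Ids
    final Map<Long, String> entityCache = new HashMap<>();            // entity Id -> entity data
    int selectCount = 0;                                              // simulated SQL statements

    CacheLookupSketch() {
        database.put(1L, "John");
        database.put(2L, "Joey");
        database.put(3L, "Sara");
    }

    List<String> findAll(String queryKey) {
        List<Long> ids = queryCache.get(queryKey);
        if (ids == null) {
            // Query cache miss: one SQL query loads all rows and fills both caches.
            selectCount++;
            ids = new ArrayList<>(database.keySet());
            queryCache.put(queryKey, ids);
            entityCache.putAll(database);
        }
        List<String> result = new ArrayList<>();
        for (Long id : ids) {
            String data = entityCache.get(id);
            if (data == null) {
                // Entity missing from the second level cache: one select by Id.
                selectCount++;
                data = database.get(id);
                entityCache.put(id, data);
            }
            result.add(data);
        }
        return result;
    }

    public static void main(String[] args) {
        CacheLookupSketch sketch = new CacheLookupSketch();
        sketch.findAll("from Person");      // miss: 1 select
        sketch.findAll("from Person");      // hit on both caches: no select
        sketch.entityCache.remove(2L);      // simulate one expired entity
        sketch.findAll("from Person");      // query hit, one entity miss: 1 more select
        System.out.println("selects issued: " + sketch.selectCount);
    }
}
```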

How to set up the two caches in an application

The first step is to include the hibernate-ehcache jar in the classpath:

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-ehcache</artifactId>
    <version>SOME-HIBERNATE-VERSION</version>
</dependency>

The following parameters need to be added to the configuration of your EntityManagerFactory or SessionFactory:

<prop key="hibernate.cache.use_second_level_cache">true</prop>
<prop key="hibernate.cache.use_query_cache">true</prop>
<prop key="hibernate.cache.region.factory_class">org.hibernate.cache.ehcache.EhCacheRegionFactory</prop>
<prop key="net.sf.ehcache.configurationResourceName">/your-cache-config.xml</prop>

Prefer using EhCacheRegionFactory instead of SingletonEhCacheRegionFactory. Using EhCacheRegionFactory means that Hibernate will create separate cache regions for Hibernate caching, instead of trying to reuse cache regions defined elsewhere in the application.
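
If the factory is bootstrapped from Java code rather than XML, the same four settings can be collected in a plain map. The sketch below only builds that map; CacheSettingsSketch is a hypothetical class name, and how the map is handed over depends on your setup (one option, assuming a plain JPA bootstrap, is the properties argument of Persistence.createEntityManagerFactory).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Collects the cache-related settings shown above; the class name is hypothetical.
final class CacheSettingsSketch {

    static Map<String, String> cacheSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("hibernate.cache.use_second_level_cache", "true");
        settings.put("hibernate.cache.use_query_cache", "true");
        settings.put("hibernate.cache.region.factory_class",
                "org.hibernate.cache.ehcache.EhCacheRegionFactory");
        // Placeholder path taken from the article; point it at your own Ehcache file.
        settings.put("net.sf.ehcache.configurationResourceName", "/your-cache-config.xml");
        return settings;
    }

    public static void main(String[] args) {
        cacheSettings().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```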

The next step is to configure the cache region settings in the file your-cache-config.xml:

<?xml version="1.0" ?>
<ehcache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             updateCheck="false"
       xsi:noNamespaceSchemaLocation="ehcache.xsd" name="yourCacheManager">

     <diskStore path="java.io.tmpdir"/>

     <cache name="yourEntityCache"
            maxEntriesLocalHeap="10000"
            eternal="false"
            overflowToDisk="false"
            timeToLiveSeconds="86400" />

     <cache name="org.hibernate.cache.internal.StandardQueryCache"
            maxElementsInMemory="10000"
            eternal="false"
            timeToLiveSeconds="86400"
            overflowToDisk="false"
            memoryStoreEvictionPolicy="LRU" />

  <defaultCache
          maxElementsInMemory="10000"
          eternal="false"
          timeToLiveSeconds="86400"
          overflowToDisk="false"
          memoryStoreEvictionPolicy="LRU" />
</ehcache>

If no cache settings are specified, default settings are taken, but this is probably best avoided. Make sure to give the cache a name by filling in the name attribute in the ehcache element.

Giving the cache a name prevents it from using the default name, which might already be used somewhere else in the application.

Using the second level cache

The second level cache is now ready to be used. In order to cache entities, annotate them with the @org.hibernate.annotations.Cache annotation:

@Entity
@Cache(usage=CacheConcurrencyStrategy.READ_ONLY,
     region="yourEntityCache")
public class SomeEntity {
    ...
}

Associations can also be cached by the second level cache, but by default this is not done. In order to enable caching of an association, we need to apply @Cache to the association itself:

@Entity
public class SomeEntity {
    @OneToMany
    @Cache(usage=CacheConcurrencyStrategy.READ_ONLY,
        region="yourCollectionRegion")
    private Set<OtherEntity> other;
}

Using the query cache

After configuring the query cache, no queries are cached by default. Queries need to be marked as cacheable explicitly. This is, for example, how a named query can be marked as cacheable:

@NamedQuery(name="account.queryName",
   query="select acct from Account ...",
   hints={
       @QueryHint(name="org.hibernate.cacheable",
       value="true")
   }
)

And this is how to mark a criteria query as cached:

List cats = session.createCriteria(Cat.class)
    .setCacheable(true)
    .list();

The next section goes over some pitfalls that you might run into while trying to set up these two caches. These are behaviors that work as designed but can still be surprising.

Pitfall 1 – Query cache worsens performance causing a high volume of queries

There is a harmful side effect of how the two caches work, which occurs when the cached entities returned by a query expire sooner than the cached query results that reference them.

If a query has cached results, it returns a list of entity Ids, which is then resolved against the second level cache. If the entities with those Ids were not configured as cacheable or if they have expired, then a select will hit the database per entity Id.

For example, if a cached query returned 1000 entity Ids, and none of those entities were cached in the second level cache, then 1000 selects by Id will be issued against the database.

The solution to this problem is to configure query results expiration to be aligned with the expiration of the entities returned by the query.
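
The effect of misaligned expiration can be estimated with a little arithmetic. The sketch below is a simplified model, assuming the query results and all of their entities were cached at the same moment and expire purely by time-to-live; ExpirationAlignmentSketch and its method are hypothetical names.

```java
// Back-of-the-envelope model of Pitfall 1; not a measurement of real Hibernate behavior.
final class ExpirationAlignmentSketch {

    // Number of SQL statements needed to serve a cacheable query that returns
    // resultCount entities, ageSeconds after both caches were populated.
    static int selectsForQuery(long ageSeconds, long queryTtlSeconds,
                               long entityTtlSeconds, int resultCount) {
        boolean queryExpired = ageSeconds >= queryTtlSeconds;
        boolean entitiesExpired = ageSeconds >= entityTtlSeconds;
        if (queryExpired) {
            return 1;           // the query runs again: one statement reloads everything
        }
        if (entitiesExpired) {
            return resultCount; // Ids are still cached, entity data is gone: one select per Id
        }
        return 0;               // everything is served from the caches
    }

    public static void main(String[] args) {
        // Entities expire after 30 minutes, query results after 24 hours; checked after 1 hour.
        System.out.println(selectsForQuery(3600, 86400, 1800, 1000));
        // Aligned expiration of 24 hours for both; checked after 1 hour.
        System.out.println(selectsForQuery(3600, 86400, 86400, 1000));
    }
}
```

With the misaligned settings the model yields one select per entity, while the aligned settings yield none until both expire together, at which point a single query refreshes everything.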

Pitfall 2 – Cache limitations when used in conjunction with @Inheritance

It is currently not possible to specify different caching policies for different subclasses of the same parent entity.

For example this will not work:

@Entity
@Inheritance
@Cache(usage=CacheConcurrencyStrategy.READ_ONLY)
public class BaseEntity {
    ...
}

@Entity
@Cache(usage=CacheConcurrencyStrategy.READ_WRITE)
public class SomeReadWriteEntity extends BaseEntity {
    ...
}

@Entity
@Cache(usage=CacheConcurrencyStrategy.TRANSACTIONAL)
public class SomeTransactionalEntity extends BaseEntity {
    ...
}

In this case only the @Cache annotation of the parent class is considered, and all concrete entities use the READ_ONLY concurrency strategy.

Pitfall 3 – Cache settings get ignored when using a singleton based cache

It is advisable to configure the cache region factory as an EhCacheRegionFactory, and to specify an ehcache configuration via net.sf.ehcache.configurationResourceName.

There is an alternative to this region factory which is SingletonEhCacheRegionFactory. With this region factory the cache regions are stored in a singleton using the cache name as a lookup key.

The problem with the singleton region factory is that if another part of the application had already registered a cache with the default name in the singleton, this causes the ehcache configuration file passed via net.sf.ehcache.configurationResourceName to be ignored.
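
The 'first registration wins' behavior behind this pitfall can be illustrated with a tiny model. This is not Ehcache's or Hibernate's code: SingletonRegistrySketch, the name "default" and the file /other-module-cache.xml are all hypothetical, and the registry simply remembers which configuration resource was used first for a given name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of a JVM-wide singleton registry keyed by name; purely illustrative.
final class SingletonRegistrySketch {

    // name -> configuration resource that was used when the entry was first created
    private static final Map<String, String> REGISTRY = new ConcurrentHashMap<>();

    // Returns the configuration actually in effect for the given name.
    // If the name is already registered, the configuration passed in is silently ignored.
    static String getOrCreate(String name, String configResource) {
        return REGISTRY.computeIfAbsent(name, ignored -> configResource);
    }

    public static void main(String[] args) {
        // Another part of the application registers under the shared name first.
        getOrCreate("default", "/other-module-cache.xml");

        // Our configuration arrives second and has no effect.
        String effective = getOrCreate("default", "/your-cache-config.xml");
        System.out.println("effective configuration: " + effective);
    }
}
```

A non-singleton factory avoids this by giving each factory its own registry, so the configuration it was handed is the one that applies.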

Conclusion

The second level and query caches are very useful if set up correctly, but there are some pitfalls to bear in mind in order to avoid unexpected behaviors. All in all, it’s a feature that works transparently and that, if used well, can significantly increase the performance of an application.

Aleksey Novik

Software developer who likes to learn new technologies, hang out on Stack Overflow, and blog tips and tricks on Java/JavaScript polyglot enterprise development.