JPA 2 | Fetch Joins and whether we should use them

Anuj KumarJuly 18th, 2013Last Updated: July 17th, 2013

4 191 6 minutes read

Introduction

Recently I have been working with FETCH JOINS in JPA 2 to fetch the data eagerly from the Database and I learned quite a lot on why we should avoid using Fetch Joins in our day to day operations.

Today’s blog post talks about my experience with Fetch and my learning (based mostly on the review comments I got when I had lots of FETCH JOINS in my query).

Problem Statement

There was a situation in our project where an Entity was being fetched from the Database that had many Collection Valued Associations defined ( OneToMany, ManyToMany , also referred as ToMany association). Here is a conceptual picture of how the entity looked like(getters and setters are omitted for clarity purpose) . This is an extremely simplified example of the Entity. In the real scenario we had around 11 associations.

public class ItemRecord {

    @Id
    private String itemId;
    
    @Column(name="author")
    private String author;

    @OneToMany(cascade = CascadeType.ALL, mappedBy = "item")
    private List costs;

   @OneToMany(cascade = CascadeType.ALL, mappedBy = "item")
    private List notes;

    @OneToMany(cascade = CascadeType.ALL, mappedBy = "item")
    private List stats;

}

There are few things to notice about the above Entity:

It has 3 Collection Valued Associations.
All these associations are being fetched lazily as the default fetch strategy for Collection Valued Association in JPA is Lazy.

In our business implementation, we have a translator that takes the value returned by our DAO layer and convert it into a Business DTO.

The algorithm of our business method was thus as follows :

@TransactionAttribute
public List searchItemRecords (SearchCriteria sc) {
           
             List ir = itemRecordDao.search(sc);
             List convertedData = recordConverter.convert(ir);
             return convertedData;
}

Note that the whole method is running inside a Transaction.

Whenever we fetched the data from the database, none of the related data for cost, statistic etc was fetched eagerly. But our ItemInformation DTO expected all the data. Therefore when getCosts or getStatistics was called for the first time, a query was fired to the database by the persistence Provider (Hibernate in our case) to get the specified data. This was creating an N + 1 Select query problem for us. If you are unfamiliar with N+1 Selects or need some refreshing, you can look at this article on DZone.

Solution

Most of us, including me, would choose the fastest and easiest solution to N+1 select problem, which is to use a Fetch Join. This is also suggested a lot in different blogs/articles spread on the internet.

So I also took the same approach. And it was a BAD approach, in my case atleast.

Lets see first how we can use FETCH JOIN.

The query before using Fetch Join was as follows:

SELECT item FROM ItemRecord item WHERE author=:author;

Note the query is in JPA form.

This query did not fetch the collection values. As a result, in translator, when we did getCosts for every ItemRecord, a query similar to below was fired :

SELECT cost FROM Cost where itemId = :itemId

Thus, if we had 3 ItemRecords, then the total Number of SELECT queries fired to the Database are :

1 for fetching all the ItemRecords
3 for fetching Costs for each ItemRecord
3 for fetching Notes for each ItemRecord
3 for fetching Statistics for each ItemRecord

i.e. (3 Times 3) + 1

which when generalised translates to N times M + 1 ,

where,

N is the number of Primary Entity records found
M is the number of Collection valued associations in the Primary Entity
1 query for fetching all the Primary Entities

In our real scenario, we had 11 associations. Thus for each Primary ItemRecord Entity, we were firing 11 SELECT queries. The number of queries fired multiplied for each ItemRecord found.

Making the collection valued associations EAGER was not the option as there were a lot of other queries being run on the Entity that required only selected data.

This is not an optimal solution. Something had to be done about it and a lot of internet articles suggested using FETCH JOINS as they were easiest to implement and solved the (N Time M + 1) query problem.

So I decided to use FETCH Joins in my query to fetch all the data eagerly for the given scenario.

The query looked similar to :

SELECT item FROM ItemRecord

JOIN FETCH item.costs,
JOIN FETCH item.notes,
JOIN FETCHitem.stats;

Hurdle 1

I realised pretty quickly that this query will not work in my scenario as I can have an ItemRecord that does not have any statistics associated with it and in such a case the above query will not return me that ItemRecord( because of the way ORM works) and therefore I will have incomplete data.

So, I next moved to a LEFT JOIN FETCH that will give return me ItemRecord entities even when some of the associated relationships are empty. The query looked like this :

SELECT item FROM ItemRecord

LEFT JOIN FETCH item.costs,
LEFT JOIN FETCH item.notes,
LEFT JOIN FETCH item.stats;

Hurdle 2

When I ran my Unit Tests to test the above query, I got the exception :

javax.persistence.PersistenceException: org.hibernate.HibernateException: cannot simultaneously fetch multiple bags

What is the problem?

The problem is that I am using List as collection type in my Entity. Using List confuses JPA/Hibernate. This confusion is nicely documented in this article.

In order to work around this problem, I chose to use Set instead of List, mainly because it was the easiest of the three solutions provided by the above blog post and it also made sense (atleast while I was implementing.)

So I changed my Entity to have Set as collection instead of List and the modified entity looked like this:

public class ItemRecord {

    @Id
    private String itemId;
    
    @Column(name="author")
    private String author;

    @OneToMany(cascade = CascadeType.ALL, mappedBy = "item")
    private Set costs;

   @OneToMany(cascade = CascadeType.ALL, mappedBy = "item")
    private Set notes;

    @OneToMany(cascade = CascadeType.ALL, mappedBy = "item")
    private Set stats;

}

I ran my test cases again and my test case that was testing the query succeeded. Yippie. BUT, my another test case that was testing the entries in the notes section failed. Looking at the test case, I realised that I needed the data in a specific order, in the order in which it was entered and I was using HashSet which is not ordered. The solution was simple. Use LinkedHashSet which maintains the order of the elements.

Using LinkedHashSet did the trick and my test case passed.

I was very happy but my happiness was short lived.

Hurdle 3

I had another test case that expected 3 cost objects for a given ItemRecord. As soon as I moved to Set implemenation, the test started failing. Turned out I had incorrect hashCode and equals implementation for my Cost Entity which was returning same hashcode for two different entities and as a result only one entity was ever persisted as Set does not allow duplicate values.

So, the next thing for me to do was have proper HashCode and Equals implementation for all my Entities.

Final Hurdle

Finally, when all my test cases started passing, I made a code review and sent it out to the team.

The first to freak out was my Tech Lead. :)

He was simply furious to look at the FETCH JOINS. Reason? Reason was that LEFT FETCH JOINs return a cartesian product of ALL the data. With the amount of data that we have in production, it would become a nightmare to even support multiple selects on the ItemRecord. The whole problem can be easily understood on this blog post.

Thus I was trying to solve a performance problem and it turned out I was actually creating a bigger performance issue. :)

The whole solution of moving to FETCH JOIN was dropped and it was decided to investigate further why we need the entire data on the UI and why can’t we divide fetching the data into smaller dedicated transaction.