
EAGER fetching is a code smell

Introduction

Hibernate fetching strategies can make the difference between an application that barely crawls and a highly responsive one. In this post I’ll explain why you should prefer query-based fetching over global fetch plans.

Fetching 101

Hibernate defines four association retrieving strategies:
 
 

Fetching Strategy | Description
Join | The association is OUTER JOINED in the original SELECT statement
Select | An additional SELECT statement is used to retrieve the associated entity (or entities)
Subselect | An additional SELECT statement is used to retrieve the whole associated collection. This mode is meant for to-many associations
Batch | A number of additional SELECT statements are used to retrieve the whole associated collection. Each additional SELECT retrieves a fixed number of associated entities. This mode is meant for to-many associations

 
These fetching strategies might be applied in the following scenarios:

  • the association is always initialized along with its owner (e.g. EAGER FetchType)
  • the uninitialized association (e.g. LAZY FetchType) is navigated, therefore the association must be retrieved with a secondary SELECT

The fetching information in the Hibernate mappings forms the global fetch plan. At query time we may override the global fetch plan, but only for LAZY associations, using the fetch directive available in HQL, JPQL and Criteria queries. EAGER associations cannot be overridden, which ties your application to the global fetch plan.
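
As a rough illustration of a query-time fetch plan, the sketch below composes a JPQL statement that adds a join fetch clause only for the associations a given use case needs. It assumes a Product entity whose images and importer associations are mapped as LAZY. The FetchPlanQuery class and its forProduct method are hypothetical helpers, not part of Hibernate or JPA; only the "left join fetch" JPQL syntax is standard.

```java
import java.util.List;

// Hypothetical helper (not a Hibernate/JPA API): builds a JPQL query
// whose fetch plan is chosen by the caller, per use case.
final class FetchPlanQuery {

    private FetchPlanQuery() {
    }

    // associationsToFetch: names of LAZY associations on Product
    // (assumed here: "images", "importer").
    static String forProduct(List<String> associationsToFetch) {
        StringBuilder jpql = new StringBuilder("select p from Product p");
        for (String association : associationsToFetch) {
            // Each requested association is initialized in the same SELECT.
            jpql.append(" left join fetch p.").append(association);
        }
        jpql.append(" where p.id = :productId");
        return jpql.toString();
    }

    public static void main(String[] args) {
        // Use case that needs only the product itself.
        System.out.println(forProduct(List.of()));
        // Use case that also renders the product images.
        System.out.println(forProduct(List.of("images")));
    }
}
```

A use case that needs only the product passes an empty list, while one that displays the images passes List.of("images"). Fetching more than one collection this way multiplies the rows returned, so the generated SQL is still worth checking.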

Hibernate 3 acknowledged that LAZY should be the default association fetching strategy:

“By default, Hibernate3 uses lazy select fetching for collections and lazy proxy fetching for single-valued associations. These defaults make sense for most associations in the majority of applications.”

This decision was taken after noticing many performance issues associated with the Hibernate 2 default eager fetching. Unfortunately, JPA has taken a different approach: to-many associations are LAZY by default, while to-one associations are fetched eagerly.

Association type | Default fetching policy
@OneToMany | LAZY
@ManyToMany | LAZY
@ManyToOne | EAGER
@OneToOne | EAGER

 

EAGER fetching inconsistencies

While it may be convenient to just mark associations as EAGER, delegating the fetching responsibility to Hibernate, it’s advisable to use query-based fetch plans instead.

An EAGER association will always be fetched and the fetching strategy is not consistent across all querying techniques.

Next, I’m going to demonstrate how EAGER fetching behaves for all Hibernate querying variants. I will reuse the same entity model I’ve previously introduced in my fetching strategies article:

[Figure: Product entity model diagram]

The Product entity has the following associations:

@ManyToOne(fetch = FetchType.EAGER)
@JoinColumn(name = "company_id", nullable = false)
private Company company;

@OneToOne(fetch = FetchType.LAZY, cascade = CascadeType.ALL, mappedBy = "product", optional = false)
private WarehouseProductInfo warehouseProductInfo;

@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "importer_id")
private Importer importer;

@OneToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL, mappedBy = "product", orphanRemoval = true)
@OrderBy("index")
private Set<Image> images = new LinkedHashSet<Image>();

The company association is marked as EAGER and Hibernate will always employ a fetching strategy to initialize it along with its owner entity.

Persistence Context loading

First we’ll load the entity using the Persistence Context API:

Product product = entityManager.find(Product.class, productId);

Which generates the following SQL SELECT statement:

Query:{[
select 
    product0_.id as id1_18_1_, 
    product0_.code as code2_18_1_, 
    product0_.company_id as company_6_18_1_, 
    product0_.importer_id as importer7_18_1_, 
    product0_.name as name3_18_1_, 
    product0_.quantity as quantity4_18_1_, 
    product0_.version as version5_18_1_, 
    company1_.id as id1_6_0_, 
    company1_.name as name2_6_0_ 
from Product product0_ 
inner join Company company1_ on product0_.company_id=company1_.id 
where product0_.id=?][1]}

The EAGER company association was retrieved using an inner join. For M such associations the owner entity table is going to be joined M times.

Each extra join adds to the overall query complexity and execution time. If some business scenarios never use these associations, we pay the extra performance penalty and get nothing in return.

Fetching using JPQL and Criteria

Product product = entityManager.createQuery(
	"select p " +
			"from Product p " +
			"where p.id = :productId", Product.class)
	.setParameter("productId", productId)
	.getSingleResult();

or with the JPA Criteria API:

CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Product> cq = cb.createQuery(Product.class);
Root<Product> productRoot = cq.from(Product.class);
cq.where(cb.equal(productRoot.get("id"), productId));
Product product = entityManager.createQuery(cq).getSingleResult();

Both generate the following SQL SELECT statements:

Query:{[
select 
    product0_.id as id1_18_, 
    product0_.code as code2_18_, 
    product0_.company_id as company_6_18_, 
    product0_.importer_id as importer7_18_, 
    product0_.name as name3_18_, 
    product0_.quantity as quantity4_18_, 
    product0_.version as version5_18_ 
from Product product0_ 
where product0_.id=?][1]} 

Query:{[
select 
    company0_.id as id1_6_0_, 
    company0_.name as name2_6_0_ 
from Company company0_ 
where company0_.id=?][1]}

Both JPQL and JPA Criteria queries default to select fetching, therefore issuing a secondary select for each individual EAGER association. The more EAGER associations there are, the more additional SELECT statements are issued, and the more our application performance suffers.
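
To make the cost concrete, here is a back-of-the-envelope model (not Hibernate code) of how many SELECT statements are issued when EAGER to-one associations are loaded by select fetching. It assumes the worst case: every returned row references a different associated entity that is not yet loaded, and no second-level cache is in use. The SelectFetchingCost class and its method name are made up for this sketch.

```java
// Hypothetical cost model for select fetching of EAGER to-one associations.
final class SelectFetchingCost {

    private SelectFetchingCost() {
    }

    // rows: number of root entities returned by the query
    // eagerAssociations: number of EAGER to-one associations per root entity
    // Worst case assumption: every association needs its own secondary SELECT.
    static long statements(long rows, int eagerAssociations) {
        if (rows < 0 || eagerAssociations < 0) {
            throw new IllegalArgumentException("arguments must not be negative");
        }
        // One statement for the root query, plus one per row and association.
        return 1 + rows * eagerAssociations;
    }

    public static void main(String[] args) {
        // The example above: one Product with one EAGER association.
        System.out.println(statements(1, 1));   // 2
        // A list of 100 entities with 5 EAGER associations each.
        System.out.println(statements(100, 5)); // 501
    }
}
```

The first call matches the two statements shown in the log above; the second shows how quickly the number grows once the query returns a list instead of a single entity.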

Hibernate Criteria API

While JPA 2.0 added support for Criteria queries, Hibernate has long been offering a specific dynamic query implementation.

While the EntityManager implementation delegates method calls to the legacy Session API, the JPA Criteria implementation was written from scratch. That’s the reason why the Hibernate and JPA Criteria APIs behave differently for similar querying scenarios.

The Hibernate Criteria equivalent of the previous example looks like this:

Product product = (Product) session.createCriteria(Product.class)
	.add(Restrictions.eq("id", productId))
	.uniqueResult();

And the associated SQL SELECT is:

Query:{[
select 
    this_.id as id1_3_1_, 
    this_.code as code2_3_1_, 
    this_.company_id as company_6_3_1_, 
    this_.importer_id as importer7_3_1_, 
    this_.name as name3_3_1_, 
    this_.quantity as quantity4_3_1_, 
    this_.version as version5_3_1_, 
    hibernatea2_.id as id1_0_0_, 
    hibernatea2_.name as name2_0_0_ 
from Product this_ 
inner join Company hibernatea2_ on this_.company_id=hibernatea2_.id 
where this_.id=?][1]}

This query uses the join fetch strategy, as opposed to the select fetching employed by JPQL/HQL and the JPA Criteria API.
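
The inconsistency observed so far can be summarised as a small lookup. This only restates the SQL logs shown in this article and is not a rule taken from the Hibernate documentation; other Hibernate versions may behave differently. The EagerFetchBehavior class and its enum names are invented for the illustration.

```java
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;

// Hypothetical summary of how the EAGER @ManyToOne association was fetched
// by each querying technique in the logs of this article.
final class EagerFetchBehavior {

    enum Technique { ENTITY_MANAGER_FIND, JPQL, JPA_CRITERIA, HIBERNATE_CRITERIA }

    enum Strategy { JOIN, SELECT }

    private EagerFetchBehavior() {
    }

    static Map<Technique, Strategy> observed() {
        Map<Technique, Strategy> observed = new EnumMap<>(Technique.class);
        observed.put(Technique.ENTITY_MANAGER_FIND, Strategy.JOIN);
        observed.put(Technique.JPQL, Strategy.SELECT);
        observed.put(Technique.JPA_CRITERIA, Strategy.SELECT);
        observed.put(Technique.HIBERNATE_CRITERIA, Strategy.JOIN);
        return observed;
    }

    // True only if every technique uses the same strategy.
    static boolean isConsistent(Map<Technique, Strategy> observed) {
        return new HashSet<>(observed.values()).size() <= 1;
    }

    public static void main(String[] args) {
        Map<Technique, Strategy> observed = observed();
        observed.forEach((technique, strategy) ->
                System.out.println(technique + " -> " + strategy));
        System.out.println("consistent: " + isConsistent(observed)); // consistent: false
    }
}
```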

Hibernate Criteria and to-many EAGER collections

Let’s see what happens when the image collection fetching strategy is set to EAGER:

@OneToMany(fetch = FetchType.EAGER, cascade = CascadeType.ALL, mappedBy = "product", orphanRemoval = true)
@OrderBy("index")
private Set<Image> images = new LinkedHashSet<Image>();

The following SQL is going to be generated:

Query:{[
select 
    this_.id as id1_3_2_, 
    this_.code as code2_3_2_, 
    this_.company_id as company_6_3_2_, 
    this_.importer_id as importer7_3_2_, 
    this_.name as name3_3_2_, 
    this_.quantity as quantity4_3_2_, 
    this_.version as version5_3_2_, 
    hibernatea2_.id as id1_0_0_, 
    hibernatea2_.name as name2_0_0_, 
    images3_.product_id as product_4_3_4_, 
    images3_.id as id1_1_4_, 
    images3_.id as id1_1_1_, 
    images3_.index as index2_1_1_, 
    images3_.name as name3_1_1_, 
    images3_.product_id as product_4_1_1_ 
from Product this_ 
inner join Company hibernatea2_ on this_.company_id=hibernatea2_.id 
left outer join Image images3_ on this_.id=images3_.product_id 
where this_.id=? 
order by images3_.index][1]}

Hibernate Criteria doesn’t automatically group the list of parent entities. Because of the JOIN with the one-to-many children table, we get a new parent entity object reference for each child entity (all pointing to the same object in our current Persistence Context):

product.setName("TV");
product.setCompany(company);

Image frontImage = new Image();
frontImage.setName("front image");
frontImage.setIndex(0);

Image sideImage = new Image();
sideImage.setName("side image");
sideImage.setIndex(1);

product.addImage(frontImage);
product.addImage(sideImage);

List products = session.createCriteria(Product.class)
	.add(Restrictions.eq("id", productId))
	.list();
assertEquals(2, products.size());
assertSame(products.get(0), products.get(1));

Because we have two image entities, we will get two Product entity references, both pointing to the same first level cache entry.

To fix it we need to instruct Hibernate Criteria to use distinct root entities:

List products = session.createCriteria(Product.class)
	.add(Restrictions.eq("id", productId))
	.setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY)
	.list();
assertEquals(1, products.size());
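
Why the join yields repeated parent references, and what a distinct-root step conceptually does, can be sketched without Hibernate. The model below is an illustration only: Row, rootsPerRow and distinctRoots are hypothetical names, and the identity-based de-duplication is an assumption about the idea, not a copy of Hibernate’s transformer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical model of a parent/child JOIN result and a distinct-root step.
final class DistinctRootSketch {

    // One row of the Product-to-Image join: the parent repeats for every child.
    static final class Row {
        final Object product;
        final String imageName;

        Row(Object product, String imageName) {
            this.product = product;
            this.imageName = imageName;
        }
    }

    private DistinctRootSketch() {
    }

    // Without de-duplication, one root reference is returned per joined row.
    static List<Object> rootsPerRow(List<Row> rows) {
        List<Object> roots = new ArrayList<>();
        for (Row row : rows) {
            roots.add(row.product);
        }
        return roots;
    }

    // Keeps the first occurrence of each root, comparing by reference identity.
    static List<Object> distinctRoots(List<Object> roots) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<Object> distinct = new ArrayList<>();
        for (Object root : roots) {
            if (seen.add(root)) {
                distinct.add(root);
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        Object product = new Object();
        List<Row> rows = List.of(
                new Row(product, "front image"),
                new Row(product, "side image"));
        List<Object> roots = rootsPerRow(rows);
        System.out.println(roots.size());                // 2
        System.out.println(distinctRoots(roots).size()); // 1
    }
}
```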

Conclusion

The EAGER fetching strategy is a code smell. Most often it’s used for simplicity’s sake, without considering the long-term performance penalties. The fetching strategy should never be the responsibility of the entity mapping. Each business use case has different entity load requirements, and therefore the fetching strategy should be delegated to each individual query.

The global fetch plan should only define LAZY associations, which are then fetched on a per-query basis. Combined with the habit of always checking the generated queries, query-based fetch plans can improve application performance and reduce maintenance costs.

Reference: EAGER fetching is a code smell from our JCG partner Vlad Mihalcea at Vlad Mihalcea’s Blog.

Vlad Mihalcea

Vlad Mihalcea is a software architect passionate about software integration, high scalability and concurrency challenges.
24 Comments
acichon89
9 years ago

What about @OneToOne ?

Yannick Majoros
9 years ago

You’re jumping to the wrong conclusion. > If we don’t even use all these associations, for every possible business scenario, then we’ve just paid the extra performance penalty for nothing in return. What if you do use them all the time? You give a single example and make it an universal law. Also, focusing on hibernate instead of JPA is the best way to make non-portable applications, by definition. You should know how your JPA implementation executes queries, but you shouldn’t take that for granted. What if some relation is a performance problem to begin with? Many 1-N queries shouldn’t…

Vlad Mihalcea
9 years ago

How many applications did you port between Hibernate and EclipseLink? In reality, this argument is useless because this scenario is improbable. You need specific features that are way beyond the JPA standard.

Using query-time fetch plans is appropriate. The other arguments aren’t even addressing this article so I’ll just leave them be.

Yannick Majoros
9 years ago

I did a couple of times.

I have a couple of applications that will need to be ported from Eclipselink to Hibernate (mainly between app servers). I’m glad they weren’t Eclipselink-specific.

This is just the main goal of JPA.

The other main argument is that it’s quite strange to take a single example to jump to the (wrong) conclusion that eager relations are a code smell. It’s quite related to your article.

Sorry for not being positive, but there is no purpose in a comment section if errors in your article can’t even be discussed.

Yannick Majoros
9 years ago

I did a couple of times.

I have a couple of applications that will need to be ported from Eclipselink to Hibernate (mainly between app servers). I’m glad they weren’t Eclipselink-specific.

This is just the main goal of JPA.

The other main argument is that it’s quite strange to take a single example to jump to the (wrong) conclusion that eager relations are a code smell. It’s quite related to your article.

Sorry for not being positive, but there is no purpose in a comment section if we can’t disagree.

Vlad Mihalcea
9 years ago

I think it’s worth writing about this transition, because now you have knowledge of both EclipseLink and Hibernate. Hibernate is much more common, taking two thirds of the whole JPA market share: http://stackoverflow.com/questions/26672322/evaluating-jpa-providers-market-share-of-hibernate-eclipselink I can’t tell you how EAGER is implemented in EclipseLink, but I am sure it’s either implemented with a secondary select or with a JOIN. Even if you can override the fetch strategy for EAGER associations, it’s still better to define the fetch plans on a query-basis. Do you agree on this one? Also, please provide me a real-life example where EAGER fetching is much more suitable…

Yannick Majoros
9 years ago

I’d rather write about coding to the standard (JPA) instead of any implementation. But it’s been done a lot, so I’m not sure if it’s worth it. Eager will default to a second query in EclipseLink, and you can override it with (proprietary) annotations or custom mapping. At query level, you can specify it with standard JPA for both JPQL and JPA CriteriaQuery. In a lot of cases, it makes sense to *not* fetch the relation in the same query: your second-level cache could make dramatic improvements there and avoid most second queries anyway. If you know you’ll (most) always…

Yannick Majoros
9 years ago

BTW, java specs are quite open, and the same is true for JPA. If you feel that the default fetch type is wrong, you could discuss it on the specification users mailing list.
https://java.net/projects/jpa-spec/lists/users/archive

Vlad Mihalcea
9 years ago

Caring about JPA instead of any implementation is like assuming JEE is enough when managing/deploying on an actual JEE application server. You need to know both the standard and the implementation details anyway.

Your example with the second level cache is exactly what LAZY brings you, not EAGER. If you want to join you simply use the join directive. If you want to fetch from 2nd level cache, you mark the to-one association as LAZY and let the cache fetch it for you.

The more you try to convince me that EAGER is fine, the more you prove its weaknesses.

Yannick Majoros
9 years ago

Yes, that’s exactly what Java EE is: a spec which hides implementations details. And that’s what I like in it. It’s not just another framework, it has many implementations and you (really) can switch. It does have a steep learning curve, but it’s worth it. I’ve been coding Java EE apps for years, and I mostly don’t have to care for the server implementation. Be it EJB, CDI, JAXB, JAX-RS, JPA or whatever, it’s basically the same stuff between implementations. I’m a freelancer, and it’s quite easy for me to start at new customer and get an insight of their…

Vlad Mihalcea
9 years ago

JEE has come a long way but it still doesn’t convince me to trade the Spring suite for going the “standard” way. Standards are a means not a goal. J2EE was a standard too. JDO was also a standard. You say you can only rely on JPA in all your projects, but let me know how do you achieve stuff like these ones: – a sequence generator with hi/lo optimizer that can inter operate with other database clients that don’t know about such an optimization (http://vladmihalcea.com/2014/07/21/hibernate-hidden-gem-the-pooled-lo-optimizer/) – version-less optimistic locking for legacy schemas that don’t support adding version columns (vladmihalcea.com/2014/12/08/the-downside-of-version-less-optimistic-locking/)…

Yannick Majoros
9 years ago

Well, let’s not compare penis sizes. I still disagree. You like Spring, I like standards, I guess that our point is clear. I still think that it’s better to make portable, JPA, applications. There is no reason for the basic, common stuff to be Hibernate-specific. Let’s also avoid corner cases or the legacy unmaintanable schemas, and focus on normal, forward-engineering jpa use. We do agree on some things: if you know you don’t need eager relation fetch type, just don’t use it. Could you back up your point about the cause for eager as default because some jpa implementations didn’t…

Vlad Mihalcea
9 years ago

I’ll quote the Java Persistence Wiki book (http://en.wikibooks.org/wiki/Java_Persistence/Relationships): “In JPA lazy fetching can be set on any relationship using the fetch attribute. The fetch can be set to either LAZY or EAGER as defined in the FetchType enum. The default fetch type is LAZY for all relationships except for OneToOne and ManyToOne, but in general it is a good idea to make every relationship LAZY. The EAGER default for OneToOne and ManyToOne is for implementation reasons (more difficult to implement), not because it is a good idea. Technically in JPA LAZY is just a hint, and a JPA provider is…

Yannick Majoros
9 years ago

Your link is no authoritative reference. I don’t know where they got it from, and even think they made up some parts. Hell, anyone can edit that thing and there is no review process. That’s a good think for quick info, but I wouldn’t consider that as a reliable source. Regarding those questions: > 1. How is the default EAGER fetch plan more suitable than a query-time fetch plan based? I didn’t say that. I did say that eager makes some sense, and that you shouldn’t use it if you don’t think it does in your situation. > 2. Is…

Vlad Mihalcea
9 years ago

1. Well, you can say it 1000 times that sometimes it makes sense to use EAGER. Unless you provide the exact use case and context where it actually makes sense, it’s just politician talking.

2. As for moving this discussion to the JPA mailing list, feel free to copy-paste it and continue it there, if you still have doubts about my conclusion. I am convinced that EAGER is a performance drain-hole and I’ve successfully been applying the LAZY+query-fetch-plan rule on multiple projects.

Yannick Majoros
9 years ago

I did provide enough examples, please re-read my comments. The product-company example illustrates a case where performance cost of a eager relation tends to 0 (caching), with obvious benefits (entity is fully loaded and company can be accessed portably outside of persistence context).

Just stating 1000 times that eager = evil won’t make it so. This is just wrong and is easy enough to google away.

http://lmgtfy.com/?q=jpa+eager+default

Vlad Mihalcea
9 years ago

” A Product entity with a ManyToOne eager relation to a Company and a L2 cache will result in a performance class for your code.” So that’s your best argument for using EAGER? Changing this association to LAZY will give you the same behavior and you might not even require the association every time, translating to better performance (saving the extra cache retrieval calls when you don’t even need those). But for middle-to-large application you probably have 50 entities with 5 ManyToOne associations, all of which are EAGER. You have to cache them all to avoid the extra SELECT, which…

Yannick Majoros
9 years ago

No, you can’t achieve the same with lazy relationships. There is another point missing in your article. With a lazy fetch type (which is just a hint, btw), the spec clearly states that the persistence provider isn’t required to make the relation available when you leave the persistence context. LazyInitializationException in Hibernate (I guess there are workarounds, but they aren’t more than that), even null relations in OpenJPA. EclipseLink will do just fine in a Java EE context, but those are implementation details, not guaranteed by the spec. I never said that eager is superior to lazy in any way,…

Yannick Majoros
9 years ago

This is basically the most prominent error in this article:

> The larger the associations number, the more additional individual SELECTS, the more it will affect our application performance.

If eager relationships are well-thought and thus there for a reason, you would either get the selects anyway, or hit the cache most of the time. Cache retrieval times vs sql will be in the order or 1:10000. There are a lot of situations where the additional complexity created by adhoc queries when you need the relation outside of any persistence context, will just not be worth it.
