Hibernate mapped collections performance problems

Byron Kiourtzoglou, February 2nd, 2011 (Last Updated: October 21st, 2012)


First things first: this article was inspired by Burt Beckwith’s presentation “Advanced GORM – Performance, Customization and Monitoring” at SpringOne 2GX on Jan 27, 2011. In short, Burt Beckwith discusses potential performance problems when using mapped collections and the Hibernate 2nd-level cache in GORM, along with strategies for avoiding such performance penalties.

Nevertheless, the performance issues regarding mapped collections that Burt Beckwith pinpoints in his presentation apply to every Hibernate-enabled application in general. After watching his presentation I came to realize that what he proposes is exactly what I have been doing myself, and what I have instructed my colleagues to do, when developing with mapped collections in Hibernate.

Below are the 5 things to consider when working with Hibernate mapped collections:

Let’s consider the following classic “Library – Visit” example:

The following Library class has a collection of Visit instances:

package eg;
import java.util.Set;

public class Library {
    private long id;
    private Set<Visit> visits; // the mapped collection of visits

    public long getId() { return id; }
    private void setId(long id) { this.id = id; }

    public Set<Visit> getVisits() { return visits; }
    private void setVisits(Set<Visit> visits) { this.visits = visits; }

    // ....
}

Following is the Visit class:

package eg;

public class Visit {
    private long id;
    private String personName;

    public long getId() { return id; }
    private void setId(long id) { this.id = id; }

    public String getPersonName() { return personName; }
    public void setPersonName(String personName) { this.personName = personName; }

    // ....
}

Assuming that a library has multiple unique visits and that every visit correlates with a distinct library, a unidirectional one-to-many association like the one shown below can be used:

<hibernate-mapping package="eg">

    <class name="Library">
        <id name="id">
            <generator class="sequence"/>
        </id>
        <set name="visits">
            <key column="library_id" not-null="true"/>
            <one-to-many class="Visit"/>
        </set>
    </class>

    <class name="Visit">
        <id name="id">
            <generator class="sequence"/>
        </id>
        <property name="personName"/>
    </class>

</hibernate-mapping>

Here is an example of table definitions for the schema described above:

create table library (id bigint not null primary key);
create table visit (id bigint not null primary key,
                    personName varchar(255),
                    library_id bigint not null);
alter table visit add constraint visitfk0 foreign key (library_id) references library;

So what’s wrong with this picture?

Potential performance bottlenecks arise when you try to add to the mapped collection. As you can see, the collection is implemented as a Set, and Sets guarantee uniqueness among their elements. So how will Hibernate know that a new item is unique before adding it to the Set? Do not be surprised: adding to the Set requires loading all existing items from the database, so that the new item can be checked against every one of them to guarantee uniqueness. Moreover, this is standard behavior that we cannot bypass even if we know, because of business rules, that the new item is unique!
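The cost described above can be illustrated with a toy model in plain Java. This is not Hibernate code: the class LazySetModel and its fields are hypothetical, and the list of strings only stands in for the rows of the “visit” table. It merely shows that a lazily loaded collection with Set semantics has to read every existing row before it can accept a single new element.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a lazily loaded, Set-backed collection. Not Hibernate code. */
class LazySetModel {
    private final List<String> table;  // stands in for the rows of the "visit" table
    private Set<String> loaded;        // null until the collection is initialized
    int rowsRead = 0;                  // rows pulled from the "database" so far

    LazySetModel(List<String> table) {
        this.table = table;
    }

    /** Set.add must know whether the element already exists, so all rows are read first. */
    boolean add(String visit) {
        if (loaded == null) {
            loaded = new HashSet<>();
            for (String row : table) {
                loaded.add(row);
                rowsRead++;
            }
        }
        return loaded.add(visit);
    }

    /** Builds a collection with the given number of existing rows and adds one new visit. */
    static LazySetModel demo(int existingRows) {
        List<String> table = new ArrayList<>();
        for (int i = 0; i < existingRows; i++) {
            table.add("visit-" + i);
        }
        LazySetModel visits = new LazySetModel(table);
        visits.add("visit-new");
        return visits;
    }
}
```

In this model, calling LazySetModel.demo(n) reads all n existing rows just to add one more, which is why the cost grows with the size of the collection rather than with the size of the change.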

Using a List implementation for our mapped collection will not solve the performance bottleneck when adding items to it either. Although Lists do not guarantee uniqueness, they do guarantee item order. So, to maintain the correct item order in our mapped List, Hibernate has to pull the entire collection even if we are adding to the end of the list.

In my opinion, that’s a long way to go just to add one new Visit to the Library, don’t you agree?

Additionally, the above example works well in development, where we only have a small number of visits. In production environments, where each library may have millions of visits, just imagine the performance penalty when you try to add one more!

To overcome the above performance problems we could map the collection as a Bag, which is just a regular collection with no ordering or uniqueness guarantees. Before doing so, however, consider my last point below.

When you add an object to, or remove an object from, a collection, the version number of the collection owner is incremented. Thus there is a high risk of artificial optimistic locking exceptions on the Library object when simultaneous Visit creations occur. We characterize these optimistic locking exceptions as “artificial” because they happen on the collection owner object (Library), which we do not feel we are editing (but we are!) when we add an item to or remove an item from the Visits collection.
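The “artificial” conflict can be sketched with another toy model in plain Java. Again, this is not Hibernate code: VersionedLibraryModel and its methods are hypothetical names, and the integer field only stands in for the version column of the Library row. It assumes the Library is versioned and that every change to its visits collection bumps that version.

```java
/** Toy model of version-based optimistic locking on the collection owner. Not Hibernate code. */
class VersionedLibraryModel {
    private int version = 0;     // stands in for the Library row's version column
    private int visitCount = 0;  // number of visits committed so far

    /** A transaction remembers the version it saw when it loaded the Library. */
    int load() {
        return version;
    }

    /**
     * Commits an "add visit" that was prepared against the given version.
     * Returns false when another transaction committed first (stale version).
     */
    boolean commitAddVisit(int versionSeenAtLoad) {
        if (versionSeenAtLoad != version) {
            return false;        // a real versioned entity would fail with an optimistic locking error
        }
        visitCount++;
        version++;               // changing the collection bumps the owner's version
        return true;
    }

    int getVisitCount() {
        return visitCount;
    }

    int getVersion() {
        return version;
    }
}
```

If two transactions both call load() before either commits, the first commitAddVisit succeeds and the second is rejected, even though the two visits have nothing to do with each other and neither transaction changed any Library field.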

Let me point out that the same rules apply to many-to-many associations.

So what’s the solution?

The solution is simple: remove the mapped collection from the owner (Library) object, and perform insertions and deletions of Visit items “manually”. The proposed solution affects usage in the following ways:

To add a Visit to a Library we must create a new Visit item, associate it with a Library item and persist it in the database explicitly.
To remove a Visit from a Library we must search the “visit” table, find the exact record we need and delete it.
With the proposed solution, no cascading is supported. To delete a Library you need to delete (disassociate) all its Visit records first.

To keep things clean and ordered you can restore a “visits” pseudo-collection to the Library object by implementing a helper method that queries the database and returns all Visit objects associated with the specific Library. Furthermore, you can implement a couple of helper methods on the Visit item that perform the actual insertion and deletion of visit records.
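The operations described above can be sketched as follows. To keep the example self-contained, an in-memory list stands in for the “visit” table; in a real application each method would run the equivalent query through a Hibernate Session instead. The names VisitStoreSketch, VisitRow and all method names are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for the "visit" table. Not Hibernate code. */
class VisitStoreSketch {
    /** One row of the "visit" table. */
    static final class VisitRow {
        final long id;
        final String personName;
        final long libraryId;    // plays the role of the library_id column

        VisitRow(long id, String personName, long libraryId) {
            this.id = id;
            this.personName = personName;
            this.libraryId = libraryId;
        }
    }

    private final List<VisitRow> rows = new ArrayList<>();
    private long nextId = 1;     // stands in for the database sequence

    /** Add a visit: create it, associate it with a library and persist it explicitly. */
    VisitRow addVisit(long libraryId, String personName) {
        VisitRow row = new VisitRow(nextId++, personName, libraryId);
        rows.add(row);
        return row;
    }

    /** Remove a visit: find the exact record by its id and delete it. */
    boolean removeVisit(long visitId) {
        return rows.removeIf(r -> r.id == visitId);
    }

    /** Pseudo-collection: the equivalent of "select * from visit where library_id = ?". */
    List<VisitRow> getVisits(long libraryId) {
        List<VisitRow> result = new ArrayList<>();
        for (VisitRow r : rows) {
            if (r.libraryId == libraryId) {
                result.add(r);
            }
        }
        return result;
    }

    /** No cascading: a library's visits must be deleted before the library itself. */
    int deleteVisitsOf(long libraryId) {
        int before = rows.size();
        rows.removeIf(r -> r.libraryId == libraryId);
        return before - rows.size();
    }
}
```

Note that addVisit never touches the visits that already exist, and that nothing on the library side changes when a visit is added or removed.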

Below we provide updated versions of the Library class, the Visit class and the Hibernate mapping that comply with our proposed solution:

First, the updated Library class:

package eg;
import java.util.Set;

public class Library {
    private long id;

    public long getId() { return id; }
    private void setId(long id) { this.id = id; }

    public Set<Visit> getVisits() {
        // TODO : return select * from visit where visit.library_id=this.id
        throw new UnsupportedOperationException("query not implemented yet");
    }

    // ....
}

As you can see, we have removed the mapped collection and introduced the method “getVisits()”, which should return all the Visit items for the specific Library instance (the TODO comment is in pseudo-code).

Following is the updated Visit class:

package eg;

public class Visit {
    private long id;
    private String personName;
    private long library_id; // id of the Library this visit belongs to

    public long getId() { return id; }
    private void setId(long id) { this.id = id; }

    public String getPersonName() { return personName; }
    public void setPersonName(String personName) { this.personName = personName; }

    public long getLibrary_id() { return library_id; }
    public void setLibrary_id(long library_id) { this.library_id = library_id; }

    // ....
}

As you can see, we have added the “library_id” field to the Visit object so that it can be correlated with a Library item.

Last is the updated Hibernate mapping:

<hibernate-mapping package="eg">

    <class name="Library">
        <id name="id">
            <generator class="sequence"/>
        </id>
    </class>

    <class name="Visit">
        <id name="id">
            <generator class="sequence"/>
        </id>
        <property name="personName"/>
        <property name="library_id"/>
    </class>

</hibernate-mapping>

So, never use mapped collections in Hibernate?

Well, to be honest, no. You need to examine each case to decide what to do. The standard approach is fine if the collections are reasonably small (both sides, in the case of a many-to-many association). Additionally, the collections will contain proxies, so they are smaller than real instances until initialized.

Happy Coding! Don’t forget to share!

Justin

P.S.

After a relatively long debate about this article on TheServerSide, one of our readers, Eb Bras, provided a useful list of Hibernate “tips and tricks”. Let’s see what he has to say:

Here are a few of my Hibernate tips and tricks that I documented along the way:

inverse="true"
Use this as much as possible in a one-to-many parent-child association (to another entity or to a value type that is used as an entity).
This property is set on the collection tag, such as “set”, and means that the many-to-one side owns the association and is responsible for all db inserts/updates/deletes. It makes the association part of the child.
It saves a db update for the foreign key, as the key is written directly when the child is inserted.

Especially when using “set” as the mapping type, it can improve performance, as the child doesn’t need to be added to the parent collection, which can save loading the complete collection. That is: due to the nature of a set mapping, the whole collection must always be loaded when adding a new child, as that’s the only way Hibernate can guarantee that the new entry isn’t a duplicate, which is a feature of the JRE Set interface.
In the case of a component collection (= a collection containing only pure value types), inverse=true is ignored and makes no sense, as Hibernate has full control of the objects and will choose the best way to perform its CRUD actions.
In the case of detached DTO objects (not containing any Hibernate objects), Hibernate will delete all value-type children and then insert them, as it doesn’t know which objects are new and which already exist because the collection was completely detached. Hibernate treats it as a new collection.

lazy Set.getChilds() is evil
Be careful when using a getChilds() method that returns a Set and lazily loads all children.
Don’t use this when you want to add or remove just one child, as it will first load the whole collection.

always implement equals/hashcode
Make sure to always implement equals/hashCode for every object that is managed by Hibernate, even if it doesn’t seem important. This also applies to value type objects.
If the object doesn’t contain properties that are candidates for equals/hashCode, use a surrogate key that consists of a UUID, for example.
Hibernate uses equals/hashCode to find out whether an object is already present in the db. If it concerns an existing object but Hibernate thinks it’s a new object because equals/hashCode isn’t implemented correctly, Hibernate will perform an insert and possibly a delete of the old value.
This is especially important for value types in Sets and must be tested, as it saves db traffic.
The idea: you are giving Hibernate more knowledge, which it can use to optimize its actions.
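A minimal sketch of the UUID surrogate key approach mentioned in this tip might look like the following. The class VisitNote and its fields are hypothetical, and the sketch is plain Java: it leaves out everything Hibernate itself would need, such as the mapping and a default constructor.

```java
import java.util.Objects;
import java.util.UUID;

/** Sketch of a value type whose equality is based on a UUID surrogate key (hypothetical class). */
class VisitNote {
    private final String uuid;   // assigned once at construction and never changed
    private String text;

    /** Creates a brand new note with a freshly generated key. */
    VisitNote(String text) {
        this(UUID.randomUUID().toString(), text);
    }

    /** Rebuilds a note whose key is already known, e.g. from a detached copy. */
    VisitNote(String uuid, String text) {
        this.uuid = uuid;
        this.text = text;
    }

    String getUuid() {
        return uuid;
    }

    String getText() {
        return text;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof VisitNote)) {
            return false;
        }
        return uuid.equals(((VisitNote) other).uuid);
    }

    @Override
    public int hashCode() {
        return Objects.hash(uuid);
    }
}
```

Because equality depends only on the key, an edited copy of an existing note is still recognized as the same element of a Set, while two separately created notes with identical text remain distinct.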

use of version
Always use the version property with an entity or a value type that is used as an entity.
This results in less db traffic, as Hibernate uses this information to determine whether it is dealing with a new or an existing object. If this property isn’t present, it has to hit the db to find out.

eager fetching
Non-lazy collections (children) are by default loaded through an extra select query that is performed just after the parent is loaded from the db.
The children can be loaded in the same query as the parent by enabling eager fetching, which is done by setting the attribute fetch="join" on the collection mapping tag. If enabled, the children are loaded through a left outer join.
Test whether this improves performance. If many joins occur, or if a table with many columns is involved, performance will get worse instead of better.

use surrogate key in value type child object
Hibernate will construct the primary key in a value-type child of a parent-child relation from all not-null columns. This can lead to strange primary key combinations, especially when a date column is involved. A date column shouldn’t be part of a primary key, as its millisecond part will result in primary keys that are almost never the same. This leads to strange and probably poorly performing db indexes.
To improve this we use a surrogate key in all child value-type objects as the only not-null property. Hibernate will then construct a primary key that consists of the foreign key and the surrogate key, which is logical and performs well.
Note that the surrogate key is only used for database optimization and is not required in equals/hashCode, which should be based on business logic if possible.
