Hibernate hidden gem: the pooled-lo optimizer

Vlad MihalceaJuly 28th, 2014Last Updated: July 27th, 2014

0 994 3 minutes read

Introduction

In this post we’ll uncover a sequence identifier generator combining identifier assignment efficiency and interoperability with other external systems (concurrently accessing the underlying database system).

Traditionally there have been two sequence identifier strategies to choose from.

The sequence identifier, always hitting the database for every new value assignment. Even with database sequence preallocation we have a significant database round-trip cost.
The seqhilo identifier, using the hi/lo algorithm. This generator calculates some identifier values in-memory, therefore reducing the database round-trip calls. The problem with this optimization technique is that the current database sequence value no longer reflects the current highest in-memory generated value. The database sequence is used as a bucket number, making it difficult for other systems to interoperate with the database table in question. Other applications must know the inner-workings of the hi/lo identifier strategy to properly generate non-clashing identifiers.

The enhanced identifiers

Hibernate offers a new class of identifier generators, addressing many shortcomings of the original ones. The enhanced identifier generators don’t come with a fixed identifier allocation strategy. The optimization strategy is configurable and we can even supply our own optimization implementation. By default Hibernate comes with the following built-in optimizers:

none: every identifier is fetched from the database, so it’s equivalent to the original sequence generator.
hi/lo: it uses the hi/lo algorithm and it’s equivalent to the original seqhilo generator.
pooled: This optimizer uses a hi/lo optimization strategy, but the current in-memory identifiers highest boundary is extracted from an actual database sequence value.
pooled-lo: It’s similar to the pooled optimizer but the database sequence value is used as the current in-memory lowest boundary

In the official release announcement, the pooled optimizers are advertised as being interoperable with other external systems:

Even if other applications are also inserting values, we’ll be perfectly safe because the SEQUENCE itself will handle applying this increment_size.

This is actually what we are looking for; an identifier generator that’s both efficient and doesn’t clash when other external systems are concurrently inserting rows in the same database tables.

Testing time

The following test is going to check how the new optimizers get along with other external database table inserts. In our case the external system will be some native JDBC insert statements on the same database table/sequence.

doInTransaction(new TransactionCallable<Void>() {
	@Override
	public Void execute(Session session) {
		for (int i = 0; i < 8; i++) {
			session.persist(newEntityInstance());
		}
		session.flush();
		assertEquals(8, ((Number) session.createSQLQuery("SELECT COUNT(*) FROM sequenceIdentifier").uniqueResult()).intValue());
		insertNewRow(session);
		insertNewRow(session);
		insertNewRow(session);
		assertEquals(11, ((Number) session.createSQLQuery("SELECT COUNT(*) FROM sequenceIdentifier").uniqueResult()).intValue());
		List<Number> ids = session.createSQLQuery("SELECT id FROM sequenceIdentifier").list();
		for (Number id : ids) {
			LOGGER.debug("Found id: {}", id);
		}
		for (int i = 0; i < 3; i++) {
			session.persist(newEntityInstance());
		}
		session.flush();
		return null;
	}
});

The pooled optimizer

We’ll first use the pooled optimizer strategy:

@Entity(name = "sequenceIdentifier")
public static class PooledSequenceIdentifier {

	@Id
	@GenericGenerator(name = "sequenceGenerator", strategy = "enhanced-sequence",
			parameters = {
					@org.hibernate.annotations.Parameter(name = "optimizer", value = "pooled"),
					@org.hibernate.annotations.Parameter(name = "initial_value", value = "1"),
					@org.hibernate.annotations.Parameter(name = "increment_size", value = "5")
			}
	)
	@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "sequenceGenerator")
	private Long id;
}

Running the test ends-up throwing the following exception:

DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:0 Num:1 Query:{[insert into sequenceIdentifier (id) values (?)][9]} 
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:0 Num:1 Query:{[insert into sequenceIdentifier (id) values (?)][10]} 
DEBUG [main]: n.t.d.l.SLF4JQueryLoggingListener - Name: Time:0 Num:1 Query:{[insert into sequenceIdentifier (id) values (?)][26]} 
WARN  [main]: o.h.e.j.s.SqlExceptionHelper - SQL Error: -104, SQLState: 23505
ERROR [main]: o.h.e.j.s.SqlExceptionHelper - integrity constraint violation: unique constraint or index violation; SYS_PK_10104 table: SEQUENCEIDENTIFIER
ERROR [main]: c.v.h.m.l.i.PooledSequenceIdentifierTest - Pooled optimizer threw
org.hibernate.exception.ConstraintViolationException: could not execute statement
	at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:72) ~[hibernate-core-4.3.5.Final.jar:4.3.5.Final]	
Caused by: java.sql.SQLIntegrityConstraintViolationException: integrity constraint violation: unique constraint or index violation; SYS_PK_10104 table: SEQUENCEIDENTIFIER
	at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source) ~[hsqldb-2.3.2.jar:2.3.2]

I am not sure if this is a bug or just a design limitation, but the pooled optimizer doesn’t meet the interoperability requirement.

To visualize what happens I summarized the sequence calls in the following diagram:

When the pooled optimizer retrieves the current sequence value, it uses it to calculate the lowest in-memory boundary. The lowest value is the actual previous sequence value and this value might have been already used by some other external INSERT statement.

The pooled-lo optimizer

Fortunately, there is one more optimizer(not mentioned in the reference documentation) to be tested. The pooled-lo optimizer uses the current database sequence value as the lowest in-memory boundary, so other systems may freely use the next sequence values without risking identifier clashing:

@Entity(name = "sequenceIdentifier")
public static class PooledLoSequenceIdentifier {

	@Id
	@GenericGenerator(name = "sequenceGenerator", strategy = "enhanced-sequence",
			parameters = {
					@org.hibernate.annotations.Parameter(name = "optimizer",
							value = "pooled-lo"
					),
					@org.hibernate.annotations.Parameter(name = "initial_value", value = "1"),
					@org.hibernate.annotations.Parameter(name = "increment_size", value = "5")
			}
	)
	@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "sequenceGenerator")
	private Long id;
}

To better understand the inner-workings of this optimizer, the following diagram summarizes the identifier assignment process:

Conclusion

A hidden gem is one of those great features that most don’t even know of its existence. The pooled-lo optimizer is extremely useful, yet most people don’t even know of its existence.