Dynamic hot-swap environment inside Java with atomic updates

Ruwan Linton, December 12th, 2012. Last Updated: December 11th, 2012

One could argue that the title above could be shortened to “OSGi”, and I want to dismiss that thought at the very beginning.

No offense to OSGi: it is a great specification that, in my opinion, got messed up at the implementation layer or at the usability layer. You could of course do this using OSGi, but only with some custom work as well. The downside of using OSGi to solve this problem is the unwanted complexity it introduces into the development process. We were inspired by JRebel, and for a moment we thought something along those lines was what we wanted, but we soon realized that we did not want to get into bytecode injection on a production-grade runtime. So let’s analyze the problem domain.

Problem Domain

The problem we are trying to address is related to UltraESB, specifically its live updates feature. UltraESB supports atomically updating or adding configuration fragments (referred to as “Deployment Units”) on a running ESB, without any downtime and, most importantly, without any inconsistent states. However, one limitation of this feature was that if a Java class residing in the user class space had to change as part of a deployment unit configuration update, the JVM had to be restarted. While this is affordable in a clustered deployment (with a round-robin restart), in a single-instance deployment it introduced downtime for the whole system.

We had to make sure we preserved a few guarantees in solving this:

While a deployment unit is being updated, the messages already accepted by that unit should use all the resources of the existing unit, including its loaded classes (and any class yet to be loaded), while any new messages (arriving after the update of the unit completes) have to be dispatched to the new deployment unit configuration and its resource base. We call this the “Guarantee of Consistency”.
To ensure this, we need to manage 2 (or more, for that matter) versions of the same class in the same JVM, so that the respective deployment units can use their classes in an atomic manner. Let’s call this the “Guarantee of Atomicity” of a deployment unit.
A deployment unit configuration may contain Java fragments that are compiled on the fly during the update and that may depend on the updated classes, so the compiler has to be able to locate the new version of each class during compilation. This is the “Guarantee of Correctness”.
The process of updating has to be transparent to the users (they should not need to worry about it, either at development time or at deployment time) and the whole process should be simple. Let’s call this the “Guarantee of Simplicity”.
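
The first two guarantees can be pictured as an atomic reference swap. The following is only a minimal sketch under my own assumptions, not the UltraESB code: DeploymentUnitHolder and UnitVersion are hypothetical names. A message takes a snapshot of the current version when it is accepted and keeps using it, while an update publishes a fully built new version in one step.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder for the currently active version of one deployment unit. */
final class DeploymentUnitHolder {

    /** Immutable snapshot of one version: a version number and its class loader. */
    static final class UnitVersion {
        final int version;
        final ClassLoader classLoader;

        UnitVersion(int version, ClassLoader classLoader) {
            this.version = version;
            this.classLoader = classLoader;
        }
    }

    private final AtomicReference<UnitVersion> current;

    DeploymentUnitHolder(UnitVersion initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** Called once per accepted message; the message keeps this snapshot until it completes. */
    UnitVersion accept() {
        return current.get();
    }

    /** Publishes a fully built new version in a single atomic step. */
    void update(UnitVersion next) {
        current.set(next);
    }
}
```

A message accepted before update() keeps resolving classes through the class loader of its own snapshot, so it never sees a half-updated unit; messages accepted afterwards get the new version.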

Now you will see that this problem is something more than OSGi, as compilation is something OSGi cannot solve on its own (AFAIK), at least at the time I am writing this blog.

Coming back to OSGi, to make it crystal clear why we didn’t go down that path, let’s analyze the requirement in detail. What we really want is not a completely modular JVM, but rather a specific space inside the JVM that is dynamic and atomically reloadable. Mapping this to our actual use case, it means making sure that anything the user writes and plugs into the ESB (i.e. a deployment unit containing proxy services, sequences and mediation logic) is dynamically and atomically reloadable and versionable, but not the ESB itself, as in the ESB core that executes the user code. This is what users have asked of us, not a way to add another feature to the ESB at runtime without a restart. I agree it is cool to be able to add new features, but nobody seems to want that or the complexity associated with it. We were ready to do any sort of complex work, but we were not ready to pass that complexity, or any variation of it, on to our users.

Proposed Solution

If you want the first 2 guarantees, “Consistency” and “Atomicity” (being able to have 2 versions of the same class loaded at runtime and using the right one for the right task), in Java, you have no option other than writing a new class loader that does child-first class loading. The standard JVM class loaders are all parent-first. The WebAppClassLoader of a typical application container is very close to what we wanted, but we also needed dynamic reloading in production environments. The old class space and the new class space should be managed by 2 instances of this class loader, so that the 2 versions are safely isolated.
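
To make “child-first” concrete, here is a minimal sketch of such a class loader built on the JDK’s URLClassLoader. It is not the HotSwapClassLoader from the changeset; the class name ChildFirstClassLoader and the rule of always delegating names starting with `java.` to the parent are my own assumptions for illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical child-first loader: looks in its own URLs before asking the parent. */
final class ChildFirstClassLoader extends URLClassLoader {

    ChildFirstClassLoader(URL[] userClassSpace, ClassLoader parent) {
        super(userClassSpace, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.")) {
                    // Core classes always come from the platform (assumption of this sketch).
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Child first: search this loader's own URLs.
                        c = findClass(name);
                    } catch (ClassNotFoundException notInUserSpace) {
                        // Fall back to the usual parent-first delegation.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Two instances pointing at two different copies of the user class space would each define their own version of a user class, while still sharing everything that only the parent can provide.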

To understand this, it is important to understand how the JVM identifies classes. Even though from a Java language perspective a class is uniquely identified by its FQN (“package name + class name”), from the JVM perspective the class loader that loaded the class is also a factor in deciding its uniqueness. This is why, in OSGi-like environments, you see a ClassCastException even though you are apparently casting to the correct type. So the conclusion is that we need to write a class loader and keep a separate instance of it for each version of each reloadable deployment unit.
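
The following small demo illustrates that point under some assumptions of mine: ClassIdentityDemo, Payload and IsolatingLoader are hypothetical names, and the demo expects its compiled class files to be readable as resources on the class path. It defines the same class bytes in two loader instances and compares the results.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class ClassIdentityDemo {

    /** Hypothetical user class; both "versions" use identical bytes here. */
    static final class Payload { }

    /** Defines the target class itself instead of delegating to the parent. */
    static final class IsolatingLoader extends ClassLoader {
        private final String targetName;
        private final byte[] classBytes;

        IsolatingLoader(String targetName, byte[] classBytes, ClassLoader parent) {
            super(parent);
            this.targetName = targetName;
            this.classBytes = classBytes;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(targetName)) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    c = defineClass(name, classBytes, 0, classBytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** Reads the compiled bytes of a class from the class path. */
    static byte[] readClassBytes(Class<?> type) throws IOException {
        String resource = type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("Class file not found: " + resource);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws Exception {
        String name = Payload.class.getName();
        byte[] bytes = readClassBytes(Payload.class);
        ClassLoader parent = ClassIdentityDemo.class.getClassLoader();

        Class<?> v1 = new IsolatingLoader(name, bytes, parent).loadClass(name);
        Class<?> v2 = new IsolatingLoader(name, bytes, parent).loadClass(name);

        System.out.println("same name:        " + v1.getName().equals(v2.getName()));
        System.out.println("same Class:       " + (v1 == v2));
        System.out.println("assignable v2→v1: " + v1.isAssignableFrom(v2));
    }
}
```

The names match, but the two Class objects are distinct and not assignable to each other, which is exactly the situation in which a cast between “the same” types fails.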

In order to make sure that the on-the-fly compiler sees the correct classes when compiling the sequence fragments, guaranteeing “Correctness”, there needs to be a JavaFileManager implementation that, again, looks into the updated class space. The Java compiler task, javac, searches for the dependencies needed to compile a class via the specified file manager, as JavaFileObject instances, and not via a class loader as Class objects. This lets the compiler resolve classes effectively, as there can be dependencies among the classes being compiled.
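
As a rough illustration of that idea, the sketch below wraps another file manager with the standard javax.tools ForwardingJavaFileManager and lists user-space class files ahead of whatever the delegate finds. It is not the InMemoryFileManager from the changeset: the class name UserSpaceFirstFileManager and the map from binary class name to file object are assumptions of mine.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.StandardLocation;

/** Hypothetical file manager that offers user-space classes to the compiler first. */
final class UserSpaceFirstFileManager extends ForwardingJavaFileManager<JavaFileManager> {

    /** Binary class name (e.g. "demo.Hello") mapped to its class file object. */
    private final Map<String, JavaFileObject> userClasses;

    UserSpaceFirstFileManager(JavaFileManager delegate, Map<String, JavaFileObject> userClasses) {
        super(delegate);
        this.userClasses = userClasses;
    }

    @Override
    public Iterable<JavaFileObject> list(JavaFileManager.Location location, String packageName,
                                         Set<JavaFileObject.Kind> kinds, boolean recurse)
            throws IOException {
        List<JavaFileObject> result = new ArrayList<>();
        if (location == StandardLocation.CLASS_PATH && kinds.contains(JavaFileObject.Kind.CLASS)) {
            // User classes of exactly this package go first (sub-packages are ignored in this sketch).
            for (Map.Entry<String, JavaFileObject> entry : userClasses.entrySet()) {
                if (packageOf(entry.getKey()).equals(packageName)) {
                    result.add(entry.getValue());
                }
            }
        }
        for (JavaFileObject fromDelegate : super.list(location, packageName, kinds, recurse)) {
            result.add(fromDelegate);
        }
        return result;
    }

    @Override
    public String inferBinaryName(JavaFileManager.Location location, JavaFileObject file) {
        // The delegate does not know our custom file objects, so answer for them here.
        for (Map.Entry<String, JavaFileObject> entry : userClasses.entrySet()) {
            if (entry.getValue() == file) {
                return entry.getKey();
            }
        }
        return super.inferBinaryName(location, file);
    }

    private static String packageOf(String binaryName) {
        int lastDot = binaryName.lastIndexOf('.');
        return lastDot < 0 ? "" : binaryName.substring(0, lastDot);
    }
}
```

An instance of this class would be passed as the file manager argument of JavaCompiler.getTask, wrapping the compiler’s standard file manager. Overriding inferBinaryName matters because the compiler asks the file manager for the binary name of each listed file object.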

Further, so as not to affect the guarantee of “Simplicity”, the user shouldn’t be asked to place jar files in a versioned file structure; rather, the ESB itself has to manage the versioning of jar files to make sure that we do not mix different versions of the class spaces. This is also important for the correct operation of the compiler task across versions: the compiler uses memory-mapped files to read the class definitions, rather than the input streams to the classes provided by the file manager, which forces us to maintain a physical copy of each and every version of the jar files/classes.

Execution of the Implementation

Let me first point you to the complete changeset, which you can refer to from time to time while reading about the implementation.

We identified 3 key pieces to be implemented, the first of which is a class loader that provides the classes of the user class space. We named it the HotSwapClassLoader (I am not going to reproduce the actual code in this blog; please do not hesitate to browse the complete code, keeping in mind the terms of the AGPL license, as the code is AGPL). Next, we wanted to associate an instance of this class loader with each version of a deployment unit, which is inherently supported in UltraESB, as it keeps these versions as separate Spring sub-contexts.

So the creation of any new deployment unit configuration, including a new version of an existing deployment unit, instantiates a new instance of this class loader and uses it as the resource/class loader for that deployment unit configuration. At initialization, the class loader calculates a normalized hash value of the user class space and checks whether a copy of the class space already exists for the current version; it then either uses that copy or creates a new one. Because this whole process is synchronized on a static final lock, the hash check and the reuse of an existing copy prevent 2 copies of the same user class space version from being maintained. The class loader then operates on that copy of the user class space. This copying is a must, so that the user does not have to worry about class versioning and so that the correct set of classes is used in a given configuration. This class loader also takes extensive measures to make sure that the class space copy is cleaned up at the earliest possible time. However, that only guarantees an eventual cleanup.
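
The hash-and-copy step could look roughly like the sketch below. This is my own illustration, not the code from the changeset: the class name ClassSpaceSnapshots, the choice of SHA-256 over relative file names plus file contents as the “normalized hash”, and the layout of one directory per hash under a versions root (assumed to live outside the class space) are all assumptions.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper that keeps one physical copy per distinct class space content. */
final class ClassSpaceSnapshots {

    /** Single JVM-wide lock so that two loaders never copy the same version concurrently. */
    private static final Object LOCK = new Object();

    /** Returns the regular files under root in a stable (sorted) order. */
    static List<Path> regularFiles(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    /** Hashes relative file names and file contents, independent of where the directory lives. */
    static String hashOf(Path classSpace) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
        for (Path file : regularFiles(classSpace)) {
            String relative = classSpace.relativize(file).toString().replace('\\', '/');
            digest.update(relative.getBytes(StandardCharsets.UTF_8));
            digest.update(Files.readAllBytes(file));
        }
        return String.format("%064x", new BigInteger(1, digest.digest()));
    }

    /** Reuses the copy for the current content if one exists, otherwise creates it. */
    static Path snapshot(Path classSpace, Path versionsRoot) throws IOException {
        synchronized (LOCK) {
            Path copy = versionsRoot.resolve(hashOf(classSpace));
            if (Files.isDirectory(copy)) {
                return copy; // this version was already copied by an earlier loader
            }
            Files.createDirectories(copy);
            for (Path source : regularFiles(classSpace)) {
                Path target = copy.resolve(classSpace.relativize(source).toString());
                Files.createDirectories(target.getParent());
                Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return copy;
        }
    }
}
```

A class loader for a deployment unit version would then be pointed at the returned directory rather than at the live user class space. A more defensive variant would copy into a temporary directory and rename it, so that a failed copy is never mistaken for a complete one; cleanup of old copies is left out of this sketch.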

The next main item of the implementation is the InMemoryFileManager, an existing class that was modified to support the user class space in addition to the in-memory compiled source code fragments; its list method returns these as an Iterable of SwappableJavaFileObject instances. The file manager first queries the HotSwapClassLoader to find the SwappableJavaFileObject instances corresponding to the user class space, then queries the system class space, and returns the results as a WrappedIterator, which makes sure the user space classes get precedence.
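
The precedence can be achieved simply by the order of iteration. The sketch below shows one way to do that lazily; it is not the WrappedIterator from the changeset, and the name PrecedenceIterable is hypothetical.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical lazy concatenation: everything from the first source, then the second. */
final class PrecedenceIterable<T> implements Iterable<T> {

    private final Iterable<? extends T> preferred;
    private final Iterable<? extends T> fallback;

    PrecedenceIterable(Iterable<? extends T> preferred, Iterable<? extends T> fallback) {
        this.preferred = preferred;
        this.fallback = fallback;
    }

    @Override
    public Iterator<T> iterator() {
        final Iterator<? extends T> first = preferred.iterator();
        final Iterator<? extends T> second = fallback.iterator();
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return first.hasNext() || second.hasNext();
            }

            @Override
            public T next() {
                if (first.hasNext()) {
                    return first.next(); // user space entries are exhausted first
                }
                if (second.hasNext()) {
                    return second.next(); // then the system class space
                }
                throw new NoSuchElementException();
            }
        };
    }
}
```

With the user class space as the preferred source and the system class space as the fallback, a consumer that takes the first match for a given class name ends up with the user version.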

In the final step of the implementation, after this customization of core JVM features, it was just a matter of using the custom class loader to load the classes for sequences and proxy services, and providing the custom file manager to the fragment compilation task of a deployment unit, to complete the solution. We also wanted a switch to disable this feature, although it is enabled by default and recommended for production deployments. To facilitate that, and a few other customizations of the runtime environment, the concept of an Environment has been introduced to UltraESB, borrowed from the Grails environments feature.

It concluded with a successful implementation of a dynamic runtime, which is “Consistent”, “Atomic”, “Correct” and most importantly “Simple to the users”.

Operational Behavior

Now that we have the solution implemented, let’s look at a few UltraESB internals to see how this operates in a production deployment. Any deployment unit configuration in the production environment is updated upon issuing the configuration add or update administration command. This command can be issued either via raw JMX or via any administration tool implemented on top of the JMX operations, such as UTerm or UConsole.

This implementation doesn’t change anything about the way you do updates; it further enhances them with the ability to add or replace jar files in the lib/custom user class space of UltraESB. The updated jar files/classes are picked up for the new configuration upon issuing the aforementioned administrative command after the update.
You may try this on the nightly builds of UltraESB, or wait for the 2.0.0 release, which is scheduled for mid-January 2013 with a lot more cool yet usable new features.

Reference: Dynamic hot-swap environment inside Java with atomic updates from our JCG partner Ruwan Linton at the Blind Vision – of Software Engineering and Life blog.