
Built-in Serialization techniques

This article is part of our Academy Course titled Advanced Java.

This course is designed to help you make the most effective use of Java. It discusses advanced topics, including object creation, concurrency, serialization, reflection and many more. It will guide you through your journey to Java mastery!

1. Introduction

This part of the tutorial is devoted solely to serialization: the process of translating Java objects into a format that can be stored and later reconstructed in the same (or another) environment (http://en.wikipedia.org/wiki/Serialization). Serialization not only allows saving Java objects to and loading them from persistent storage, but is also a very important component of communication in modern distributed systems.

Serialization is not easy, and effective serialization is even harder. Besides the Java standard library, there are many serialization techniques and frameworks available: some use a compact binary representation, others put readability first. Although we are going to mention many alternatives along the way, our attention will be concentrated on the ones from the Java standard library (and the latest specifications): Serializable, Externalizable, Java Architecture for XML Binding (JAXB, JSR-222) and Java API for JSON Processing (JSON-P, JSR-353).

2. Serializable interface

Arguably, the easiest way in Java to mark a class as available for serialization is to implement the java.io.Serializable interface. For example:

public class SerializableExample implements Serializable {
}

The serialization runtime associates with each serializable class a special version number, called a serial version UID, which is used during deserialization (the process opposite to serialization) to make sure that the loaded classes for the serialized object are compatible. If the compatibility has been compromised, an InvalidClassException is raised.

A serializable class may introduce its own serial version UID explicitly by declaring a field with name serialVersionUID that must be static, final, and of type long. For example:


public class SerializableExample implements Serializable {
    private static final long serialVersionUID = 889447504319602864L;
}

However, if a serializable class does not explicitly declare a serialVersionUID field, the serialization runtime computes a default serialVersionUID value for that class. It is strongly recommended that all classes implementing Serializable declare the serialVersionUID field explicitly, because the default computation relies heavily on intrinsic class details and may vary depending on the Java compiler implementation and its version. As such, to guarantee consistent behavior, a serializable class should always declare an explicit serialVersionUID field.

Once the class becomes serializable (implements Serializable and declares serialVersionUID), it can be stored and retrieved using, for example, ObjectOutputStream / ObjectInputStream:
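The value the runtime actually uses for a class can be inspected with the standard java.io.ObjectStreamClass API. The following is only a minimal sketch: the class names SerialVersionUidDemo, ImplicitUid and ExplicitUid, and the helper uidOf, are made up for illustration and are not part of the tutorial's source code.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionUidDemo {
    // No serialVersionUID declared: the runtime derives one from the class structure.
    static class ImplicitUid implements Serializable {
        private String name;
    }

    // Explicit serialVersionUID: the value is exactly what the class declares.
    static class ExplicitUid implements Serializable {
        private static final long serialVersionUID = 1L;
        private String name;
    }

    // Returns the serial version UID the serialization runtime uses for the class.
    static long uidOf(final Class<?> type) {
        final ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        if (descriptor == null) {
            // lookup() returns null for classes that are not serializable
            throw new IllegalArgumentException(type + " is not serializable");
        }
        return descriptor.getSerialVersionUID();
    }

    public static void main(final String[] args) {
        System.out.println("implicit: " + uidOf(ImplicitUid.class));
        System.out.println("explicit: " + uidOf(ExplicitUid.class));
    }
}
```

The implicit value printed by this sketch depends on the class structure and possibly on the compiler, which is exactly why relying on it is discouraged.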

final Path storage = new File( "object.ser" ).toPath();

try( final ObjectOutputStream out = 
        new ObjectOutputStream( Files.newOutputStream( storage ) ) ) {
    out.writeObject( new SerializableExample() );
}

Once stored, it can be retrieved in a similar way, for example:

try( final ObjectInputStream in = 
        new ObjectInputStream( Files.newInputStream( storage ) ) ) {
    final SerializableExample instance = ( SerializableExample )in.readObject();
    // Some implementation here
}

As we can see, the Serializable interface does not provide a lot of control over what is serialized and how (with the exception of the transient keyword, which excludes a field from serialization). Moreover, it limits the flexibility of changing the internal class representation, as such changes could break the serialization / deserialization process. That is why another interface, Externalizable, has been introduced.
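The effect of the transient keyword can be illustrated with an in-memory round trip. This is a sketch under assumptions: the TransientDemo and Account classes and the roundTrip helper are hypothetical names introduced only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String login;
        private transient String password; // skipped by default serialization

        Account(final String login, final String password) {
            this.login = login;
            this.password = password;
        }

        String getLogin() { return login; }
        String getPassword() { return password; }
    }

    // Serializes the account into a byte array and reads it straight back.
    static Account roundTrip(final Account account)
            throws IOException, ClassNotFoundException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(account);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Account) in.readObject();
        }
    }

    public static void main(final String[] args) throws Exception {
        final Account copy = roundTrip(new Account("alice", "secret"));
        System.out.println(copy.getLogin());    // alice
        System.out.println(copy.getPassword()); // null
    }
}
```

The transient field comes back with its default value (null for object references), so any class using transient has to be prepared to rebuild or tolerate the missing state.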

3. Externalizable interface

In contrast to the Serializable interface, Externalizable delegates to the class the responsibility for how it is serialized and deserialized. It has only two methods; here is its declaration from the Java standard library:

public interface Externalizable extends java.io.Serializable {
    void writeExternal(ObjectOutput out) throws IOException;
    void readExternal(ObjectInput in) throws IOException, ClassNotFoundException;
}

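In turn, every class that implements the Externalizable interface must provide implementations of these two methods. It also needs a public no-argument constructor, because the runtime instantiates the class before calling readExternal. Let us take a look at an example: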

public class ExternalizableExample implements Externalizable {
    private String str;
    private int number;
    private SerializableExample obj;
        
    @Override
    public void readExternal(final ObjectInput in) 
            throws IOException, ClassNotFoundException {
        setStr(in.readUTF());
        setNumber(in.readInt());
        setObj(( SerializableExample )in.readObject());
    }
    
    @Override
    public void writeExternal(final ObjectOutput out) 
            throws IOException {
        out.writeUTF(getStr());
        out.writeInt(getNumber());
        out.writeObject(getObj());
    }

    // Setters and getters here
}

Similarly to the classes implementing Serializable, the classes implementing Externalizable can be stored and retrieved using, for example, ObjectOutputStream / ObjectInputStream:

final Path storage = new File( "extobject.ser" ).toPath();
        
final ExternalizableExample instance = new ExternalizableExample();
instance.setStr( "Sample String" );
instance.setNumber( 10 );
instance.setObj( new SerializableExample() );
        
try( final ObjectOutputStream out = 
        new ObjectOutputStream( Files.newOutputStream( storage ) ) ) {
    out.writeObject( instance );
}
        
try( final ObjectInputStream in = 
        new ObjectInputStream( Files.newInputStream( storage ) ) ) {
    final ExternalizableExample obj = ( ExternalizableExample )in.readObject();
    // Some implementation here
}

The Externalizable interface allows a fine-grained serialization / deserialization customization in the cases when the simpler approach with Serializable interface does not work well.
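As a self-contained illustration of that fine-grained control, the sketch below writes two fields by hand and reads them back in the same order. The ExternalizableDemo and Money names are hypothetical; the only hard requirements it relies on are the two interface methods and the public no-argument constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableDemo {
    public static class Money implements Externalizable {
        private long cents;
        private String currency;

        // Required: the runtime creates the instance through this constructor
        // and only then calls readExternal.
        public Money() {
        }

        public Money(final long cents, final String currency) {
            this.cents = cents;
            this.currency = currency;
        }

        @Override
        public void writeExternal(final ObjectOutput out) throws IOException {
            out.writeLong(cents);
            out.writeUTF(currency);
        }

        @Override
        public void readExternal(final ObjectInput in) throws IOException {
            // Fields must be read in exactly the order they were written.
            cents = in.readLong();
            currency = in.readUTF();
        }

        public long getCents() { return cents; }
        public String getCurrency() { return currency; }
    }

    // Serializes the value into a byte array and reads it straight back.
    static Money roundTrip(final Money money)
            throws IOException, ClassNotFoundException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(money);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Money) in.readObject();
        }
    }
}
```

Because the class owns the wire format, its private fields can later be renamed or restructured without breaking old data, as long as writeExternal and readExternal keep agreeing on the byte layout.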

4. More about Serializable interface

In the section Serializable interface we mentioned that the Serializable interface does not provide a lot of control over what is serialized and how. In fact, that is not completely true (at least when ObjectOutputStream / ObjectInputStream are used). There are some special methods which any serializable class can implement in order to control the default serialization and deserialization.

private void writeObject(ObjectOutputStream out) throws IOException;

This method is responsible for writing the state of the object for its particular class so that the corresponding readObject method can restore it (the default mechanism for saving the Object’s fields can be invoked by calling out.defaultWriteObject).

private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException;

This method is responsible for reading from the stream and restoring the state of the object (the default mechanism for restoring the Object’s fields can be invoked by calling in.defaultReadObject).

private void readObjectNoData() throws ObjectStreamException;

This method is responsible for initializing the state of the object in the case when the serialization stream does not list the given class as a superclass of the object being deserialized.

Object writeReplace() throws ObjectStreamException;

This method is used when serializable classes need to designate an alternative object to be used when writing an object to the stream.


Object readResolve() throws ObjectStreamException;

And lastly, this method is used when serializable classes need to designate a replacement when an instance of it is read from the stream.
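The writeObject / readObject pair described above can be sketched as follows. This is an illustrative example, not code from the tutorial: the CustomSerializationDemo and Range classes, the FORMAT_VERSION marker and the roundTrip helper are all assumptions made for the sake of the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CustomSerializationDemo {
    static final class Range implements Serializable {
        private static final long serialVersionUID = 1L;
        private static final int FORMAT_VERSION = 1; // hypothetical extra marker

        private final int lower;
        private final int upper;
        private transient int length; // derived value, not written to the stream

        Range(final int lower, final int upper) {
            if (lower > upper) {
                throw new IllegalArgumentException("lower > upper");
            }
            this.lower = lower;
            this.upper = upper;
            this.length = upper - lower;
        }

        private void writeObject(final ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();     // writes lower and upper
            out.writeInt(FORMAT_VERSION); // extra data after the default fields
        }

        private void readObject(final ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();       // restores lower and upper
            final int version = in.readInt();
            if (version != FORMAT_VERSION || lower > upper) {
                throw new InvalidObjectException("Unsupported or corrupt Range data");
            }
            length = upper - lower;       // rebuild the transient state
        }

        int length() { return length; }
    }

    // Serializes the range into a byte array and reads it straight back.
    static Range roundTrip(final Range range)
            throws IOException, ClassNotFoundException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(range);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Range) in.readObject();
        }
    }
}
```

Deserialization does not run the constructor of a Serializable class, so readObject is the place to repeat any validation the constructor performs and to recompute transient fields.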

The default serialization mechanism (using the Serializable interface) can get really cumbersome once you have to deal with its intrinsic implementation details and those special methods. The more code you write to support serialization, the more likely it is that bugs and vulnerabilities will show up.

However, there is a way to reduce those risks by employing a quite simple pattern named Serialization Proxy, which is based on the writeReplace and readResolve methods. The basic idea of this pattern is to introduce a dedicated companion class for serialization (usually as a private static nested class), which complements the class that needs to be serialized. Let us take a look at this example:

public class SerializationProxyExample implements Serializable {
    private static final long serialVersionUID = 6163321482548364831L;

    private String str;
    private int number;        
    
    public SerializationProxyExample( final String str, final int number) {
        this.setStr(str);
        this.setNumber(number);
    }

    private void readObject(ObjectInputStream stream) throws InvalidObjectException {
        throw new InvalidObjectException( "Serialization Proxy is expected" );
    }
    
    private Object writeReplace() {
        return new SerializationProxy( this );
    }
    
    // Setters and getters here
}

When an instance of this class is serialized, SerializationProxyExample supplies a replacement object (an instance of the SerializationProxy class) through its writeReplace method. It means that instances of the SerializationProxyExample class are never serialized (or deserialized) directly. It also explains why the readObject method raises an exception if a deserialization attempt somehow happens. Now, let us take a look at the companion SerializationProxy class:

private static class SerializationProxy implements Serializable {
    private static final long serialVersionUID = 8368440585226546959L;

    private String str;
    private int number;
        
    public SerializationProxy( final SerializationProxyExample instance ) {
        this.str = instance.getStr();
        this.number = instance.getNumber();
    }
        
    private Object readResolve() {
        return new SerializationProxyExample(str, number); // Uses public constructor
    }
}

In our somewhat simplified case, the SerializationProxy class just duplicates all the fields of SerializationProxyExample (but it could be much more complicated than that). Consequently, when an instance of the proxy is deserialized, its readResolve method is called and SerializationProxy provides a replacement as well, this time in the shape of a SerializationProxyExample instance. As such, the SerializationProxy class serves as a serialization proxy for the SerializationProxyExample class.
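To show the whole pattern working end to end, here is a compact, runnable variant with an in-memory round trip. It is a sketch rather than the tutorial's own code: the SerializationProxyDemo, DayRange and DayRangeProxy names are hypothetical, and the class carries a constructor check to make one benefit visible, namely that deserialization goes through the regular constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationProxyDemo {
    static final class DayRange implements Serializable {
        private static final long serialVersionUID = 1L;

        private final int startDay;
        private final int endDay;

        DayRange(final int startDay, final int endDay) {
            if (startDay > endDay) {
                throw new IllegalArgumentException("startDay > endDay");
            }
            this.startDay = startDay;
            this.endDay = endDay;
        }

        int getStartDay() { return startDay; }
        int getEndDay() { return endDay; }

        // The proxy is written to the stream instead of this object.
        private Object writeReplace() {
            return new DayRangeProxy(this);
        }

        // Direct deserialization of DayRange is always rejected.
        private void readObject(final ObjectInputStream in)
                throws InvalidObjectException {
            throw new InvalidObjectException("Serialization Proxy is expected");
        }

        private static final class DayRangeProxy implements Serializable {
            private static final long serialVersionUID = 1L;

            private final int startDay;
            private final int endDay;

            DayRangeProxy(final DayRange range) {
                this.startDay = range.startDay;
                this.endDay = range.endDay;
            }

            // Rebuilds the real object through its constructor, so the
            // constructor's validation also applies to deserialized data.
            private Object readResolve() {
                return new DayRange(startDay, endDay);
            }
        }
    }

    // Serializes the range into a byte array and reads it straight back.
    static DayRange roundTrip(final DayRange range)
            throws IOException, ClassNotFoundException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(range);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (DayRange) in.readObject();
        }
    }
}
```

From the caller's point of view nothing changes: a DayRange goes into writeObject and a DayRange comes out of readObject, while the stream itself only ever contains the proxy.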

5. Serializability and Remote Method Invocation (RMI)

For quite some time, Java Remote Method Invocation (RMI) was the only mechanism available for building distributed applications on the Java platform. RMI does all the heavy lifting and makes it possible to transparently invoke the methods of remote Java objects from other JVMs on the same host or on different physical (or virtual) hosts. At the foundation of RMI lies object serialization, which is used to marshal (serialize) and unmarshal (deserialize) method parameters.

RMI is still being used in many Java applications nowadays, but it is chosen less and less often because of its complexity and communication restrictions (most firewalls block RMI ports). For more details about RMI, please refer to the official documentation.

6. JAXB

Java Architecture for XML Binding, or just JAXB, is probably the oldest alternative serialization mechanism available to Java developers. Underneath, it uses XML as the serialization format, provides a wide range of customization options and includes a lot of annotations, which make JAXB very appealing and easy to use (annotations are covered in part 5 of the tutorial, How and when to use Enums and Annotations).

Let us take a look at a quite simplified example of a plain old Java object (POJO) annotated with JAXB annotations:

import java.math.BigDecimal;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlAccessorType( XmlAccessType.FIELD )
@XmlRootElement( name = "example" )
public class JaxbExample {
    @XmlElement(required = true) private String str;
    @XmlElement(required = true) private BigDecimal number;
    
    // Setters and getters here
}

To serialize an instance of this class into XML format using the JAXB infrastructure, the only thing needed is an instance of the marshaller (or serializer), for example:

final JAXBContext context = JAXBContext.newInstance( JaxbExample.class );        
final Marshaller marshaller = context.createMarshaller();
     
final JaxbExample example = new JaxbExample();
example.setStr( "Some string" );
example.setNumber( new BigDecimal( 12.33d, MathContext.DECIMAL64 ) );
        
try( final StringWriter writer = new StringWriter() ) {
    marshaller.marshal( example, writer );
}

Here is the XML representation of the JaxbExample class instance from the example above:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<example>
    <str>Some string</str>
    <number>12.33000000000000</number>
</example>

Following the same principle, the instances of the class could be deserialized back from XML representation into the Java objects using the instance of the unmarshaller (or deserializer), for example:

final JAXBContext context = JAXBContext.newInstance( JaxbExample.class );
        
final String xml = "" +
    "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" +
    "<example>" +
    "    <str>Some string</str>" +
    "    <number>12.33000000000000</number>" +
    "</example>";
        
final Unmarshaller unmarshaller = context.createUnmarshaller();
try( final StringReader reader = new StringReader( xml ) ) {
    final JaxbExample example = ( JaxbExample )unmarshaller.unmarshal( reader );
    // Some implementation here
}

As we can see, JAXB is quite easy to use, and the XML format is still quite a popular choice nowadays. However, one of the fundamental pitfalls of XML is verbosity: quite often the necessary XML structural elements significantly outweigh the effective data payload.


 

7. JSON-P

Since 2013, Java developers have been able to use JSON as the serialization format, by virtue of the newly introduced Java API for JSON Processing (JSON-P).

At the time of writing, JSON-P is not a part of the Java standard library, although there are many discussions about including native JSON support in the upcoming Java 9 release (http://openjdk.java.net/jeps/198). Nevertheless, it is available as part of the Java JSON Processing Reference Implementation (https://jsonp.java.net/).

In contrast to JAXB, there is nothing required to be added to the class to make it suitable for JSON serialization, for example:

public class JsonExample {
    private String str;
    private BigDecimal number;
    // Setters and getters here
}

The serialization is not as transparent as with JAXB, and requires a bit of code to be written for each class intended to be serialized into JSON, for example:

final JsonExample example = new JsonExample();
example.setStr( "Some string" );
example.setNumber( new BigDecimal( 12.33d, MathContext.DECIMAL64 ) );
        
try( final StringWriter writer = new StringWriter() ) {
    Json.createWriter(writer).write( 
        Json.createObjectBuilder()
            .add("str", example.getStr() )
            .add("number", example.getNumber() )
            .build()
        );
}

And here is the JSON representation of the JsonExample class instance from the example above:

{
    "str":"Some string",
    "number":12.33000000000000
}

The deserialization process goes in the same vein:

final String json = "{\"str\":\"Some string\",\"number\":12.33000000000000}";
      
try( final StringReader reader = new StringReader( json ) ) {
    final JsonObject obj = Json.createReader( reader ).readObject();
    final JsonExample example = new JsonExample();
    example.setStr( obj.getString( "str" ) );
    example.setNumber( obj.getJsonNumber( "number" ).bigDecimalValue() );
}

It is fair to say that at the moment JSON support in Java is pretty basic. Nonetheless, it is a great thing to have, and the Java community is working on enriching the JSON support by introducing the Java API for JSON Binding (JSON-B, JSR-367). With this API, the serialization and deserialization of Java objects to/from JSON should be as transparent as it is with JAXB.

8. Cost of serialization

It is very important to understand that although serialization / deserialization looks simple in Java, it is not free and, depending on the data model and data access patterns, may consume quite a lot of network bandwidth, memory and CPU resources. On top of that, although Java has some versioning support for serializable classes (using the serial version UID, as we have seen in the section Serializable interface), it still makes the development process much harder, as developers are on their own when figuring out how to manage data model evolution.

To add another point, Java serialization does not work well outside of the JVM world. It is a significant constraint for modern distributed applications, which are built using multiple programming languages and runtimes.

That explains why many alternative serialization frameworks and solutions have emerged and become very popular choices in the Java ecosystem.
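The size overhead is easy to observe by comparing the default serialized form of a small object with a hand-written encoding of the same two values. The sketch below is only an illustration: the SerializationCostDemo and Reading names are hypothetical, and the exact size of the default form depends on the class and field names, so it is measured rather than stated.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationCostDemo {
    static class Reading implements Serializable {
        private static final long serialVersionUID = 1L;

        final int sensorId;
        final double value;

        Reading(final int sensorId, final double value) {
            this.sensorId = sensorId;
            this.value = value;
        }
    }

    // Size in bytes of the default Java serialized form, which includes the
    // stream header and a class descriptor in addition to the field values.
    static int javaSerializedSize(final Serializable object) throws IOException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        return bytes.size();
    }

    // Size in bytes of a hand-written encoding: one int (4) plus one double (8).
    static int rawSize(final Reading reading) throws IOException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(reading.sensorId);
            out.writeDouble(reading.value);
        }
        return bytes.size();
    }

    public static void main(final String[] args) throws IOException {
        final Reading reading = new Reading(42, 21.5);
        System.out.println("default serialization: " + javaSerializedSize(reading) + " bytes");
        System.out.println("raw fields only:       " + rawSize(reading) + " bytes");
    }
}
```

The raw encoding is 12 bytes, while the default form is noticeably larger because it also describes the class. For a single object this hardly matters, but it adds up when many small objects are sent independently over the network.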

9. Beyond Java standard library and specifications

In this section we are going to look at alternative solutions for painless and effective Java serialization, starting with the Fast-serialization project (http://ruedigermoeller.github.io/fast-serialization/): the fast drop-in replacement for Java serialization. The usage of Fast-serialization is not much different from what the Java standard library provides, but it claims to be much faster and more efficient.

Another set of frameworks has a different take on the problem. They are based on a structured data definition (or protocol) and serialize data into a compact binary representation (the corresponding data model can even be generated from the definition). Aside from that, those frameworks go far beyond the Java platform and can be used for cross-language / cross-platform serialization. The best-known Java libraries in this space are Google Protocol Buffers (https://developers.google.com/protocol-buffers/), Apache Avro (http://avro.apache.org/) and Apache Thrift (https://thrift.apache.org/).

10. What’s next

In this part of the tutorial we have discussed the built-in serialization techniques provided by the Java language and its runtime. We have seen how important serialization is today, when almost every application being built is a part of a larger distributed system and needs to communicate with the rest of it (or with other external systems). In the next part of the tutorial we are going to talk about reflection and dynamic languages support in Java.

11. Download the Source code

You can download the source code of this course here: advanced-java-part-10

Andrey Redko

Andriy is a well-grounded software developer with more than 12 years of practical experience using Java/EE, C#/.NET, C++, Groovy, Ruby, functional programming (Scala), databases (MySQL, PostgreSQL, Oracle) and NoSQL solutions (MongoDB, Redis).