Custom Cassandra Data Types

In the blog post Connecting to Cassandra from Java, I mentioned that one advantage for Java developers of Cassandra being implemented in Java is the ability to create custom Cassandra data types. In this post, I outline how to do this in greater detail.

Cassandra has numerous built-in data types, but there are situations in which one may want to add a custom type. Cassandra custom data types are implemented in Java by extending the org.apache.cassandra.db.marshal.AbstractType class. The class that extends this must ultimately implement three methods with the following signatures:
 
 

public ByteBuffer fromString(final String) throws MarshalException
public TypeSerializer getSerializer()
public int compare(Object, Object)

This post’s example implementation of AbstractType is shown in the next code listing.

UnitedStatesState.java – Extends AbstractType

package dustin.examples.cassandra.cqltypes;

import org.apache.cassandra.db.marshal.AbstractType;
import org.apache.cassandra.serializers.MarshalException;
import org.apache.cassandra.serializers.TypeSerializer;

import java.nio.ByteBuffer;

/**
 * Representation of a state in the United States that
 * can be persisted to Cassandra database.
 */
public class UnitedStatesState extends AbstractType
{
   public static final UnitedStatesState instance = new UnitedStatesState();

   @Override
   public ByteBuffer fromString(final String stateName) throws MarshalException
   {
      return getStateAbbreviationAsByteBuffer(stateName);
   }

   @Override
   public TypeSerializer getSerializer()
   {
      return UnitedStatesStateSerializer.instance;
   }

   @Override
   public int compare(Object o1, Object o2)
   {
      if (o1 == null && o2 == null)
      {
         return 0;
      }
      else if (o1 == null)
      {
         return 1;
      }
      else if (o2 == null)
      {
         return -1;
      }
      else
      {
         return o1.toString().compareTo(o2.toString());
      }
   }

   /**
    * Provide standard two-letter abbreviation for United States
    * state whose state name is provided.
    *
    * @param stateName Name of state whose abbreviation is desired.
    * @return State's abbreviation as a ByteBuffer; will return "UK"
    *    if provided state name is unexpected value.
    */
   private ByteBuffer getStateAbbreviationAsByteBuffer(final String stateName)
   {
      final String upperCaseStateName = stateName != null ? stateName.toUpperCase().replace(" ", "_") : "UNKNOWN";
      String abbreviation;
      try
      {
         abbreviation =  upperCaseStateName.length() == 2
                       ? State.fromAbbreviation(upperCaseStateName).getStateAbbreviation()
                       : State.valueOf(upperCaseStateName).getStateAbbreviation();
      }
      catch (Exception exception)
      {
         abbreviation = State.UNKNOWN.getStateAbbreviation();
      }
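      // Encode explicitly as UTF-8 (the charset the serializer uses) rather than the platform default.
      return ByteBuffer.wrap(abbreviation.getBytes(java.nio.charset.StandardCharsets.UTF_8));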
   }
}
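
One detail in the compare method is worth checking. Assuming Cassandra hands this method ByteBuffer instances (an assumption about how the comparator is invoked, not something the listing states), ByteBuffer.toString() describes the buffer's position, limit, and capacity rather than its contents, so two different two-letter abbreviations would compare as equal. The following standalone sketch uses only the JDK to show the difference; the class and method names are hypothetical and exist purely for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class; a sketch rather than part of the original example. */
final class ByteBufferCompareSketch
{
   /** Wraps the UTF-8 bytes of the given text, much as the custom type does for abbreviations. */
   static ByteBuffer wrap(final String text)
   {
      return ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
   }

   /** Compares the buffers' descriptive strings (position, limit, capacity), not their contents. */
   static int compareByToString(final ByteBuffer first, final ByteBuffer second)
   {
      return first.toString().compareTo(second.toString());
   }

   /** Compares the remaining bytes of each buffer lexicographically. */
   static int compareByContent(final ByteBuffer first, final ByteBuffer second)
   {
      return first.compareTo(second);
   }

   public static void main(final String[] arguments)
   {
      final ByteBuffer newYork = wrap("NY");
      final ByteBuffer alaska = wrap("AK");
      System.out.println("toString() of a buffer: " + newYork);
      System.out.println("compareByToString:      " + compareByToString(newYork, alaska));
      System.out.println("compareByContent sign:  " + Integer.signum(compareByContent(newYork, alaska)));
   }
}
```

If ordering by content matters for the column, comparing the buffers themselves (or their decoded strings) is one option to consider in place of toString().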

The above class listing references the State enum, which is shown next.

State.java

package dustin.examples.cassandra.cqltypes;

/**
 * Representation of state in the United States.
 */
public enum State
{
   ALABAMA("Alabama", "AL"),
   ALASKA("Alaska", "AK"),
   ARIZONA("Arizona", "AZ"),
   ARKANSAS("Arkansas", "AR"),
   CALIFORNIA("California", "CA"),
   COLORADO("Colorado", "CO"),
   CONNECTICUT("Connecticut", "CT"),
   DELAWARE("Delaware", "DE"),
   DISTRICT_OF_COLUMBIA("District of Columbia", "DC"),
   FLORIDA("Florida", "FL"),
   GEORGIA("Georgia", "GA"),
   HAWAII("Hawaii", "HI"),
   IDAHO("Idaho", "ID"),
   ILLINOIS("Illinois", "IL"),
   INDIANA("Indiana", "IN"),
   IOWA("Iowa", "IA"),
   KANSAS("Kansas", "KS"),
   KENTUCKY("Kentucky", "KY"),
   LOUISIANA("Louisiana", "LA"),
   MAINE("Maine", "ME"),
   MARYLAND("Maryland", "MD"),
   MASSACHUSETTS("Massachusetts", "MA"),
   MICHIGAN("Michigan", "MI"),
   MINNESOTA("Minnesota", "MN"),
   MISSISSIPPI("Mississippi", "MS"),
   MISSOURI("Missouri", "MO"),
   MONTANA("Montana", "MT"),
   NEBRASKA("Nebraska", "NE"),
   NEVADA("Nevada", "NV"),
   NEW_HAMPSHIRE("New Hampshire", "NH"),
   NEW_JERSEY("New Jersey", "NJ"),
   NEW_MEXICO("New Mexico", "NM"),
   NEW_YORK("New York", "NY"),
   NORTH_CAROLINA("North Carolina", "NC"),
   NORTH_DAKOTA("North Dakota", "ND"),
   OHIO("Ohio", "OH"),
   OKLAHOMA("Oklahoma", "OK"),
   OREGON("Oregon", "OR"),
   PENNSYLVANIA("Pennsylvania", "PA"),
   RHODE_ISLAND("Rhode Island", "RI"),
   SOUTH_CAROLINA("South Carolina", "SC"),
   SOUTH_DAKOTA("South Dakota", "SD"),
   TENNESSEE("Tennessee", "TN"),
   TEXAS("Texas", "TX"),
   UTAH("Utah", "UT"),
   VERMONT("Vermont", "VT"),
   VIRGINIA("Virginia", "VA"),
   WASHINGTON("Washington", "WA"),
   WEST_VIRGINIA("West Virginia", "WV"),
   WISCONSIN("Wisconsin", "WI"),
   WYOMING("Wyoming", "WY"),
   UNKNOWN("Unknown", "UK");

   private String stateName;

   private String stateAbbreviation;

   State(final String newStateName, final String newStateAbbreviation)
   {
      this.stateName = newStateName;
      this.stateAbbreviation = newStateAbbreviation;
   }

   public String getStateName()
   {
      return this.stateName;
   }

   public String getStateAbbreviation()
   {
      return this.stateAbbreviation;
   }

   public static State fromAbbreviation(final String candidateAbbreviation)
   {
      State match = UNKNOWN;
      if (candidateAbbreviation != null && candidateAbbreviation.length() == 2)
      {
         final String upperAbbreviation = candidateAbbreviation.toUpperCase();
         for (final State state : State.values())
         {
            if (state.stateAbbreviation.equals(upperAbbreviation))
            {
               match = state;
            }
         }
      }
      return match;
   }
}
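
To make the normalization path concrete, here is a minimal standalone sketch of how a full name or an abbreviation ends up as a two-letter code. It assumes a trimmed, hypothetical SampleState enum with only three constants so that it can run on its own; the names StateNormalizationSketch, SampleState, and normalize are illustrative and are not part of the classes above.

```java
import java.util.Locale;

/** Hypothetical demo class; a sketch rather than part of the original example. */
final class StateNormalizationSketch
{
   /** Trimmed, hypothetical stand-in for the full State enum. */
   enum SampleState
   {
      NEW_YORK("NY"),
      TEXAS("TX"),
      UNKNOWN("UK");

      private final String abbreviation;

      SampleState(final String abbreviation)
      {
         this.abbreviation = abbreviation;
      }

      /** Returns the matching constant, or UNKNOWN when nothing matches. */
      static SampleState fromAbbreviation(final String candidate)
      {
         for (final SampleState state : values())
         {
            if (state.abbreviation.equalsIgnoreCase(candidate))
            {
               return state;
            }
         }
         return UNKNOWN;
      }
   }

   /** Maps a full state name or an abbreviation to its two-letter abbreviation. */
   static String normalize(final String input)
   {
      if (input == null)
      {
         return SampleState.UNKNOWN.abbreviation;
      }
      // Locale.ROOT keeps the upper-casing independent of the JVM's default locale.
      final String key = input.toUpperCase(Locale.ROOT).replace(' ', '_');
      try
      {
         return key.length() == 2
              ? SampleState.fromAbbreviation(key).abbreviation
              : SampleState.valueOf(key).abbreviation;
      }
      catch (IllegalArgumentException unrecognizedName)
      {
         // valueOf throws for names that are not enum constants; fall back to "UK".
         return SampleState.UNKNOWN.abbreviation;
      }
   }

   public static void main(final String[] arguments)
   {
      for (final String input : new String[] {"New York", "tx", "Atlantis"})
      {
         System.out.println(input + " -> " + normalize(input));
      }
   }
}
```

The space-to-underscore replacement is what lets a name such as "New York" line up with the enum constant NEW_YORK.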

We also need to provide an implementation of the TypeSerializer interface returned by the getSerializer() method shown above. The class implementing TypeSerializer is typically most easily written by extending one of the numerous existing implementations of TypeSerializer that Cassandra provides in the org.apache.cassandra.serializers package. In my example, my custom serializer extends AbstractTextSerializer, and the only method I need to add has the signature public void validate(final ByteBuffer bytes) throws MarshalException. Both of my custom classes need to provide a reference to an instance of themselves via static access. Here is the class that implements TypeSerializer via extension of AbstractTextSerializer:

UnitedStatesStateSerializer.java – Implements TypeSerializer

package dustin.examples.cassandra.cqltypes;

import org.apache.cassandra.serializers.AbstractTextSerializer;
import org.apache.cassandra.serializers.MarshalException;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Serializer for UnitedStatesState.
 */
public class UnitedStatesStateSerializer extends AbstractTextSerializer
{
   public static final UnitedStatesStateSerializer instance = new UnitedStatesStateSerializer();

   private UnitedStatesStateSerializer()
   {
      super(StandardCharsets.UTF_8);
   }

   /**
    * Validates provided ByteBuffer contents to ensure they can
    * be modeled in the UnitedStatesState Cassandra/CQL data type.
    * This allows either a full state name or its two-letter
    * abbreviation to be specified; both are considered valid.
    *
    * @param bytes ByteBuffer whose contents are to be validated.
    * @throws MarshalException Thrown if provided data is invalid.
    */
   @Override
   public void validate(final ByteBuffer bytes) throws MarshalException
   {
      try
      {
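         // Decode only the buffer's readable bytes as UTF-8, leaving its position untouched.
         final String stringFormat =
            StandardCharsets.UTF_8.decode(bytes.duplicate()).toString().toUpperCase();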
         final State state =  stringFormat.length() == 2
                            ? State.fromAbbreviation(stringFormat)
                            : State.valueOf(stringFormat);
      }
      catch (Exception exception)
      {
         throw new MarshalException("Invalid model cannot be marshaled as UnitedStatesState.");
      }
   }
}
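
When turning a ByteBuffer into a String inside validate, it can matter whether the code reads the whole backing array or only the bytes between the buffer's position and limit. The following sketch uses only the JDK and assumes, purely for illustration, that the buffer is a slice of a larger array; whether Cassandra actually passes such slices to validate is an assumption, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class; a sketch rather than part of the original example. */
final class ByteBufferDecodingSketch
{
   /** Decodes the entire backing array, ignoring the buffer's position and limit. */
   static String decodeBackingArray(final ByteBuffer bytes)
   {
      return new String(bytes.array(), StandardCharsets.UTF_8);
   }

   /** Decodes only the readable bytes; duplicate() leaves the caller's position untouched. */
   static String decodeRemaining(final ByteBuffer bytes)
   {
      return StandardCharsets.UTF_8.decode(bytes.duplicate()).toString();
   }

   /** Builds a two-byte view ("NY") over a four-byte backing array ("TXNY"). */
   static ByteBuffer sliceOfLargerArray()
   {
      final ByteBuffer whole = ByteBuffer.wrap("TXNY".getBytes(StandardCharsets.UTF_8));
      whole.position(2);
      return whole.slice();
   }

   public static void main(final String[] arguments)
   {
      final ByteBuffer view = sliceOfLargerArray();
      System.out.println("backing array: " + decodeBackingArray(view));
      System.out.println("readable part: " + decodeRemaining(view));
      System.out.println("bytes still readable afterwards: " + view.remaining());
   }
}
```

For a buffer that wraps exactly the bytes of the value, both approaches give the same result; they differ only when the buffer is a view over a larger array.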

With the classes for the custom CQL data type written, they need to be compiled into .class files and archived in a JAR file. I compiled them against the JARs in the Cassandra installation’s lib directory with this command:

javac -cp "C:\Program Files\DataStax Community\apache-cassandra\lib\*" -sourcepath src -d classes src\dustin\examples\cassandra\cqltypes\*.java

I then archived the generated .class files into a JAR named CustomCqlTypes.jar with this command:

jar cvf CustomCqlTypes.jar *

Both steps are shown in the following screen snapshot.

[Screen snapshot: compiling the custom type classes and building the JAR]

The JAR with the class definitions of the custom CQL type classes needs to be placed in the Cassandra installation’s lib directory as demonstrated in the next screen snapshot.

[Screen snapshot: moving the custom CQL types JAR to the Cassandra lib directory]

With the JAR containing the custom CQL data type classes in the Cassandra installation’s lib directory, Cassandra should be restarted so that it can “see” these custom data type definitions.

The next code listing shows a Cassandra Query Language (CQL) statement for creating a table using the new custom type dustin.examples.cassandra.cqltypes.UnitedStatesState.

createAddress.cql

CREATE TABLE us_address
(
   id uuid,
   street1 text,
   street2 text,
   city text,
   state 'dustin.examples.cassandra.cqltypes.UnitedStatesState',
   zipcode text,
   PRIMARY KEY(id)
);

The next screen snapshot demonstrates the results of running the createAddress.cql code above by describing the created table in cqlsh.

[Screen snapshot: describing the us_address table with its custom type in cqlsh]

The above screen snapshot demonstrates that the custom type dustin.examples.cassandra.cqltypes.UnitedStatesState is the type for the state column of the us_address table.

A new row can be added to the us_address table with a normal INSERT. For example, the following screen snapshot demonstrates inserting an address with this command:

INSERT INTO us_address (id, street1, street2, city, state, zipcode) VALUES (blobAsUuid(timeuuidAsBlob(now())), '350 Fifth Avenue', '', 'New York', 'New York', '10118');

[Screen snapshot: inserting an address with the custom state type into Cassandra]

Note that although the INSERT statement supplied “New York” for the state, the value is stored as “NY”.

[Screen snapshot: selecting the state stored by the custom type]

If I run an INSERT statement in cqlsh that uses an abbreviation to start with, it still works, as the next screen snapshot shows. The command is:

INSERT INTO us_address (id, street1, street2, city, state, zipcode) VALUES (blobAsUuid(timeuuidAsBlob(now())), '350 Fifth Avenue', '', 'New York', 'NY', '10118');

[Screen snapshot: inserting an address using a state abbreviation with the custom type]

In my example, an invalid state does not prevent an INSERT from occurring, but instead persists the state as “UK” (for unknown) [see the implementation of this in UnitedStatesState.getStateAbbreviationAsByteBuffer(String)].

One of the first advantages that comes to mind for implementing a custom CQL datatype in Java is the ability to employ behavior similar to that provided by check constraints in relational databases. For example, in this post, my sample ensured that any state value stored for a new row was one of the fifty states of the United States, the District of Columbia, or “UK” for unknown. No other value can be stored in that column.
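
A stricter, more constraint-like variant would reject unrecognized values instead of storing “UK”. The sketch below is one way that check might look, under stated assumptions: it uses a small hypothetical set of allowed abbreviations instead of the full State enum, and it throws the JDK's IllegalArgumentException as a stand-in for Cassandra's MarshalException so that it runs without Cassandra on the classpath. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical demo class; a sketch rather than part of the original example. */
final class StrictStateCheckSketch
{
   /** Small illustrative subset; a real check would cover every abbreviation. */
   private static final Set<String> ALLOWED =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList("NY", "TX", "DC")));

   /**
    * Returns the upper-cased abbreviation when it is recognized.
    *
    * @throws IllegalArgumentException (standing in for MarshalException)
    *    when the candidate is not in the allowed set.
    */
   static String requireKnownAbbreviation(final String candidate)
   {
      final String upperCase = candidate == null ? "" : candidate.toUpperCase(Locale.ROOT);
      if (!ALLOWED.contains(upperCase))
      {
         throw new IllegalArgumentException("Not a recognized state abbreviation: " + candidate);
      }
      return upperCase;
   }

   public static void main(final String[] arguments)
   {
      System.out.println(requireKnownAbbreviation("ny"));
      try
      {
         requireKnownAbbreviation("ZZ");
      }
      catch (IllegalArgumentException rejected)
      {
         System.out.println("Rejected: " + rejected.getMessage());
      }
   }
}
```

Whether to reject or to fall back to a sentinel such as “UK” is a design choice; the example in this post takes the fallback route.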

Another advantage of the custom data type is the ability to massage the data into a preferred form. In this example, I changed every state name to an uppercase two-letter abbreviation. In other cases, I might want to always store in uppercase or always store in lowercase or map finite sets of strings to numeric values. The custom CQL datatype allows for customized validation and representation of values in the Cassandra database.
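
As a rough illustration of mapping a finite set of strings to numeric values, the following sketch encodes a label as a four-byte integer in a ByteBuffer and decodes it again. Everything in it is hypothetical (the ShirtSize enum, its codes, and the method names); it uses only the JDK and is not wired into Cassandra's AbstractType.

```java
import java.nio.ByteBuffer;
import java.util.Locale;

/** Hypothetical demo class; a sketch rather than part of the original example. */
final class NumericCodeSketch
{
   /** Hypothetical finite set of labels, each with an explicit numeric code. */
   enum ShirtSize
   {
      SMALL(1),
      MEDIUM(2),
      LARGE(3);

      private final int code;

      ShirtSize(final int code)
      {
         this.code = code;
      }
   }

   /** Encodes a label (case-insensitive) as a four-byte integer ready for reading. */
   static ByteBuffer encode(final String label)
   {
      final ShirtSize size = ShirtSize.valueOf(label.toUpperCase(Locale.ROOT));
      final ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES);
      buffer.putInt(size.code);
      buffer.flip(); // switch from writing to reading
      return buffer;
   }

   /** Decodes the integer code back to its label without consuming the caller's buffer. */
   static String decode(final ByteBuffer bytes)
   {
      final int code = bytes.duplicate().getInt();
      for (final ShirtSize size : ShirtSize.values())
      {
         if (size.code == code)
         {
            return size.name();
         }
      }
      throw new IllegalArgumentException("Unknown code: " + code);
   }

   public static void main(final String[] arguments)
   {
      final ByteBuffer stored = encode("medium");
      System.out.println("bytes stored: " + stored.remaining());
      System.out.println("decoded:      " + decode(stored));
   }
}
```

Explicit codes are used here rather than enum ordinals so that reordering the constants later would not change the stored values.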

Conclusion

This post has been an introductory look at implementing custom CQL datatypes in Cassandra. As I play with this concept more and try different things out, I hope to write another blog post on some more subtle observations that I make. As this post shows, it is fairly easy to write and use a custom CQL datatype, especially for Java developers.

Reference: Custom Cassandra Data Types from our JCG partner Dustin Marx at the Inspired by Actual Events blog.
