Sunday, 11 July 2010

Java Best Practices – DateFormat in a Multithreading Environment

This is the first of a series of articles concerning proposed practices while working with the Java programming language.

All discussed topics are based on use cases derived from the development of mission-critical, ultra-high-performance production systems for the telecommunication industry.

Prior to reading each section of this article, it is highly recommended that you consult the relevant Java API documentation for detailed information and code samples.

All tests are performed against a Sony Vaio with the following characteristics :
  • System : openSUSE 11.1 (x86_64)
  • Processor (CPU) : Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz
  • Processor Speed : 1,200.00 MHz
  • Total memory (RAM) : 2.8 GB
  • Java : OpenJDK 1.6.0_0 64-Bit
The following test configuration is applied :
  • Concurrent worker Threads : 200
  • Test repeats per worker Thread : 1000
  • Overall test runs : 100

Using DateFormat in a multithreading environment

Working with DateFormat in a multithreading environment can be tricky. The Java API documentation clearly states :

Date formats are not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.

A typical scenario is converting a Date to its String representation, or vice versa, using a predefined format. Creating a new DateFormat instance for every conversion is very inefficient. Keep in mind that the static factory methods such as “getDateInstance(..)” also create a new DateFormat instance each time they are called. What most developers do is construct a DateFormat instance using an implementation class (e.g. SimpleDateFormat) and assign it to a class variable, which is then used for all Date parsing and formatting needs. This approach, although very efficient, causes problems when multiple threads access the same instance, because DateFormat implementations such as SimpleDateFormat keep mutable internal state (for example an internal Calendar) and do not synchronize access to it. Typical exceptions thrown when parsing to create a Date object are :
  • java.lang.NumberFormatException
  • java.lang.ArrayIndexOutOfBoundsException
You may also see malformed Date to String representations when formatting is performed.
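The failure mode described above can be reproduced with a small experiment. The following is a minimal sketch, not the test code used for this article; the class name SharedDateFormatDemo and the method countFailures are hypothetical names chosen for illustration. Several threads parse the same String through one unsynchronized SimpleDateFormat, and every exception or wrong result is counted. Because this is a race condition, the count differs from run to run and may occasionally be zero.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedDateFormatDemo {

    /** Parses one String from several threads through ONE shared, unsynchronized instance. */
    static int countFailures(int threads, int iterations) {
        final String input = "2010 07 11";
        final Date expected;
        try {
            // Reference result, computed on a private instance before any thread starts.
            expected = new SimpleDateFormat("yyyy MM dd").parse(input);
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }

        // Unsafe on purpose: every worker thread uses this same instance.
        final DateFormat shared = new SimpleDateFormat("yyyy MM dd");
        final AtomicInteger failures = new AtomicInteger();

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    try {
                        if (!expected.equals(shared.parse(input))) {
                            failures.incrementAndGet(); // silently wrong Date
                        }
                    } catch (Exception e) {
                        failures.incrementAndGet(); // e.g. NumberFormatException
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("1 thread : " + countFailures(1, 1000) + " failures");
        System.out.println("8 threads: " + countFailures(8, 1000) + " failures");
    }
}
```

With a single thread the count is always zero, since nothing else touches the instance. With several threads a non-zero count is the usual outcome, which is the behaviour the Java API documentation warns about.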

To properly handle the aforementioned issues, it is vital to clarify the architecture of your multithreading environment. The Java Virtual Machine allows an application to have multiple threads of execution running concurrently. Typically, in a multithreading environment (either a container inside the JVM or the JVM itself), Thread pooling should be performed: worker threads are constructed and initialized upon startup and then reused to execute your programs. For example, a Web container constructs a pool of worker threads to serve all incoming traffic. Thread pooling is the most efficient way to manage system resources, mainly because Thread creation and initialization is a resource-intensive task for the Java Virtual Machine. Nevertheless, application parallelism can also be achieved by simply creating a new Thread of execution for every piece of code you want to be executed concurrently.
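To make the idea of Thread pooling concrete, here is a minimal sketch using the standard java.util.concurrent executor framework. The class WorkerPoolSketch and its method are hypothetical names introduced for illustration; a real container manages its pool in its own way. The sketch submits many small jobs to a fixed-size pool and reports how many distinct threads actually executed them.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class WorkerPoolSketch {

    /** Runs `tasks` jobs on `poolSize` pooled threads and returns the number of distinct worker threads seen. */
    static int distinctWorkerThreads(int poolSize, int tasks) {
        // Thread-safe set that collects the name of every thread that ran a job.
        Set<String> workerNames = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> workerNames.add(Thread.currentThread().getName()));
        }
        pool.shutdown(); // accept no new jobs, let queued jobs finish
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return workerNames.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct worker threads: " + distinctWorkerThreads(4, 100));
    }
}
```

However many jobs are submitted, the number of distinct threads never exceeds the pool size. That bounded, long-lived set of threads is what makes per-thread resources worthwhile.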

Concerning class scoped DateFormat instances :
  • If you have established that NO Thread pools are used in your environment, then only newly created Thread instances concurrently access your DateFormat instance. In this case it is recommended to synchronize that DateFormat instance externally.
  • If Thread pools are used, there is a limited number of Thread instances that can access your DateFormat instance concurrently. In this case it is recommended to create a separate DateFormat instance for each thread using the ThreadLocal approach.

Below are examples of “getDateInstance(..)”, "synchronization" and ThreadLocal approaches :
package com.javacodegeeks.test;

import java.text.DateFormat;
import java.text.ParseException;
import java.util.Date;

public class ConcurrentDateFormatAccess {

 // "getDateInstance(..)" approach: a new DateFormat instance is created
 // on every call, so no state is shared between threads.
 public Date convertStringToDate(String dateString) throws ParseException {
  return DateFormat.getDateInstance(DateFormat.MEDIUM).parse(dateString);
 }

}

package com.javacodegeeks.test;

import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ConcurrentDateFormatAccess {

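 // "synchronization" approach: one instance shared by all threads.
 // The field is final so that the lock object can never change.
 private final DateFormat df = new SimpleDateFormat("yyyy MM dd");

 public Date convertStringToDate(String dateString) throws ParseException {
  // Only one thread at a time may use the shared instance.
  synchronized (df) {
   return df.parse(dateString);
  }
 }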

}
Things to notice here :
  • Every individual Thread executing the “convertStringToDate” operation tries to acquire the monitor lock on the DateFormat object before using it. If another Thread is holding the lock, the current Thread waits until the lock is released. That way only one Thread accesses the DateFormat instance at a time.

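package com.javacodegeeks.test;

import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ConcurrentDateFormatAccess {

 // ThreadLocal approach: each thread that calls df.get() works with its own DateFormat instance.
 private final ThreadLocal<DateFormat> df = new ThreadLocal<DateFormat> () {

  // Delegates to the default implementation; shown for completeness only.
  @Override
  public DateFormat get() {
   return super.get();
  }

  // Invoked the first time a thread calls get(), to build that thread's instance.
  @Override
  protected DateFormat initialValue() {
   return new SimpleDateFormat("yyyy MM dd");
  }

  // Delegates to the default implementation; shown for completeness only.
  @Override
  public void remove() {
   super.remove();
  }

  // Delegates to the default implementation; shown for completeness only.
  @Override
  public void set(DateFormat value) {
   super.set(value);
  }

 };

 public Date convertStringToDate(String dateString) throws ParseException {
  return df.get().parse(dateString);
 }

}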
    
Things to notice here :
  • Every individual Thread executing the “convertStringToDate” operation invokes “df.get()” in order to initialize, or to retrieve the already initialized, DateFormat instance that is local to that Thread.

Below we present a performance comparison chart between the three aforementioned approaches. Note that we tested the parsing functionality of the DateFormat utility class: we convert a String representation of a date to its Date object equivalent, according to a specific date format.


The horizontal axis represents the number of test runs and the vertical axis the average transactions per second (TPS) for each test run. Thus higher values are better. As you can see, by using Thread pools and the ThreadLocal approach you can achieve superior performance compared to the “synchronization” and the “getDateInstance(..)” approaches.
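For readers who want to take a similar measurement themselves, the following is a rough sketch of how TPS for one test run could be computed. It is not the framework used to produce the chart; the class TpsSketch, the Parser interface and all method names are hypothetical. The measured time also includes starting the pool's threads, so the absolute numbers are only indicative.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class TpsSketch {

    /** One parsing strategy under test (hypothetical helper interface). */
    interface Parser {
        void parse(String dateString) throws ParseException;
    }

    /** Runs threads x repeats parse calls and returns the observed transactions per second. */
    static double measureTps(Parser parser, int threads, int repeats) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < repeats; i++) {
                    try {
                        parser.parse("2010 07 11");
                    } catch (ParseException e) {
                        throw new IllegalStateException(e);
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return (double) threads * repeats * 1_000_000_000L / elapsedNanos;
    }

    /** "synchronization" approach: one shared instance guarded by its monitor lock. */
    static double synchronizedTps(int threads, int repeats) {
        final DateFormat shared = new SimpleDateFormat("yyyy MM dd");
        return measureTps(s -> {
            synchronized (shared) {
                shared.parse(s);
            }
        }, threads, repeats);
    }

    /** ThreadLocal approach: one instance per pooled worker thread. */
    static double threadLocalTps(int threads, int repeats) {
        final ThreadLocal<DateFormat> local = new ThreadLocal<DateFormat>() {
            @Override
            protected DateFormat initialValue() {
                return new SimpleDateFormat("yyyy MM dd");
            }
        };
        return measureTps(s -> local.get().parse(s), threads, repeats);
    }

    public static void main(String[] args) {
        System.out.println("synchronized TPS: " + synchronizedTps(8, 1000));
        System.out.println("ThreadLocal TPS : " + threadLocalTps(8, 1000));
    }
}
```

A single run like this is noisy. Repeating it many times and averaging, as the test configuration at the top of the article does, gives more trustworthy figures.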

Lastly, let me point out that using the ThreadLocal approach without Thread pools is equivalent to using the “getDateInstance(..)” approach, because every new Thread has to initialize its local DateFormat instance prior to using it; thus a new DateFormat instance is created with every single execution.
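This last point can be checked by counting how often initialValue() runs. The sketch below is an illustration under assumptions: the class ThreadLocalCreationCount and its methods are hypothetical names, and the counter exists only for the experiment. It performs the same number of parse operations once on a fixed Thread pool and once with a brand new Thread per operation.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ThreadLocalCreationCount {

    // Counts how many SimpleDateFormat instances initialValue() has built.
    private static final AtomicInteger CREATED = new AtomicInteger();

    private static final ThreadLocal<DateFormat> DF = new ThreadLocal<DateFormat>() {
        @Override
        protected DateFormat initialValue() {
            CREATED.incrementAndGet();
            return new SimpleDateFormat("yyyy MM dd");
        }
    };

    private static void parseOnce() {
        try {
            DF.get().parse("2010 07 11");
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs `tasks` parse operations on a fixed pool; returns the number of instances created. */
    static int createdWithPool(int poolSize, int tasks) {
        CREATED.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            pool.execute(ThreadLocalCreationCount::parseOnce);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return CREATED.get();
    }

    /** Runs each parse operation on a brand new Thread; returns the number of instances created. */
    static int createdWithNewThreads(int tasks) {
        CREATED.set(0);
        Thread[] workers = new Thread[tasks];
        for (int i = 0; i < tasks; i++) {
            workers[i] = new Thread(ThreadLocalCreationCount::parseOnce);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println("pool of 4, 100 tasks: " + createdWithPool(4, 100) + " instances");
        System.out.println("100 new threads     : " + createdWithNewThreads(100) + " instances");
    }
}
```

With the pool, the number of instances is bounded by the pool size. With a new Thread per operation, one instance is created per operation, so the ThreadLocal brings no saving in that setup.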


    Happy Coding!


    Justin





    13 comments:

    1. Hi Just!
      Thanks for a clear explanation and performance results.
      One thing to mention: your solution holds all the created DateFormat instances in memory, never freeing them. I can suggest a solution for this problem:
      http://asolntsev.blogspot.com/2009/05/threadsafedateformat.html
    2. You could also use FastDateFormat in org.apache.commons.lang.time

      "FastDateFormat is a fast and thread-safe version of SimpleDateFormat"
    3. Cheers! You just gave me a good hint on a task I'm working on.
    4. You don't need to override all methods on ThreadLocal - just initialValue:

      private ThreadLocal<DateFormat> df = new ThreadLocal<DateFormat>() {
      @Override
      protected DateFormat initialValue() {
      return new SimpleDateFormat("yyyy MM dd");
      }
      };

      On my project we wrapped this inside a ThreadsafeSimpleDateFormat which took a
      DateFormat as constructor argument, cloned it (for safety) and stored as "seed". The initialValue() of our innerThreadLocal then returned seed.clone().

      ThreadsafeSimpleDateFormat had parse and format methods which delegated to innerThreadLocal.get().

      If you're REALLY paranoid, you'll synchronize the call to seed.clone() on seed, since there is no guarantee that the clone() method is thread safe.


      Apache commons FastDateFormat is great - except that it doesn't do parsing.

      JodaTime is great, but requires you to use JodaTime instead of java.util.Date/java.util.Calendar. Great choice, but more intrusive.

      Andrei's solution also looks good, but could be implemented slightly more elegant in Java 5 with generics.

      One thing you must beware of is Locale.setDefault(). If you don't provide a Locale when constructing your SimpleDateFormat, Locale.getDefault() will be used - whatever that is, at the time initialValue() is called. So if you change this later, it will NOT affect those ThreadLocal's which have already been created, but it will affect those that are created later. Here be dragons.
    5. Hi! :)

      I was just testing this stuff the other week, and I really think this is a dangerous subject many developers do not pay attention to.

      But I just found out a really nice way of doing it. Instead of synchronizing one instance, there are two better solutions, in my opinion:

      If you have lots of formatting, you should use a Flyweight pattern solution. In short, use a cache of instances.

      But that is not so simple and in most cases is like too much trouble. So comes the trick. =)
      Just create a static constant and use it as template by CLONING IT. ;)

      My tests showed that cloning a DateFormat, TextFormat or any other similar object is 20 times faster than creating a new one, if my memory does not fail me. =)

      Yes, just do
      ((DateFormat) DATE_FORMAT.clone()).format(new Date());

      Maybe someday I post this test on my blog... :P
    6. I wrote a ConcurrentDateFormat class which implements DateFormat, it stores a cloned instance of a passed in DateFormat as a template, so its initialValue() just returns a clone of the stored template. This approach combines the best of ThreadLocal and cloning, and is completely transparent.
      I had a look at org.apache.commons.lang.time.FastDateFormat's source code and it seems that all methods returning a DateFormat instance are synchronized. Although it uses some kind of internal caching, it clearly can't scale as well as the ThreadLocal approach, since all threads need to wait until they acquire a monitor lock.

      Does anyone know if org.joda.time.format.DateTimeFormatter in Joda Time is thread safe? Nothing relevant seems to be mentioned in the class's documentation.
    8. I really have to ask why. In the examples, the only method being called (ultimately) on the date format object is parse. So, what class state on the date format object does parse change? I would be surprised if it changed any. The reason for the comment in the API for SimpleDateFormat not being synchronized is for the methods that DO change the state of the SimpleDateFormat object. But parse is not one of them.
      So, if your example was to show your prowess in using and explaining ThreadLocal, good job! You did that. But, if your point was to show that we NEEDED to use ThreadLocal when we are going to be parsing dates with a DateFormat, then I think you may have missed the mark because I am not convinced that parse needs to be synchronized.
    9. Hello developmentech,

      My point was to show an efficient way to properly utilize DateFormat in a multithreading environment. You definitely NEED to access DateFormat instances in a synchronized manner if you intend to use them from multiple threads (when parsing also). We list typical exceptions thrown when parsing to create a date and vice versa.

      Lastly, if you are not convinced that parse needs to be synchronized, you just have to implement a simple scenario and test it for yourself. You will be amazed at how easy it is to experience erroneous results.

      BRs
    10. Thanks for this nice explanation and the advantage of ThreadLocal!
    11. Problems go deeper than Formatters (Date, Number, etc) in Java. Actually, Date objects can mutate when read (!) as demonstrated here: http://hype-free.blogspot.com/2011/01/java-date-objects-can-mutate-even-when.html, which is completely unexpected and undocumented, so my recommendation is to use something like JodaTime.
    12. Hi, which tool did you use to generate the performance report? Thanks a lot~~
    13. We have implemented a simple multithreaded testing framework of our own.
