Spring for Apache Hadoop was born to resolve the issue of having poorly constructed Hadoop applications, which usually consist of command line utilities, scripts and pieces of code stitched together. It provides a consistent programming and configuration model across a wide range of Hadoop ecosystem projects, as expected from a Spring project.
The well known Template API design pattern is also embraced here, so the framework includes classes like:
Another embraced aspect is the approach of starting small and growing into complex solutions. So, Spring for Hadoop introduces various Runner classes which allow the execution of Hive, Pig scripts, vanilla Map/Reduce or Streaming jobs, Cascading flows but also invocation of pre and post generic JVM-based scripting all through the familiar JDK Callable contract.
When things start to get more complex, upgrading to Spring Batch is straightforward and easy. Spring Batch’s rich functionality for handling the ETL processing of large file translates directly into Hadoop use cases for the ingestion and export of files form HDFS.
Also, the use of Spring Hadoop in combination with Spring Integration allows for rich processing of event streams that can be transformed, enriched, filtered, before being read and written from HDFS or other storages such as NoSQL stores, for which Spring Data provides plenty of support.