The Apache community has voted to release Apache Hadoop 2.4.0. The new release is now available and brings important improvements to both HDFS and MapReduce.
The most important HDFS improvement concerns NameNodes. A cluster can now use multiple independent NameNodes and namespaces that do not require coordination with each other. All NameNodes use the DataNodes as common storage for blocks, and each DataNode registers with every NameNode in the cluster. DataNodes send heartbeats and block reports to the NameNodes, which reply with commands for the DataNodes to carry out.
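To make the registration and reporting flow concrete, here is a minimal toy model in Python. It is only a sketch of the idea described above: the class names, method names, and the string command "REGISTER" are made up for illustration and are not Hadoop APIs, and real DataNodes talk to NameNodes over RPC rather than through direct method calls.

```python
# Toy model of federated NameNodes sharing DataNodes.
# All names here are illustrative assumptions, not Hadoop classes.

class NameNode:
    def __init__(self, namespace):
        self.namespace = namespace
        self.live_datanodes = set()  # DataNodes that have registered
        self.block_map = {}          # block id -> set of DataNode ids

    def register(self, datanode_id):
        self.live_datanodes.add(datanode_id)

    def heartbeat(self, datanode_id):
        # Reply with commands for the DataNode to handle.
        if datanode_id not in self.live_datanodes:
            return ["REGISTER"]
        return []

    def block_report(self, datanode_id, block_ids):
        for block_id in block_ids:
            self.block_map.setdefault(block_id, set()).add(datanode_id)


class DataNode:
    def __init__(self, datanode_id, namenodes):
        self.datanode_id = datanode_id
        self.namenodes = namenodes
        # Blocks are kept separately for each namespace.
        self.blocks = {nn.namespace: [] for nn in namenodes}
        # Each DataNode registers with every NameNode in the cluster.
        for nn in namenodes:
            nn.register(datanode_id)

    def store(self, namespace, block_id):
        self.blocks[namespace].append(block_id)

    def report(self):
        # Send a heartbeat and a block report to every NameNode and
        # collect the commands each one sends back.
        commands = {}
        for nn in self.namenodes:
            commands[nn.namespace] = nn.heartbeat(self.datanode_id)
            nn.block_report(self.datanode_id, self.blocks[nn.namespace])
        return commands


# Two independent NameNodes share one DataNode.
nn_user = NameNode("/user")
nn_logs = NameNode("/logs")
dn = DataNode("dn1", [nn_user, nn_logs])
dn.store("/user", "blk_1")
commands = dn.report()
```

Note that the two NameNode objects never call each other: each one learns about its blocks only from the DataNode reports, which mirrors the "no coordination" property.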
The MapReduce changes mostly concern the ResourceManager. Since Hadoop 0.23, the two major functions of the JobTracker, resource management and job life-cycle management, have been separated into distinct components. In this release, the new ResourceManager manages the global assignment of resources to applications, and the per-application ApplicationMaster manages the application's scheduling and coordination. The ApplicationMaster is a framework-specific library tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
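The division of labor between the two components can be sketched with a small Python toy. This is an illustration under simplifying assumptions, not the YARN API: the class names only echo the component names, "containers" are reduced to free slots per node, and running a task on a NodeManager is stood in for by recording a (task, node) pair.

```python
# Toy model of the ResourceManager / ApplicationMaster split.
# Names and behavior are illustrative assumptions, not YARN classes.

class ResourceManager:
    """Tracks cluster-wide free capacity and grants it to applications."""

    def __init__(self, node_capacity):
        # node_capacity: node name -> number of free container slots
        self.free = dict(node_capacity)

    def allocate(self, app_id, count):
        # Grant up to `count` containers, returned as a list of node names.
        granted = []
        for node in sorted(self.free):
            while self.free[node] > 0 and len(granted) < count:
                self.free[node] -= 1
                granted.append(node)
        return granted

    def release(self, node):
        self.free[node] += 1


class ApplicationMaster:
    """Owns one application's task scheduling and coordination."""

    def __init__(self, app_id, tasks, rm):
        self.app_id = app_id
        self.pending = list(tasks)
        self.rm = rm
        self.completed = []

    def run(self):
        while self.pending:
            # Negotiate resources from the ResourceManager.
            containers = self.rm.allocate(self.app_id, len(self.pending))
            if not containers:
                raise RuntimeError("no capacity available")
            for node in containers:
                task = self.pending.pop(0)
                # Stand-in for executing the task on that node.
                self.completed.append((task, node))
                self.rm.release(node)
        return self.completed


rm = ResourceManager({"node1": 1, "node2": 1})
am = ApplicationMaster("app_1", ["map_0", "map_1", "reduce_0"], rm)
result = am.run()
```

The ResourceManager here knows nothing about tasks, and the ApplicationMaster knows nothing about other applications. That separation is the point of the design, even though the real scheduler is far richer than this loop.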
Interested in getting started with Hadoop? Check out our complimentary whitepaper “Hadoop Illuminated”!