The Apache community has voted to release Apache Hadoop 2.4.0. The new release is now available and brings important improvements to both HDFS and MapReduce.
The main HDFS improvement concerns NameNodes. A cluster can now use multiple independent NameNodes and namespaces that require no coordination with each other. All NameNodes use the DataNodes as common storage for blocks, and each DataNode registers with every NameNode in the cluster. DataNodes send heartbeats and block reports to the NameNodes, which reply with commands for the DataNodes to carry out.
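The registration and heartbeat flow described above can be illustrated with a small toy model. This is only a sketch in plain Python, not Hadoop code: the class names `NameNode` and `DataNode`, the `pending_commands` queue, and the command strings are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Toy NameNode: owns one namespace and never talks to other NameNodes."""
    namespace: str
    registered: set = field(default_factory=set)
    # Hypothetical queue of commands waiting for each DataNode.
    pending_commands: dict = field(default_factory=dict)

    def register(self, datanode_id):
        self.registered.add(datanode_id)

    def heartbeat(self, datanode_id):
        # The reply to a heartbeat carries any commands queued for the sender.
        return self.pending_commands.pop(datanode_id, [])


@dataclass
class DataNode:
    """Toy DataNode: shared block storage that reports to every NameNode."""
    node_id: str
    handled: list = field(default_factory=list)

    def join(self, namenodes):
        # Each DataNode registers with all NameNodes in the cluster.
        for nn in namenodes:
            nn.register(self.node_id)

    def send_heartbeats(self, namenodes):
        # Heartbeats go to every NameNode; returned commands are handled locally.
        for nn in namenodes:
            for command in nn.heartbeat(self.node_id):
                self.handled.append((nn.namespace, command))


nn_users = NameNode("/users")
nn_logs = NameNode("/logs")
dn = DataNode("dn1")

dn.join([nn_users, nn_logs])
nn_logs.pending_commands["dn1"] = ["replicate-block"]
dn.send_heartbeats([nn_users, nn_logs])
print(dn.handled)
```

Note that the two `NameNode` objects never reference each other: the only shared element is the `DataNode`, which mirrors the "no coordination" property described in the text.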
The MapReduce changes mostly concern the ResourceManager. Since version 0.23 of Hadoop, the two major functions of the JobTracker, resource management and job life-cycle management, have been split into separate components. In this release, the new ResourceManager manages the global assignment of resources to applications, and the per-application ApplicationMaster manages the application's scheduling and coordination. The per-application ApplicationMaster is a framework-specific library tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
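The division of labour between the two components can be sketched with a toy model. This is plain Python rather than the YARN API: the classes, the `allocate`/`release` methods, and the notion of a fixed pool of identical containers are simplifying assumptions made only for illustration, and NodeManagers are left out entirely.

```python
class ResourceManager:
    """Toy global arbiter: hands out containers from one cluster-wide pool."""

    def __init__(self, total_containers):
        self.free = total_containers

    def allocate(self, app_id, requested):
        # Grant as many containers as are available, up to the request.
        granted = min(requested, self.free)
        self.free -= granted
        return granted

    def release(self, app_id, count):
        self.free += count


class ApplicationMaster:
    """Toy per-application master: negotiates containers and tracks its own tasks."""

    def __init__(self, app_id, tasks):
        self.app_id = app_id
        self.pending = list(tasks)
        self.completed = []
        self.held = 0

    def request(self, rm):
        # Ask for one container per pending task; the RM may grant fewer.
        self.held = rm.allocate(self.app_id, len(self.pending))
        return self.held

    def run_and_release(self, rm):
        # "Run" one task per held container, then give the containers back.
        for _ in range(self.held):
            self.completed.append(self.pending.pop(0))
        rm.release(self.app_id, self.held)
        self.held = 0


rm = ResourceManager(total_containers=3)
app_a = ApplicationMaster("app-a", ["map-1", "map-2"])
app_b = ApplicationMaster("app-b", ["map-1", "map-2", "reduce-1"])

print(app_a.request(rm), app_b.request(rm))  # app-b gets only what is left
app_a.run_and_release(rm)
app_b.run_and_release(rm)
```

In this sketch the `ResourceManager` knows nothing about tasks, and each `ApplicationMaster` knows nothing about other applications, which is the separation the paragraph above describes.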