About Piyas De

Piyas is a Sun Microsystems certified Enterprise Architect with 10+ years of professional IT experience in areas such as architecture definition, enterprise application design, and client-server/e-business solutions.

Setting up Apache Hadoop Multi-Node Cluster

We are sharing our experience of installing Apache Hadoop on Linux-based machines in a multi-node setup. We will also share the troubleshooting we went through, and will update this post in the future.

User creation and other configuration steps -

  • We start by adding a dedicated Hadoop system user on each node of the cluster.

 
 
 

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
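To verify that the group and user were created as expected (an optional check we are adding here, not part of the original steps), the standard id command can be used:

$ id hduser   # should report hadoop as hduser's primary group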
  • Next we configure SSH (Secure Shell) on all the cluster nodes to enable secure data communication.
user@node1:~$ su - hduser
hduser@node1:~$ ssh-keygen -t rsa -P ""

The output will be something like the following:

Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
.....
  • Next we need to enable SSH access to the local machine with this newly created key:
hduser@node1:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Repeat the above steps on all the cluster nodes and test by executing the following command:

hduser@node1:~$ ssh localhost

This step is also needed to save the local machine’s host key fingerprint to the hduser user’s known_hosts file.

Next we need to edit the /etc/hosts file, in which we put the IP address and hostname of each system in the cluster.

In our scenario we have one master (with IP 192.168.0.100) and one slave (with IP 192.168.0.101)

$ sudo vi /etc/hosts

and add the entries to the hosts file as IP/hostname pairs:

 
192.168.0.100 master
192.168.0.101 slave
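As an optional sanity check (our addition, not strictly required), hostname resolution can be verified from each node with a simple ping:

$ ping -c 1 master
$ ping -c 1 slave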
  • Providing SSH access

The hduser user on the master node must be able to connect

    1. to its own user account on the master via ssh master (in this context, not just ssh localhost).
    2. to the hduser account of the slave(s) via a password-less SSH login.

So we distribute the SSH public key of hduser@master to all of its slaves (in our case we have only one slave; if you have more, run the following command for each of them, changing the machine name, i.e. slave, slave1, slave2).

hduser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave

Test by connecting from master to master and from master to the slave(s), and check that everything works.
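A minimal check, assuming the host names configured above; both logins should succeed without asking for a password (the very first connection will only ask to confirm the host key):

hduser@master:~$ ssh master
hduser@master:~$ ssh slave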

Configuring Hadoop

  • Let us edit the conf/masters file (only on the master node)

and we enter master into the file.

By doing this we tell Hadoop to start the SecondaryNameNode daemon of our multi-node cluster on this machine.

The primary NameNode and the JobTracker will always run on the machine on which we execute bin/start-dfs.sh and bin/start-mapred.sh, respectively.
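A one-line sketch of the conf/masters edit, assuming we are inside the Hadoop installation directory (the same directory from which bin/start-dfs.sh is run):

$ echo "master" > conf/masters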

  • Let us now edit conf/slaves (only on the master node) with:
master
slave

This means that we also run a DataNode process on the master machine, where the NameNode is running as well. We can leave the master out of the slave role if we have more machines at our disposal to act as DataNodes.

If we have more slaves, we add one host per line, like the following (a command sketch is given after the list):

master
slave
slave2
slave3

and so on.
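A minimal sketch of creating conf/slaves, again assuming we are inside the Hadoop installation directory and that slave2/slave3 are only illustrative host names:

$ cat > conf/slaves <<EOF
master
slave
slave2
slave3
EOF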

Let us now edit two important files (on all the nodes in our cluster):

  1. conf/core-site.xml
  2. conf/hdfs-site.xml

1) conf/core-site.xml

We have to change the fs.default.name parameter, which specifies the NameNode host and port. (In our case this is the master machine.)

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  …..[Other XML Values]
</property>

Create a directory in which Hadoop will store its data -

$ sudo mkdir -p /app/hadoop

We have to ensure the directory is writable by any user:

$ sudo chmod 777 /app/hadoop

Modify core-site.xml once again to add the following property:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop</value>
</property>
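Putting the two snippets together, a minimal complete conf/core-site.xml would look roughly like this (only the properties discussed here are shown; any other properties belong inside the same <configuration> element):

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop</value>
  </property>
</configuration>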

2) conf/hdfs-site.xml

We have to change the dfs.replication parameter which specifies default block replication. It defines how many machines a single file should be replicated to before it becomes available. If we set this to a value higher than the number of available slave nodes (more precisely, the number of DataNodes), we will start seeing a lot of “(Zero targets found, forbidden1.size=1)” type errors in the log files.

The default value of dfs.replication is 3. However, as we have only two nodes available in our scenario, we set dfs.replication to 2.

<property>
  <name>dfs.replication</name>
  <value>2</value>
  …..[Other XML Values]
</property>
  • Let us format the HDFS file system via the NameNode.

Run the following command on the master:

bin/hadoop namenode -format
  • Let us start the multi-node cluster:

Run the command (in our case we will run it on the machine named master):

bin/start-dfs.sh
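If the MapReduce layer is also needed (it is referred to earlier and in the troubleshooting section below), it is started from the same machine with the second start script; note that the jps output shown next reflects only the HDFS daemons:

bin/start-mapred.sh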

Checking Hadoop Status -

After everything has started, run the jps command on all the nodes to check whether everything is running properly.

On the master node the expected output will be -

$ jps

14799 NameNode
15314 Jps
14880 DataNode
14977 SecondaryNameNode

On the slave(s):

$ jps
15314 Jps
14880 DataNode

Of course, the process IDs will vary from machine to machine.

Troubleshooting

The DataNode might not start on all of our nodes. In that case, check the log file

logs/hadoop-hduser-datanode-.log

on the affected nodes for the exception -

java.io.IOException: Incompatible namespaceIDs

In this case we need to do the following (a command sketch follows the list) -

  1. Stop the full cluster, i.e. both MapReduce and HDFS layers.
  2. Delete the data directory on the problematic DataNode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml. In our case (with hadoop.tmp.dir set to /app/hadoop as above), the relevant directory is /app/hadoop/dfs/data.
  3. Reformat the NameNode. All HDFS data will be lost during the format process.
  4. Restart the cluster.
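A rough command sketch of these four steps, assuming the scripts shown earlier and the data directory noted in step 2 (the prompt paths are illustrative; adjust the path to your actual dfs.data.dir):

hduser@master:~/hadoop$ bin/stop-mapred.sh && bin/stop-dfs.sh    # step 1: stop MapReduce, then HDFS
hduser@slave:~$ rm -rf /app/hadoop/dfs/data                      # step 2: on the problematic DataNode only
hduser@master:~/hadoop$ bin/hadoop namenode -format              # step 3: all HDFS data is lost
hduser@master:~/hadoop$ bin/start-dfs.sh && bin/start-mapred.sh  # step 4: restart the cluster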

Or

We can manually update the namespaceID of the problematic DataNodes (a sketch of the files involved follows these steps):

  1. Stop the problematic DataNode(s).
  2. Edit the value of namespaceID in ${dfs.data.dir}/current/VERSION to match the corresponding value of the current NameNode in ${dfs.name.dir}/current/VERSION.
  3. Restart the fixed DataNode(s).
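A sketch of what this looks like on disk, using the directory layout assumed above (adjust the paths to your dfs.name.dir and dfs.data.dir):

hduser@master:~$ cat /app/hadoop/dfs/name/current/VERSION   # note the namespaceID of the NameNode
hduser@slave:~$ vi /app/hadoop/dfs/data/current/VERSION     # set namespaceID to the NameNode's value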

In Running Map-Reduce Job in Apache Hadoop (Multinode Cluster), we will share our experience of running a MapReduce job based on the Apache Hadoop examples.


3 Responses to "Setting up Apache Hadoop Multi-Node Cluster"

  1. mohammad says:

    hi & thanks for this post.
    Is it possible for you to do these steps on a Windows-based system?
    (I have this error: Cannot connect to the Map/Reduce location java.net.ConnectException … in Eclipse - based on http://ebiquity.umbc.edu/Tutorials/Hadoop/00%20-%20Intro.html )

    thanks.

  2. Shiv says:

    Sir at step
    Configuring Hadoop

    Let us edit the conf/masters (only in the masters node)

    and we enter master into the file.

    where is the conf/masters file located in the package, or do we have to create the file ourselves, and with what extension, and how do we write in it…
    please clarify sir

  3. Hafiz Muhammad Shafiq says:

    Good tutorial, but without an example.
    Also I had to face the issue discussed in the Troubleshooting part above. I solved it by changing the port numbers on the slaves and master (it was not mentioned above).
