Set Up a Multi-Node Hadoop 2.6.0 Cluster with YARN

Today is the era of parallel computation, and whenever we talk about processing very large datasets, the first word that comes to everyone’s mind is HADOOP. Apache Hadoop is one of the most prominent Apache projects. In this post I’ll explain all the steps of setting up a basic multi-node Hadoop cluster (we’ll set up a two-node cluster).

Here I have used two machines for the cluster setup; you can repeat the slave node setup steps on more machines to build a bigger Hadoop cluster.

Before starting, I assume you have gone through the checklist below, and I recommend reading up on any of these points that are new to you.

  • Prepare new machines or VMs with CentOS installed (I have used CentOS 6.4)
  • Set up each machine with a static IP and a proper FQDN
  • Make sure all machines have proper IP and HOSTNAME entries in /etc/hosts
  • Set up passwordless SSH from the master node to the slave nodes
  • Make sure that IPv6 is disabled on all nodes

Step 1 : Disable IPv6 on each CentOS node (if your network supports IPv6)

If your nodes support IPv6, I recommend disabling it by editing the /etc/sysctl.conf file, as Hadoop is not supported on IPv6 networks. Append the following to the end of the file.
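These are the standard sysctl entries for disabling IPv6 on CentOS 6:

    # Disable IPv6 on all interfaces
    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1

Then apply the change with sysctl -p (or reboot the node).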

Read more about this on the Hadoop IPv6 wiki.

So let’s get started…..

Step 2 : Download Hadoop 2.6.0 and extract it to /opt/ directory on Master Node

I have used the following machines.

Master: 192.168.1.10 – master.backtobazics.com
Slave 1: 192.168.1.11 – slave1.backtobazics.com

Below are the commands:
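A sketch of one way to do it (the download URL below points at the Apache archive; substitute a mirror if you prefer):

    # Download Hadoop 2.6.0 and extract it under /opt/ on the master node
    cd /opt
    wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
    tar -xzf hadoop-2.6.0.tar.gz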

Step 3 : Configure Variables and Reload the Configuration

Set the environment variables used by Hadoop by appending the following lines to the end of the /etc/profile file.
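For example (assuming Hadoop was extracted to /opt/hadoop-2.6.0 as in step 2):

    # Hadoop environment variables
    export HADOOP_HOME=/opt/hadoop-2.6.0
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin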

Reload the configuration using the command below.
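    source /etc/profile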

Step 4 : Setting up Hadoop Environment

Create Hadoop data directories
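The paths below are just an example; whatever you choose, keep them consistent with what you put in hdfs-site.xml in step 5:

    # NameNode metadata and DataNode block storage directories
    mkdir -p /opt/hadoop-2.6.0/data/namenode
    mkdir -p /opt/hadoop-2.6.0/data/datanode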

Now edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME environment variable to your JDK base directory path.
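For example (the JDK path below is an assumption; point it at wherever your JDK actually lives):

    # In $HADOOP_HOME/etc/hadoop/hadoop-env.sh
    export JAVA_HOME=/usr/java/jdk1.7.0_67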

Step 5 : Edit Hadoop XML Configuration files

Edit the configuration files located in the $HADOOP_HOME/etc/hadoop/ directory with the very basic configurations shown below.

hdfs-site.xml
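A minimal sketch: replication is set to 2 because we have two DataNodes, and the directory paths assume the layout created in step 4.

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop-2.6.0/data/namenode</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop-2.6.0/data/datanode</value>
      </property>
    </configuration>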


core-site.xml
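A minimal sketch pointing the default filesystem at the master node (port 9000 is a common choice, not mandatory):

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master.backtobazics.com:9000</value>
      </property>
    </configuration>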


mapred-site.xml
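Hadoop 2.6.0 ships this file as mapred-site.xml.template, so copy it first (cp mapred-site.xml.template mapred-site.xml). The essential property tells MapReduce to run on YARN:

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>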

yarn-site.xml
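A minimal sketch: the ResourceManager runs on the master, and the shuffle auxiliary service is required for MapReduce jobs.

    <configuration>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master.backtobazics.com</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>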

Append the host names of all the nodes that should run DataNode/NodeManager daemons to the $HADOOP_HOME/etc/hadoop/slaves file (in this setup the master also runs a DataNode, as you can see in the Web UI list at the end of this post). In my case it would be:
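    master.backtobazics.com
    slave1.backtobazics.com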

Step 6 : Setting up Slave nodes

To set up a slave node, repeat steps 2 to 5, or copy the /opt/hadoop-2.6.0 directory to the slave node (keeping the directory structure the same) and repeat steps 3 and 4 there.
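For example, to copy the installation to slave1 (assuming the passwordless root SSH access set up in the prerequisites):

    scp -r /opt/hadoop-2.6.0 root@slave1.backtobazics.com:/opt/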

That’s it…..!!!!! We are done with the installation of Hadoop (distributed mode) with YARN on multiple nodes. 🙂

We need to format the Hadoop NameNode using the command below before starting the Hadoop cluster.
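    # Run once on the master node
    $HADOOP_HOME/bin/hdfs namenode -format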

Step 7 : Commands for starting and stopping Hadoop Cluster

Start/stop HDFS using the commands below.
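    # Run on the master node
    $HADOOP_HOME/sbin/start-dfs.sh
    $HADOOP_HOME/sbin/stop-dfs.sh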

Start/stop the YARN services using the commands below.
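    # Run on the master node
    $HADOOP_HOME/sbin/start-yarn.sh
    $HADOOP_HOME/sbin/stop-yarn.sh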

Step 8 : Open the Hadoop Web UI Ports in the Firewall by updating the /etc/sysconfig/iptables file

Add the lines below to the /etc/sysconfig/iptables file.
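These rules cover only the Web UI ports listed at the end of this post; place them above the final REJECT rule so they take effect:

    -A INPUT -m state --state NEW -m tcp -p tcp --dport 50070 -j ACCEPT
    -A INPUT -m state --state NEW -m tcp -p tcp --dport 50075 -j ACCEPT
    -A INPUT -m state --state NEW -m tcp -p tcp --dport 50090 -j ACCEPT
    -A INPUT -m state --state NEW -m tcp -p tcp --dport 8088 -j ACCEPT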

And restart iptables
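    service iptables restart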

Performing the above step on both servers will open up HTTP access to the Web UIs of the Hadoop processes.

Note : Instead of performing step 8, you can also disable the iptables service using the following commands on all machines.
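    # Stop iptables now and keep it off across reboots
    service iptables stop
    chkconfig iptables off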

Now you can access the Hadoop services in your browser:

Name Node: http://master.backtobazics.com:50070/
YARN Services: http://master.backtobazics.com:8088/
Secondary Name Node: http://master.backtobazics.com:50090/
Data Node 1: http://master.backtobazics.com:50075/
Data Node 2: http://slave1.backtobazics.com:50075
