6 Steps to Set Up Apache Spark 1.0.1 (Multi Node Cluster) on CentOS

Before we move ahead, let's learn a bit about Apache Spark.

So, what is Apache Spark?

Apache Spark is a fast, real-time, and extremely expressive computing system that executes jobs in a distributed (clustered) environment.

It is compatible with Apache Hadoop and is claimed to run up to 10x faster than Hadoop MapReduce for on-disk computation and up to 100x faster for in-memory computation. It provides rich APIs in Java, Scala, and Python, along with functional programming capabilities.

This post will give you a clear idea of how to set up a Spark multi-node cluster on CentOS with Hadoop and YARN.

Before moving forward, I assume that you know how to install Java 7 and Apache Hadoop with YARN on a CentOS cluster.

Step 1. Download and extract Apache Spark
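As a rough sketch of this step, the snippet below fetches and unpacks the Hadoop 2 build of Spark 1.0.1. The Apache archive URL and the install_spark helper are assumptions for illustration; the mirror you use may differ.

```shell
# Assumption: the release tarball is taken from the Apache archive.
SPARK_VERSION="1.0.1"
SPARK_PACKAGE="spark-${SPARK_VERSION}-bin-hadoop2"
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_PACKAGE}.tgz"

# Hypothetical helper: download the tarball into the given directory
# and unpack it there.
install_spark() {
    target_dir="$1"
    cd "$target_dir" || return 1
    wget "$SPARK_URL" || return 1
    tar -xzf "${SPARK_PACKAGE}.tgz"
}

# Run as a user who can write to /home, for example:
#   install_spark /home
```

Calling install_spark /home should leave the distribution in /home/spark-1.0.1-bin-hadoop2, the path used in the rest of this post.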

Step 2. Configuration in spark-env.sh

Create /home/spark-1.0.1-bin-hadoop2/conf/spark-env.sh and add your cluster's environment settings to it.
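For illustration, a spark-env.sh could look roughly like the sketch below. Only the master hostname comes from this post; the Java path, the Hadoop configuration directory, and the worker sizes are assumptions that you should replace with values from your own cluster.

```shell
# conf/spark-env.sh (illustrative values, adjust to your cluster)
export JAVA_HOME=/usr/java/default            # assumption: Java 7 install path
export HADOOP_CONF_DIR=/etc/hadoop/conf       # assumption: Hadoop/YARN client configs
export SPARK_MASTER_IP=master.backtobazics.com
export SPARK_MASTER_PORT=7077                 # Spark's default master port
export SPARK_WORKER_CORES=1                   # assumption: cores per worker
export SPARK_WORKER_MEMORY=1g                 # assumption: memory per worker
```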

Create /home/spark-1.0.1-bin-hadoop2/conf/spark-defaults.conf and add your default Spark properties to it.

Append the hostnames of all the slave nodes to the /home/spark-1.0.1-bin-hadoop2/conf/slaves file, one per line.

[Repeat steps 1 and 2 above on every other slave node (slave1.backtobazics.com in our case).]
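The two files could be written along these lines. This is only a sketch: write_spark_conf is a hypothetical helper, and the property values (including the executor memory) are assumptions, not the settings from the original setup.

```shell
# Hypothetical helper: write illustrative spark-defaults.conf and slaves
# entries into the given conf directory.
write_spark_conf() {
    conf_dir="$1"

    # Default properties read by spark-submit and spark-shell.
    cat > "$conf_dir/spark-defaults.conf" <<'EOF'
spark.master            spark://master.backtobazics.com:7077
spark.executor.memory   512m
EOF

    # Append one worker hostname per line.
    cat >> "$conf_dir/slaves" <<'EOF'
slave1.backtobazics.com
EOF
}

# For example:
#   write_spark_conf /home/spark-1.0.1-bin-hadoop2/conf
```

A --master option given on the command line takes precedence over spark.master in this file, so the YARN shell in Step 4 is unaffected by it.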

Step 3. Start and stop Spark
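Spark ships start and stop scripts in its sbin directory. A minimal sketch, assuming the install path from Step 1 and passwordless SSH from the master to every slave (the function names are just for illustration):

```shell
SPARK_HOME="/home/spark-1.0.1-bin-hadoop2"

# Start the master and all workers listed in conf/slaves (run on the master).
start_spark() {
    "$SPARK_HOME/sbin/start-all.sh"
}

# Stop the master and all workers.
stop_spark() {
    "$SPARK_HOME/sbin/stop-all.sh"
}

# Usage:
#   start_spark
#   stop_spark
```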

Step 4. Start Spark shell using YARN
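A sketch of the launch command in yarn-client mode follows. It assumes the install path from Step 1 and that your Hadoop client configuration lives in /etc/hadoop/conf; Spark needs HADOOP_CONF_DIR (or YARN_CONF_DIR) to locate the ResourceManager.

```shell
SPARK_HOME="/home/spark-1.0.1-bin-hadoop2"

# Assumption: Hadoop/YARN client configs are in /etc/hadoop/conf
# unless HADOOP_CONF_DIR is already set.
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"

if [ -x "$SPARK_HOME/bin/spark-shell" ]; then
    # The driver runs locally; executors are requested from YARN.
    "$SPARK_HOME/bin/spark-shell" --master yarn-client
else
    echo "spark-shell not found under $SPARK_HOME" >&2
fi
```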

This launches the Spark shell and gives you a Scala prompt. You can now write your Spark code one command at a time, and each command is executed as soon as you enter it.

Step 5. Create a sample text file on HDFS for the WordCount example

Create a simple text file named sample.txt with a few lines of text.

Then put the file on HDFS.
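Something like the following would do. The file content and the /user/root HDFS directory are assumptions for illustration; any short text and any HDFS path you can write to will work.

```shell
# Illustrative content (an assumption; any short text works).
cat > sample.txt <<'EOF'
spark is fast
spark runs on yarn
hadoop stores data on hdfs
EOF

# Assumption: /user/root is your HDFS home directory.
HDFS_DIR="/user/root"

if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -mkdir -p "$HDFS_DIR"
    hadoop fs -put sample.txt "$HDFS_DIR/"
    hadoop fs -ls "$HDFS_DIR"     # confirm the upload
else
    echo "hadoop command not found; run this on a cluster node" >&2
fi
```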

Step 6. Run the word count example

After you put your sample text file on HDFS, run the word count on the Spark cluster from the Spark shell.
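A minimal word count might look like the sketch below, which saves the Scala statements to a file (wordcount.scala is a name chosen here) so that you can paste them at the scala> prompt from Step 4. The HDFS path /user/root/sample.txt is an assumption; use the location you uploaded to.

```shell
# Save the Scala statements for the Spark shell; sc is the SparkContext
# that the shell creates for you.
cat > wordcount.scala <<'EOF'
// Read the file from HDFS (assumed path).
val lines = sc.textFile("/user/root/sample.txt")
// Split each line into words, pair every word with 1, and sum per word.
val counts = lines.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
// Bring the results back to the driver and print them.
counts.collect().foreach(println)
EOF

# Show the statements so they can be pasted at the scala> prompt.
cat wordcount.scala
```

Each (word, count) pair is printed on its own line; the order of the pairs is not guaranteed.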

That's it. You are done. 🙂

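You can monitor the running application in a browser at the URL below. Port 8088 is the default port of the YARN ResourceManager web UI; the standalone Spark master UI defaults to port 8080.
Spark Master URL: http://master.backtobazics.com:8088/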

See my post on building a Spark application JAR using Scala and SBT for more information on submitting Spark jobs to a YARN cluster.

Thank you for reading this post! Stay tuned for more such posts.
