Before we move ahead lets learn a bit on Setup Apache Spark,
So, What is Apache Spark?
Apache Spark is a fast, real time and extremely expressive computing system which executes job in distributed (clustered) environment.
It is quite compatible with Apache Hadoop and more almost 10x faster than Hadoop MapReduce on Disk Computing and 100x faster using in memory computations. It provides rich APIs in Java, Scala and Python along with Functional Programming capabilities.
Today is the era of parallel computation and whenever we talk about processing very large chunk of datasets the first word that comes in everyone’s mind is HADOOP. Apache Hadoop sits at the peak of Apache Project lists. In this post I’ll explain you all steps of setting up a Bazic Multi Node Hadoop Cluster (we’ll setup two node cluster).
Here I have used two machines for cluster setup you can repeat the steps of setting up slave nodes on more machines in order to create bigger Hadoop cluster.