In my earlier post, Brief Introduction of Hadoop, we covered what Hadoop is and what kinds of problems it solves. The next step is to understand the Hadoop core concepts, which cover:
- Distributed system design
- How data is distributed across multiple systems
- What the different components are and how they communicate with each other
Continue reading “Hadoop Core Concepts : The Bazics”
Apache Hadoop solves a very different kind of problem in the Big Data world. So before we get an introduction to Hadoop, it is necessary to understand the core problems in large-scale computation. After that, we’ll try to understand how Hadoop solves these problems.
So let’s discuss the pain points first. Continue reading “Brief Introduction of Hadoop : The Bazics”
Before we move ahead, let’s learn a bit about setting up Apache Spark.
So, what is Apache Spark?
Apache Spark is a fast, real-time, and highly expressive computing system that executes jobs in a distributed (clustered) environment.
It is compatible with Apache Hadoop and is reported to be up to 10x faster than Hadoop MapReduce for on-disk computation and up to 100x faster for in-memory computation. It provides rich APIs in Java, Scala, and Python, along with functional programming capabilities. Continue reading “6 Steps to Setup Apache Spark 1.0.1 (Multi Node Cluster) on CentOS”
Today is the era of parallel computation, and whenever we talk about processing very large datasets, the first word that comes to everyone’s mind is Hadoop. Apache Hadoop sits at the top of the list of Apache projects. In this post I’ll explain all the steps for setting up a Bazic multi-node Hadoop cluster (we’ll set up a two-node cluster).
Here I have used two machines for the cluster setup; you can repeat the slave-node setup steps on more machines to create a bigger Hadoop cluster. Continue reading “Setup Multi Node Hadoop 2.6.0 Cluster with YARN”