Prior to learn the concepts of Hadoop 2.x Architecture, I strongly recommend you to refer the my post on Hadoop Core Components, internals of Hadoop 1.x Architecture and its limitations. It will give you the idea about Hadoop2 Architecture requirement. And we have already learnt about the basic Hadoop components like Name Node, Secondary Name Node, Data Node, Job Tracker and Task Tracker.
In order to access files from HDFS one can use various Hadoop commands from UNIX shell. Additionally, Hadoop also provides powerful Java APIs using which a programmer can write a code for accessing files over HDFS. Before we go into the more details let’s understand the terminology to access files from HDFS.
Before this post we have discussed about what is Hadoop and what kind of issues are solved by Hadoop. Now Let’s deep dive in to various components of Hadoop. Hadoop as a whole distribution provides only two core components and HDFS (which is Hadoop Distributed File System) and MapReduce (which is a distributed batch processing framework). And a complete bunch of machines which are running HDFS and MapReduce are known as Hadoop Cluster.
Today is the era of parallel computation and whenever we talk about processing very large chunk of datasets the first word that comes in everyone’s mind is HADOOP. Apache Hadoop sits at the peak of Apache Project lists. In this post I’ll explain you all steps of setting up a Bazic Multi Node Hadoop Cluster (we’ll setup two node cluster).