We have discussed a high level view of YARN Architecture in my post on Understanding Hadoop 2.x Architecture but YARN it self is a wider subject to understand. Keeping that in mind, we’ll about discuss YARN Architecture, it’s components and advantages in this post.
Prior to learn the concepts of Hadoop 2.x Architecture, I strongly recommend you to refer the my post on Hadoop Core Components, internals of Hadoop 1.x Architecture and its limitations. It will give you the idea about Hadoop2 Architecture requirement. And we have already learnt about the basic Hadoop components like Name Node, Secondary Name Node, Data Node, Job Tracker and Task Tracker.
Before we learn to install Apache Hive on CentOS let me give you the introduction of it. Hive is basically a data warehouse tool to store and process the structured data residing on HDFS. Hive was developed by Facebook and than after it is shifted to Apache Software Foundation and became an open source Apache Hive.
Hadoop 1.x Architecture is a history now because in most of the Hadoop applications are using Hadoop 2.x Architecture. But still understanding of Hadoop 1.x Architecture will provide us the insights of how hadoop has evolved over the time.
In order to access files from HDFS one can use various Hadoop commands from UNIX shell. Additionally, Hadoop also provides powerful Java APIs using which a programmer can write a code for accessing files over HDFS. Before we go into the more details let’s understand the terminology to access files from HDFS.