Before this post we have discussed about what is Hadoop and what kind of issues are solved by Hadoop. Now Let’s deep dive in to various components of Hadoop. Hadoop as a whole distribution provides only two core components and HDFS (which is Hadoop Distributed File System) and MapReduce (which is a distributed batch processing framework). And a complete bunch of machines which are running HDFS and MapReduce are known as Hadoop Cluster.
In my earlier post about Brief Introduction of Hadoop, we have understood “What is Hadoop and What kind of problems it solves”. The next step is to understand Hadoop Core Concepts which talks more about,
- Distributed system design
- How data is distributed across multiple systems
- What are the different components involved and how they communicate with each others
Apache Hadoop solves very different kind of problems in Big Data world. So before we get an introduction of Hadoop it becomes necessary to understand the core problems in large scale computation. Than after we’ll try to understand how Hadoop solves these problems.