Understanding Hadoop 2.x Architecture and its Daemons

Before diving into Hadoop 2.x architecture, I strongly recommend reading my post on Hadoop core components, the internals of Hadoop 1.x architecture, and its limitations. That background explains why a new architecture was needed in Hadoop 2. We have already covered the basic Hadoop daemons: NameNode, Secondary NameNode, DataNode, JobTracker, and TaskTracker.

Java 8 Default Methods in Interface

Interfaces went through major changes in the Java 8 redesign, and default methods are the most significant addition. Java 8 interfaces also support static methods alongside default and abstract methods. Let's dig into the details of default methods in interfaces.
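To make the three kinds of interface methods concrete, here is a minimal sketch; the `Greeter` interface and its methods are illustrative names, not code from the post:

```java
// Hypothetical interface showing the three method kinds Java 8 allows.
interface Greeter {
    // Abstract method: implementing classes must provide this.
    String name();

    // Default method: inherited by all implementers, may be overridden.
    default String greet() {
        return "Hello, " + name() + "!";
    }

    // Static method: belongs to the interface itself, not to instances.
    static Greeter of(String name) {
        return () -> name; // Greeter is a functional interface (one abstract method)
    }
}

public class DefaultMethodDemo {
    public static void main(String[] args) {
        Greeter g = Greeter.of("Java 8");
        System.out.println(g.greet()); // Hello, Java 8!
    }
}
```

Because `greet()` has a body in the interface, existing implementations keep compiling even when such a method is added later; that evolution story is the main motivation for default methods.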

Java 8 Streams By Examples

Whenever we hear about Java 8 streams, the first things that come to mind are Java I/O and classes like InputStream and OutputStream. The Java 8 Stream API, however, is a quite different concept from Java I/O streams: it brings functional-style operations to Java and works naturally with the Java Collections Framework.
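A small sketch of that functional style over a collection; the list contents here are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamDemo {
    public static void main(String[] args) {
        List<String> langs = Arrays.asList("Java", "Scala", "Hive", "Spark");

        // A stream pipeline: intermediate operations (filter, map, sorted)
        // are lazy; the terminal collect() triggers the actual work.
        List<String> result = langs.stream()
                .filter(s -> s.startsWith("S"))   // keep "Scala", "Spark"
                .map(String::toUpperCase)         // transform each element
                .sorted()
                .collect(Collectors.toList());

        System.out.println(result); // [SCALA, SPARK]
    }
}
```

Note that, unlike an InputStream, a `Stream` carries no bytes of its own; it describes a computation over a source such as a collection.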

4 Steps to Configure Hive with MySQL Metastore on CentOS

Before configuring Hive with a MySQL metastore, let's look at a few important facts about Apache Hive and its metastore. By default, the Apache Hive metastore uses an embedded Derby database, but that setup is recommended only for testing or ad-hoc development. When Hive is used in production, its metastore should be backed by a database such as MySQL or PostgreSQL.
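For orientation, pointing the metastore at MySQL comes down to setting Hive's JDO connection properties in hive-site.xml. A minimal sketch, where the URL, database name, user, and password are placeholder assumptions rather than the post's actual values:

```xml
<!-- Illustrative hive-site.xml fragment; host, database name and
     credentials below are placeholders. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>
</configuration>
```

The MySQL JDBC driver jar must also be on Hive's classpath (typically dropped into Hive's lib directory) for this configuration to work.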

Apache Spark RDD Operations: Transformation and Action

We have already discussed Spark RDDs in my post Apache Spark RDD : The Bazics. In this post we'll look at Spark RDD operations in detail. A Spark RDD is a distributed collection of data, and it supports two kinds of operations: transformations and actions.
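The distinction between the two operation kinds can be sketched with Spark's Java API in local mode; the data and names here are illustrative, not from the post, and the code assumes the Spark core dependency is on the classpath:

```java
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddOperationsDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("rdd-ops").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

            // Transformations are lazy: each one only describes a new RDD.
            JavaRDD<Integer> squares = numbers.map(n -> n * n);
            JavaRDD<Integer> evens = squares.filter(n -> n % 2 == 0);

            // Actions trigger the actual computation.
            List<Integer> result = evens.collect(); // [4, 16]
            long howMany = evens.count();           // 2
            System.out.println(result + " count=" + howMany);
        }
    }
}
```

Nothing runs until `collect()` or `count()` is called; Spark uses that laziness to plan and optimize the whole chain of transformations at once.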

Apache Spark RDD : The Bazics

RDD stands for Resilient Distributed Dataset. An Apache Spark RDD is an abstract representation of data that is divided into partitions and distributed across the cluster. If you are familiar with the Java Collections Framework, you can think of an RDD as a Java collection that has been cut into small pieces (the partitions) and spread across multiple nodes.
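The collection-to-partitions analogy can be sketched like this (a local-mode example with made-up data, assuming the Spark core dependency is available):

```java
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddBasicsDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("rdd-basics").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // An ordinary Java collection lives in a single JVM...
            List<String> words = Arrays.asList("spark", "rdd", "partition", "cluster");

            // ...while an RDD built from it is split into partitions that
            // Spark can schedule across the nodes of a cluster.
            JavaRDD<String> rdd = sc.parallelize(words, 2); // request 2 partitions
            System.out.println("partitions: " + rdd.getNumPartitions()); // 2
        }
    }
}
```

Each partition is processed independently, which is what lets Spark parallelize work and recompute only the lost pieces after a node failure (the "resilient" part of the name).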