Spark groupBy example can also be compared with groupBy of SQL. In spark, groupBy is a transformation operation. Let’s have some overview first then we’ll understand this operation by some examples in Scala, Java and Python languages.
Spark flatMap example is mostly similar operation with RDD map operation. It is also defined in RDD abstract class of spark core library and same as map it also is a transformation kind of operation hence it is lazily evaluated.
We have discussed a high level view of YARN Architecture in my post on Understanding Hadoop 2.x Architecture but YARN it self is a wider subject to understand. Keeping that in mind, we’ll about discuss YARN Architecture, it’s components and advantages in this post.
Prior to learn the concepts of Hadoop 2.x Architecture, I strongly recommend you to refer the my post on Hadoop Core Components, internals of Hadoop 1.x Architecture and its limitations. It will give you the idea about Hadoop2 Architecture requirement. And we have already learnt about the basic Hadoop components like Name Node, Secondary Name Node, Data Node, Job Tracker and Task Tracker.
Interfaces went through major changes while redesign in Java 8 and as a part of that Java 8 default methods in Interface are the major addition. Java 8 interfaces also supports static methods along with default and abstract methods. Let’s dig into the more details to know the facts about default methods in interfaces.
Prior to the configuration of Hive with MySQL metastore, let’s know some important things about Apache Hive and it’s metastore. Apache Hive Metastore is normally configured with Derby Database. But that setting is recommended just for the testing or ad-hoc development purpose. When hive is used in production, its metastore should be configured in databases like MySQL or Postgres.