Spark groupBy example can also be compared with groupBy of SQL. In spark, groupBy is a transformation operation. Let’s have some overview first then we’ll understand this operation by some examples in Scala, Java and Python languages.
Spark flatMap example is mostly similar operation with RDD map operation. It is also defined in RDD abstract class of spark core library and same as map it also is a transformation kind of operation hence it is lazily evaluated.
We have already discussed about Spark RDD in my post Apache Spark RDD : The Bazics. In this post we’ll learn about Spark RDD Operations in detail. As we know Spark RDD is distributed collection of data and it supports two kind of operations on it Transformations and Actions.
RDD stands for Resilient Distributed Dataset. Apache Spark RDD is an abstract representation of the data which is divided into the partitions and distributed across the cluster. If you are aware about collection framework in Java than you can consider an RDD same as the Java collection object but here it is divided into various small pieces (referred as partitions) and is distributed across multiple nodes.