Apache Spark combineByKey Example
The Spark combineByKey RDD transformation is very similar to the combiner in Hadoop MapReduce programming. In this post, we’ll discuss a Spark combineByKey example in depth and try to understand the importance of this function.
Tag: Transformations
Apache Spark aggregateByKey Example
In this Spark aggregateByKey example post, we will see how aggregateByKey can be a better alternative to the groupByKey transformation when an aggregation is involved. A common problem when working with key-value pairs is grouping values and aggregating them with respect to a common key. The Spark aggregateByKey transformation addresses this problem in an intuitive way.
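To make the idea concrete, here is a minimal plain-Python sketch of what aggregateByKey computes. It does not use Spark: the helper aggregate_by_key and the sample partitions are hypothetical, chosen only to mirror the three arguments the Spark transformation takes (a zero value, a function that folds a value into a per-partition accumulator, and a function that merges accumulators across partitions).

```python
def aggregate_by_key(partitions, zero_value, seq_op, comb_op):
    """Plain-Python model of aggregateByKey semantics (not Spark's API)."""
    # Step 1: inside each partition, fold values into one accumulator per key.
    partials = []
    for partition in partitions:
        local = {}
        for key, value in partition:
            local[key] = seq_op(local.get(key, zero_value), value)
        partials.append(local)

    # Step 2: merge the per-partition accumulators for each key.
    merged = {}
    for local in partials:
        for key, acc in local.items():
            merged[key] = comb_op(merged[key], acc) if key in merged else acc
    return merged


# Hypothetical data: (subject, score) pairs spread over two partitions.
partitions = [
    [("math", 80), ("physics", 70), ("math", 90)],
    [("physics", 60), ("math", 70)],
]

# The accumulator is (sum, count), so an average can be derived afterwards.
sum_count = aggregate_by_key(
    partitions,
    (0, 0),
    lambda acc, v: (acc[0] + v, acc[1] + 1),   # fold one value into an accumulator
    lambda a, b: (a[0] + b[0], a[1] + b[1]),   # merge two accumulators
)
averages = {key: total / count for key, (total, count) in sum_count.items()}
print(averages)
```

Note that the accumulator type (a pair) differs from the value type (a number), and that no per-key list of values is ever built, which is what a groupByKey-based solution would require.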
Apache Spark groupByKey Example
The Apache Spark groupByKey example is quite similar to the reduceByKey one. groupByKey is again a transformation, and a wide one, because it requires a data shuffle. The function takes an RDD of key-value pairs (K, V) as input and produces an RDD in which each key is paired with the collection of its values. Let’s try to understand the function in detail. At the end of this post we’ll also compare it with reduceByKey with respect to optimization.
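The difference between the two can be sketched in plain Python. This is only a model, not Spark code: group_by_key, reduce_by_key, the shuffled counter and the sample partitions are all hypothetical names introduced for illustration. The counter stands in for the number of records that would have to cross the shuffle boundary.

```python
from collections import defaultdict

# Hypothetical (word, count) pairs spread over two partitions.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1)],
]


def group_by_key(parts):
    """Model of groupByKey: every pair is moved, then grouped by key."""
    grouped = defaultdict(list)
    shuffled = 0
    for part in parts:
        for key, value in part:
            grouped[key].append(value)
            shuffled += 1  # each record crosses the shuffle boundary
    return dict(grouped), shuffled


def reduce_by_key(parts, func):
    """Model of reduceByKey: combine inside each partition before moving."""
    result = {}
    shuffled = 0
    for part in parts:
        # Local (map-side) combine: one value per key per partition.
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        # Only the locally combined values are moved and merged.
        for key, value in local.items():
            result[key] = func(result[key], value) if key in result else value
            shuffled += 1
    return result, shuffled


grouped, moved_by_group = group_by_key(partitions)
reduced, moved_by_reduce = reduce_by_key(partitions, lambda x, y: x + y)
print(grouped, moved_by_group)
print(reduced, moved_by_reduce)
```

In this toy data the saving is small, but it grows with the number of repeated keys per partition, which is why reduceByKey is usually preferred when the goal is an aggregate rather than the full list of values.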
Apache Spark groupBy Example
The Spark groupBy example can be compared with the GROUP BY clause of SQL. In Spark, groupBy is a transformation operation. Let’s start with an overview and then understand this operation through some examples in Scala, Java and Python.
Apache Spark reduceByKey Example
Looking at the Spark reduceByKey example, we can say that reduceByKey is one step ahead of the reduce function in Spark, with the difference that it is a transformation rather than an action. Let’s understand this operation through some examples in Scala, Java and Python.
Apache Spark filter Example
In this Spark filter example, we’ll explore the filter method of the Spark RDD class in all three languages: Scala, Java and Python. filter is a transformation, so its evaluation is lazy. Let’s dig a bit deeper.
Apache Spark flatMap Example
The Spark flatMap operation is very similar to the RDD map operation. It is also defined in the RDD abstract class of the Spark core library and, like map, it is a transformation and hence lazily evaluated.
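The difference between the two operations can be illustrated without Spark, using ordinary Python lists. This is a sketch of the semantics only; the sample lines and variable names are hypothetical.

```python
from itertools import chain

lines = ["hello spark", "hello world"]

# map: exactly one output element per input element (here, one list per line).
mapped = [line.split(" ") for line in lines]

# flatMap: each input may yield zero or more elements,
# and the results are flattened into a single sequence.
flat_mapped = list(chain.from_iterable(line.split(" ") for line in lines))

print(mapped)
print(flat_mapped)
```

With map the result has the same number of elements as the input (two lists of words); with flatMap the nesting disappears and the result is a flat sequence of four words.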
Apache Spark map Example
In this Apache Spark map example, we’ll learn the ins and outs of the map function. map is defined in the abstract class RDD in Spark, and it is a transformation, which means it is lazily evaluated. Let’s explore it in detail.
Apache Spark RDD Operations: Transformation and Action
We have already discussed Spark RDD in my post Apache Spark RDD : The Bazics. In this post we’ll learn about Spark RDD operations in detail. As we know, a Spark RDD is a distributed collection of data, and it supports two kinds of operations: transformations and actions.
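The split between transformations and actions can be modelled with a toy class in plain Python. This is not Spark's implementation: LazyDataset and its methods are hypothetical stand-ins that only record each transformation and run the recorded steps when an action (here, collect) is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, data, steps=()):
        self._data = data
        self._steps = tuple(steps)

    def map(self, func):
        # Transformation: returns a new dataset, computes nothing yet.
        return LazyDataset(self._data, self._steps + (("map", func),))

    def filter(self, predicate):
        # Transformation: returns a new dataset, computes nothing yet.
        return LazyDataset(self._data, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: applies the recorded steps in order and returns the results.
        items = iter(self._data)
        for kind, func in self._steps:
            items = map(func, items) if kind == "map" else filter(func, items)
        return list(items)


calls = []


def double(x):
    calls.append(x)  # record that the function actually ran
    return x * 2


pipeline = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # nothing has been evaluated so far
result = pipeline.collect()       # the action triggers the evaluation
print(calls_before_action, result)
```

Building the pipeline does not call double at all; the function runs only once collect is invoked, which mirrors how transformations stay lazy until an action requests a result.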