Apache Spark map Example

In this Apache Spark map example, we’ll learn the ins and outs of the map function. map is defined in Spark’s abstract class RDD, and it is a transformation, which means it is a lazy operation. Let’s explore it in detail.

Spark’s RDD map function returns a new RDD by applying a function to every element of the source RDD.

Spark map is itself a transformation that accepts a function as an argument. Spark applies that function to each element of the source RDD and builds a new RDD from the resulting values. Let’s have a look at the following image to understand it better.

[Figure: the map transformation applied to source RDD X, producing resulting RDD Y]

As you can see in the image above, RDD X is the source RDD and RDD Y is the resulting RDD. Recalling our word count example in Spark: RDD X holds the distributed collection of words, and with the map transformation we pair each element with the integer 1, producing tuples like (word, 1).

Important points to note:

  • map is a transformation operation in Spark, hence it is lazily evaluated
  • It is a narrow operation, as it does not shuffle data from one partition to multiple partitions

Let’s take a look at some examples.

Spark map Example Using Scala
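Here is a minimal sketch of the Scala version, run in the Spark shell (`spark-shell`), where `sc` is the SparkContext the shell provides. The sample words are illustrative:

```scala
// spark-shell session; `sc` is provided by the shell
val x = sc.parallelize(List("spark", "rdd", "example", "sample", "example"))

// map each word to a (word, 1) tuple, as in the word count pattern above
val y = x.map(word => (word, 1))

y.collect.foreach(println)
// (spark,1)
// (rdd,1)
// (example,1)
// (sample,1)
// (example,1)
```

Note that `map` only defines the transformation; nothing runs until the `collect` action is called, which is exactly the lazy evaluation mentioned earlier.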

Spark map Example Using Java 8
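A sketch of the Java 8 version as a full class (the class name and sample data are illustrative). In the Java API, mapping to tuples is done with `mapToPair`, which yields a `JavaPairRDD` of `(word, 1)` pairs:

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class SparkMapExample {
    public static void main(String[] args) {
        // local[*] runs Spark with as many worker threads as cores
        SparkConf conf = new SparkConf()
                .setAppName("Spark map Example")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> x = sc.parallelize(Arrays.asList("spark", "rdd", "example"));

        // map each word to a (word, 1) tuple using a Java 8 lambda
        JavaPairRDD<String, Integer> y = x.mapToPair(word -> new Tuple2<>(word, 1));

        y.collect().forEach(System.out::println);
        sc.stop();
    }
}
```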

The above example is written as a full Java class because, unlike Scala and Python, Java has no REPL for Spark.

PySpark map Example

The above are very basic examples; we’ll see more such examples in upcoming posts.

