Tag Archives: Spark RDD

Apache Spark RDD : The Bazics

RDD stands for Resilient Distributed Dataset. Apache Spark RDD is an abstract representation of the data which is divided into the partitions and distributed across the cluster. If you are aware about collection framework in Java than you can consider an RDD same as the Java collection object but here it is divided into various small pieces (referred as partitions) and is distributed across multiple nodes.