Python NumPy Tutorial: Creating NumPy Arrays for Beginners

If you are a beginner in data analytics or data science, you need an in-depth understanding of Python's NumPy package. This basic Python NumPy tutorial will give you a clear idea of how to create NumPy arrays. It aims to be a one-stop resource for any data enthusiast looking for a NumPy tutorial.
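As a quick taste of what creating arrays looks like, here is a minimal sketch using four common NumPy constructors. The variable names are only illustrative, and the full post may well take a different route.

```python
import numpy as np

# Build a 1-D array from an ordinary Python list.
from_list = np.array([1, 2, 3])

# Build a 2x3 array filled with zeros (float64 by default).
zeros = np.zeros((2, 3))

# Evenly spaced integers: start at 0, stop before 10, step by 2.
stepped = np.arange(0, 10, 2)

# Five evenly spaced floats from 0.0 to 1.0, both ends included.
spaced = np.linspace(0.0, 1.0, 5)

print(from_list.shape, zeros.shape)
print(stepped)
print(spaced)
```

Note that `np.arange` excludes the stop value while `np.linspace` includes it by default, which is a frequent source of off-by-one surprises.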

Apache Spark aggregateByKey Example

In this Spark aggregateByKey example post, we will see how aggregateByKey can be a better alternative to the groupByKey transformation when an aggregation is involved. The most common problem when working with key-value pairs is grouping values and aggregating them by a common key. Spark's aggregateByKey transformation addresses this problem in an intuitive way.
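To make the idea concrete without a running cluster, the sketch below models the semantics of aggregateByKey in plain Python. It is not Spark code: `aggregate_by_key` and the sample `partitions` are hypothetical stand-ins, with each inner list playing the role of one RDD partition. In PySpark the corresponding call would be `rdd.aggregateByKey(zeroValue, seqFunc, combFunc)`.

```python
def aggregate_by_key(partitions, zero_value, seq_op, comb_op):
    """Plain-Python model of aggregateByKey (hypothetical helper, not Spark)."""
    # Step 1: inside each partition, fold every value into a per-key
    # accumulator that starts from zero_value.
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            acc[key] = seq_op(acc.get(key, zero_value), value)
        per_partition.append(acc)

    # Step 2: merge the per-partition accumulators key by key.
    merged = {}
    for acc in per_partition:
        for key, partial in acc.items():
            merged[key] = comb_op(merged[key], partial) if key in merged else partial
    return merged


# Two "partitions" of (key, value) pairs; the data is made up for illustration.
partitions = [
    [("a", 3), ("b", 5), ("a", 1)],
    [("a", 4), ("b", 7)],
]

# Accumulate a (sum, count) pair per key so an average can be derived later.
seq_op = lambda acc, v: (acc[0] + v, acc[1] + 1)
comb_op = lambda x, y: (x[0] + y[0], x[1] + y[1])

totals = aggregate_by_key(partitions, (0, 0), seq_op, comb_op)
averages = {key: s / n for key, (s, n) in totals.items()}
print(totals)
print(averages)
```

The point of the two functions is that the accumulator type, here a `(sum, count)` tuple, can differ from the value type, and only one small accumulator per key leaves each partition rather than every individual value.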

Apache Spark groupByKey Example

The Apache Spark groupByKey transformation is quite similar to reduceByKey. It is again a transformation and also a wide operation, because it requires a data shuffle. The groupByKey function takes an RDD of key-value pairs (K, V) as input and produces an RDD that pairs each key with the list of its values. Let's try to understand the function in detail. At the end of this post, we'll also compare it with reduceByKey from an optimization standpoint.
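The difference from reduceByKey can be sketched in plain Python before looking at real Spark code. This is only a simplified model: `group_by_key`, `reduce_by_key`, and the sample data are hypothetical, each inner list stands in for one partition, and the "shuffled" counts are a rough proxy for shuffle traffic rather than anything Spark reports.

```python
def group_by_key(partitions):
    """Model of groupByKey: every (key, value) pair is shuffled as-is."""
    shuffled = [pair for part in partitions for pair in part]
    grouped = {}
    for key, value in shuffled:
        grouped.setdefault(key, []).append(value)
    return grouped, len(shuffled)


def reduce_by_key(partitions, func):
    """Model of reduceByKey: combine inside each partition before shuffling."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        # Only one record per key leaves each partition.
        shuffled.extend(local.items())
    reduced = {}
    for key, value in shuffled:
        reduced[key] = func(reduced[key], value) if key in reduced else value
    return reduced, len(shuffled)


# Made-up data: two partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("a", 2), ("b", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

grouped, grouped_shuffled = group_by_key(partitions)
reduced, reduced_shuffled = reduce_by_key(partitions, lambda x, y: x + y)

print(grouped, grouped_shuffled)
print(reduced, reduced_shuffled)
```

In this model groupByKey moves all six records while reduceByKey moves four, and the gap widens as keys repeat more often within a partition. In real Spark the order of values inside each group is not guaranteed.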
