= Learning Spark internals using groupBy (to cause shuffle)

Execute the following operation in spark-shell and, using the web UI, explain the transformations, actions, jobs, stages, tasks, and partitions involved.

[source,scala]
----
sc.parallelize(0 to 999, 50).zipWithIndex.groupBy(_._1 / 10).collect
----
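Before triggering the action, it can help to call `toDebugString` on the RDD in spark-shell to see the shuffle boundary that `groupBy` introduces in the lineage. The grouping logic itself can be sketched with plain Scala collections (a local approximation only: on an RDD, `zipWithIndex` yields `Long` indices and the data is spread across the 50 partitions):

[source,scala]
----
// Local sketch of the grouping logic, without Spark.
val pairs  = (0 to 999).zipWithIndex        // (value, index) pairs
val groups = pairs.groupBy(_._1 / 10)       // bucket by value / 10, keys 0 to 99
assert(groups.size == 100)                  // 100 groups of 10 pairs each
assert(groups(42).map(_._1) == (420 to 429)) // group 42 holds values 420..429
----

When you run the actual `collect`, the web UI should show a single job (one action) split into two stages, because the `groupBy` shuffle marks a stage boundary.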

You may also make the exercise a bit heavier by explaining data distribution across the cluster and going over the concepts of drivers, masters, workers, and executors.