Big Data coursework, Fall 2023
Assignment 1: Bigram counting and probability program in Hadoop MapReduce. See Assignment1 README for more details
Assignment 2: PySpark analytical exercises. See Assignment2 README for more details
ReservoirSampling: Reservoir sampling algorithm implemented in Hadoop MapReduce. See ReservoirSampling README for details
Midterm: Anomaly and duplicate detection in PySpark. See linked READMEs for details
Assignment 4: MongoDB exercises. See the Markdown comments in the notebook for details