dhauss/big_data

big_data

Big Data coursework, Fall 2023

Assignment 1: Bigram counting and probability program in Hadoop MapReduce. See the Assignment1 README for more details.
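For orientation, here is a minimal plain-Python sketch of the general idea behind bigram counting and conditional bigram probability. It is not the Hadoop MapReduce code from the assignment. The function names, the whitespace tokenization, and the sample sentences are all assumptions made for illustration.

```python
from collections import Counter


def bigram_counts(lines):
    """Count adjacent word pairs within each line."""
    counts = Counter()
    for line in lines:
        words = line.lower().split()  # assumption: simple whitespace tokenization
        counts.update(zip(words, words[1:]))
    return counts


def bigram_probabilities(counts):
    """Estimate P(second | first) = count(first, second) / count(first, *)."""
    first_totals = Counter()
    for (first, _second), n in counts.items():
        first_totals[first] += n
    return {pair: n / first_totals[pair[0]] for pair, n in counts.items()}


# Hypothetical toy corpus, not data from the assignment.
counts = bigram_counts(["the cat sat", "the cat ran", "the dog sat"])
probs = bigram_probabilities(counts)
```

In a MapReduce setting, the counting loop would roughly correspond to a mapper emitting one record per bigram and a reducer summing them. See the Assignment1 README for how the assignment actually structures the job.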

Assignment 2: PySpark analytical exercises. See the Assignment2 README for more details.

ReservoirSampling: Reservoir sampling algorithm implemented in Hadoop MapReduce. See the ReservoirSampling README for details.
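As a rough single-machine illustration of reservoir sampling, the sketch below uses the classic "Algorithm R" in plain Python. It is not the MapReduce implementation in this repository. The function name `reservoir_sample` and its parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1),
            # replacing a uniformly chosen slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Example: sample 10 values from a stream of 1000 with a fixed seed.
sample = reservoir_sample(range(1000), 10, random.Random(0))
```

The stream is read once and only `k` items are held in memory, which is why the technique suits inputs too large to load at once.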

Midterm: Anomaly and duplicate detection in PySpark. See the linked READMEs for details.

Assignment 4: MongoDB exercises. See the markdown comments in the notebook for details.
