This repository contains the assignments for the DSCI 553: Foundations and Applications of Data Mining course at USC.
Instructor: Professor Wei-Min Shen (Spring 2023)
Follow these instructions to run the script locally and on Vocareum.
For additional details, see the README of each individual homework.

| Assignment | Topic | Implementation | Concepts | Dataset |
| --- | --- | --- | --- | --- |
| Homework 0 | Setting up development environment | Python, Scala | Map-Reduce | None |
| Homework 1 | Data Exploration on Yelp Dataset | Python | Map-Reduce | Test, Full |
| Homework 2 | Frequent Item-set Mining | Python | SON Algorithm, Apriori Algorithm, Frequent Item-sets | Simulated, Real-world |
| Homework 3 | Locality Sensitive Hashing (LSH), Collaborative Filtering, Recommendation Systems | Python | Min-Hashing, Locality Sensitive Hashing, Pearson Similarity, Model-based Recommendation System | Training and Validation |
| Homework 4 | Community Detection | Python | Girvan-Newman Algorithm, Label Propagation Algorithm | Graph Data |
| Homework 5 | Processing Data Streams | Python | Bloom Filter, Flajolet-Martin Algorithm, Reservoir Sampling | Seed dataset for stream + Stream Generator |
| Homework 6 | Clustering | Python | Bradley-Fayyad-Reina (BFR) Algorithm | Synthetic dataset |
| Competition Project | Recommendation System | Python | Recommendation Systems | Same as Homework 3 |