Project T Final for CS 189/289A: Introduction to Machine Learning @ UC Berkeley
We aim to guide students through the Collaborative Filtering approach to recommendation systems. We want to expose students to two paradigms for collaborative filtering: nearest-neighbor-style heuristic searches, and latent space models that tie into the matrix decompositions studied in EECS 16B. The assignment will be a Jupyter Notebook focused on constructing recommendations for a specific dataset, such as Netflix title recommendations (a classic example) or the MovieLens dataset. We'll start with the former paradigm, connecting it to KNN. We will then move on to the model-based approach, which addresses issues with sparsity and shows how other techniques (namely, matrix factorizations similar to the diagonalization and SVD seen in EECS 16AB) can be used to approach this problem. Lastly, we'll touch on common approaches to Collaborative Filtering in industry (Surpriselib, deep learning, regularization) and on open problems (e.g., the cold start problem).
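The nearest-neighbor paradigm can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the notebook's solution: the 4x4 ratings matrix is hypothetical toy data (rows are users, columns are movies, 0 marks a missing rating), and `cosine_similarity_matrix` and `predict_knn` are hypothetical helper names. The sketch uses user-based KNN: a missing rating is predicted as the cosine-similarity-weighted average of the ratings that the k most similar users gave to that movie.

```python
import numpy as np

# Hypothetical toy ratings matrix: rows are users, columns are movies.
# A 0 marks a missing rating (the user has not rated that movie).
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [2, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)


def cosine_similarity_matrix(M):
    """Pairwise cosine similarity between the rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = M / norms
    return unit @ unit.T


def predict_knn(R, user, item, k=2):
    """Predict R[user, item] as a similarity-weighted average of the
    ratings given to `item` by the k most similar users who rated it."""
    sims = cosine_similarity_matrix(R)[user]
    # Candidate neighbors: other users who actually rated this item.
    candidates = [v for v in range(R.shape[0]) if v != user and R[v, item] > 0]
    # Keep the k candidates most similar to `user`.
    neighbors = sorted(candidates, key=lambda v: sims[v], reverse=True)[:k]
    weights = np.array([sims[v] for v in neighbors])
    ratings = np.array([R[v, item] for v in neighbors])
    if weights.sum() == 0:
        return 0.0  # no informative neighbors (a cold-start-like case)
    return float(weights @ ratings / weights.sum())


print(round(predict_knn(R, user=0, item=2), 2))  # prints 2.01
```

User 0's closest neighbor is user 1, who rated movie 2 low, so the prediction is pulled toward 1. Treating missing ratings as 0 when computing cosine similarity is a simplification, and it is one face of the sparsity issue that the model-based approach is meant to address.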
- Apply knowledge of Pandas and NumPy to load and analyze novel datasets
- Utilize Pandas, NumPy, and Matplotlib to perform Exploratory Data Analysis (EDA) and understand the layout, information, and biases in a given dataset
- Explore cosine similarity as a measurement of likeness between high-dimensional feature vectors of users and movies
- Connect earlier ideas about clustering to K-Nearest Neighbors, and apply KNN to group similar users or movies
- Draw connections to the matrix factorizations previously learned in EECS 16AB, and explore how to use gradient descent to learn latent space embeddings
- Use and appreciate packages used in industry (such as Surpriselib) for Collaborative Filtering
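For the objective on learning latent space embeddings with gradient descent, a minimal sketch follows. It assumes a plain matrix factorization model in which each rating is approximated by the dot product of a user vector and a movie vector, fit by full-batch gradient descent on squared error over the observed ratings plus L2 regularization. The function name `factorize`, its hyperparameters (`k`, `lr`, `reg`, `epochs`), and the toy data are hypothetical choices for illustration, not the notebook's reference implementation.

```python
import numpy as np


def factorize(R, mask, k=2, lr=0.01, reg=0.05, epochs=5000, seed=0):
    """Learn user factors P (n_users x k) and item factors Q (n_items x k)
    so that P @ Q.T approximates R on the observed entries (mask == True).

    Minimizes 0.5 * sum over observed (r_ui - p_u . q_i)^2
              + 0.5 * reg * (||P||_F^2 + ||Q||_F^2)
    with full-batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    # Small random initialization breaks the symmetry between latent dimensions.
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        # Residuals on observed ratings only; missing entries contribute 0.
        E = mask * (R - P @ Q.T)
        # Both gradients use the current P and Q before either is updated.
        grad_P = -E @ Q + reg * P
        grad_Q = -E.T @ P + reg * Q
        P -= lr * grad_P
        Q -= lr * grad_Q
    return P, Q


# Hypothetical toy data: rows are users, columns are movies, 0 = missing.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [2, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)
mask = R > 0

P, Q = factorize(R, mask)
pred = P @ Q.T
rmse = np.sqrt(np.mean((R - pred)[mask] ** 2))
print("training RMSE on observed ratings:", round(float(rmse), 3))
print("predicted ratings for the missing entries:", np.round(pred[~mask], 2))
```

A truncated SVD needs a fully observed matrix, whereas this loss sums only over observed entries; that is one reason an iterative method such as gradient descent is used here in place of a closed-form decomposition.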
- documentation: Slides and notes on Collaborative Filtering
  - Google Slides Mirror: https://bit.ly/MA_Project_T_Final_Slides
  - Slides Link
  - Notes
- code: Jupyter Notebook assignment and solutions for Collaborative Filtering
- assessment: Sample questions to assess student learning
- Maxwell Chen (@maxhchen)
- Abinav Routhu (@abinavcal)