This repository contains the code and data for my COS 781 Project, focusing on building a recommendation system. Below is an overview of the project structure, data flow, and usage instructions.
- Located in the `data` folder.
- Excludes the raw and intermediate data due to size; please contact me if you would like a copy.
- Includes the preprocessed and application datasets.
- The workflow starts with the source data and progresses through intermediate, preprocessed, and final application stages.
The codebase is organized into the following key folders:
Contains three Jupyter notebooks:
- `source.ipynb`: Handles data loading and sampling.
- `preprocess.ipynb`: Includes data subsetting, cleaning, exploratory data analysis (EDA), and utility matrix construction.
- `clustering.ipynb`: Performs clustering analysis on the normalized utility matrix.
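
For readers unfamiliar with the term, the utility matrix step can be sketched roughly as below. This is a minimal, dependency-free illustration and not the code from the notebooks: the `(user, item, rating)` input format, the function names `build_utility_matrix` and `mean_center`, and the choice of per-user mean-centering as the normalization are all assumptions made for the example.

```python
from collections import defaultdict


def build_utility_matrix(ratings):
    """Map each user to a dict of {item: rating} from (user, item, rating) triples."""
    matrix = defaultdict(dict)
    for user, item, rating in ratings:
        matrix[user][item] = float(rating)
    return dict(matrix)


def mean_center(matrix):
    """Subtract each user's mean rating from that user's observed ratings.

    Mean-centering is only one possible normalization; it is assumed here
    for illustration.
    """
    centered = {}
    for user, row in matrix.items():
        mean = sum(row.values()) / len(row)
        centered[user] = {item: r - mean for item, r in row.items()}
    return centered


# Hypothetical toy data: two users, three items.
ratings = [
    ("u1", "a", 5), ("u1", "b", 3),
    ("u2", "a", 4), ("u2", "c", 2),
]
utility = build_utility_matrix(ratings)
normalized = mean_center(utility)
```

Storing only the observed ratings keeps the matrix sparse, which matters because most users rate only a small fraction of items.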
Contains two Jupyter notebooks for applying recommendation algorithms:
- `cosine_similarity.ipynb`: Implements cosine similarity on the utility matrix.
- `matrix_factorization.ipynb`: Implements matrix factorization on the utility matrix.
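
As a rough idea of what the two approaches compute, here is a small pure-Python sketch. It is not taken from the notebooks: the sparse `{user: {item: rating}}` layout, the function names, the SGD-based factorization, and all hyperparameter values (`n_factors`, `lr`, `reg`, `epochs`) are assumptions chosen for illustration.

```python
import math
import random


def cosine_similarity(row_a, row_b):
    """Cosine similarity of two sparse rating rows given as {item: rating} dicts."""
    shared = set(row_a) & set(row_b)
    dot = sum(row_a[i] * row_b[i] for i in shared)
    norm_a = math.sqrt(sum(v * v for v in row_a.values()))
    norm_b = math.sqrt(sum(v * v for v in row_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty or all-zero row is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def factorize(utility, n_factors=2, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Learn user (P) and item (Q) factor vectors by SGD on observed ratings only."""
    rng = random.Random(seed)
    items = sorted({i for row in utility.values() for i in row})
    P = {u: [rng.uniform(0.1, 0.5) for _ in range(n_factors)] for u in sorted(utility)}
    Q = {i: [rng.uniform(0.1, 0.5) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, row in utility.items():
            for i, r in row.items():
                err = r - sum(p * q for p, q in zip(P[u], Q[i]))
                for k in range(n_factors):
                    p_uk, q_ik = P[u][k], Q[i][k]
                    # Gradient step with L2 regularization on both factors.
                    P[u][k] += lr * (err * q_ik - reg * p_uk)
                    Q[i][k] += lr * (err * p_uk - reg * q_ik)
    return P, Q


def rmse(utility, P, Q):
    """Root-mean-square error of the factor model on the observed ratings."""
    errors = [
        (r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
        for u, row in utility.items()
        for i, r in row.items()
    ]
    return math.sqrt(sum(errors) / len(errors))


# Hypothetical toy utility matrix.
toy = {
    "u1": {"a": 5.0, "b": 3.0},
    "u2": {"a": 4.0, "c": 2.0},
    "u3": {"b": 1.0, "c": 5.0},
}
similarity_u1_u2 = cosine_similarity(toy["u1"], toy["u2"])
P, Q = factorize(toy)
```

A predicted rating for any user and item pair, including unobserved ones, is the dot product of the corresponding rows of `P` and `Q`; `rmse(toy, P, Q)` gives a quick check of how well the factors fit the observed entries.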
- Contains styling functions and plot utilities to enhance visualizations and readability.
The `env.yml` file specifies the Conda environment required to run the code. To set up the environment: