Authors:
- Nicholas DeGroot (Halıcıoğlu Data Science Institute, UC San Diego)
This project was created for UCSD's DSC 180: Data Science Capstone. According to the university, the course:
Span(s) the entire lifecycle, including assessing the problem, learning domain knowledge, collecting/cleaning data, creating a model, addressing ethical issues, designing the system, analyzing the output, and presenting the results.
This project is configured with devcontainer support. This automatically creates a fully isolated environment with all required dependencies installed.
The easiest way to get started with devcontainers is through GitHub Codespaces.
- Create a new codespace on this repository.
  - Alternatively, this can be done through the gh CLI.
- Configure the codespace to your liking. We recommend the 8-core machine.
- Start the codespace and connect. It might take a minute to install all the dependencies. Grab a ☕!
- Connect to the codespace through your preferred method (browser / VS Code).
This project is set up with a suite of pytest tests to verify that everything is working. With a working environment, run the following command:
make test
For UCSD students & staff, we've ensured that everything works on the Data Science Machine Learning Platform (DSMLP) servers.
The (auto!) published Docker image contains everything you need to test the project. Under the hood, it's the same container that any devcontainer runs.
On DSMLP, log in with your credentials, then run the following:
launch.sh -s -i ghcr.io/nickthegroot/recipe-recommendation:main
cd /app
make test
This will begin a full run of every test in the project. Currently, this includes a full pipeline test and a smaller data processing test.
- Download the data by creating a Kaggle account and downloading the shuyangli94/food-com-recipes-and-user-interactions dataset.
- Unzip the data into data/raw.
  - You should see a number of files, including data/raw/RAW_interactions.csv and data/raw/RAW_recipes.csv.
- Run make data to process the raw data into its cleaned form.
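Before running make data, it can help to confirm that the raw files landed where the steps above expect them. The following is a minimal sketch and not part of the project: the helper name missing_raw_files is hypothetical, and it only checks for the two files named above, although the dataset contains more.

```python
from pathlib import Path

# The two files the setup steps name explicitly; the dataset ships others too.
EXPECTED_RAW_FILES = ("RAW_interactions.csv", "RAW_recipes.csv")


def missing_raw_files(raw_dir="data/raw"):
    """Return the expected raw files that are not present in raw_dir."""
    raw_path = Path(raw_dir)
    return [name for name in EXPECTED_RAW_FILES if not (raw_path / name).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files()
    if missing:
        print("Missing from data/raw:", ", ".join(missing))
    else:
        print("Raw data files found.")
```

Run from the repository root, an empty list means both files were unzipped into data/raw.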
All models can be trained using python src/cli/train.py.
- Run python src/cli/train.py --help for all configuration options.
- In general, all models can be trained via python src/cli/train.py --model {model}.
  - For example, LightGCN is trained with python src/cli/train.py --model LightGCN.
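To train several models back to back, the documented command can be wrapped in a small script. This is a sketch under assumptions rather than something the project ships: build_train_command and train_models are hypothetical helpers, and only the --model flag shown above is used.

```python
import subprocess
import sys


def build_train_command(model):
    """Build the argument list for training one model via the documented CLI."""
    # Uses the current interpreter so the active environment's packages apply.
    return [sys.executable, "src/cli/train.py", "--model", model]


def train_models(models):
    """Train each model in turn, stopping at the first failing run."""
    for model in models:
        subprocess.run(build_train_command(model), check=True)
```

Calling train_models(["LightGCN"]) from the repository root would run the same command as the example above. Other valid model names are not listed here, so check the --help output before adding them.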