A modular, scalable machine learning recommendation system built with FastAPI microservices, Docker containers, Kubernetes orchestration, AWS S3 for model storage, and ClearML for experiment tracking.
This project demonstrates the design, deployment, and management of a real-time ML recommendation system using MLOps principles. It includes:
- Collaborative Filtering
- Hybrid Recommendations
- Popularity-Based Recommendations
- Input Handling Service
- Automated Model Update Service
Each service is containerized and deployed as a microservice within a Kubernetes cluster.
- FastAPI-based APIs for each recommendation module
- Asynchronous, modular architecture
- Automated model retraining via Kubernetes CronJob
- Experiment tracking using ClearML
- Cloud storage and model persistence via AWS S3
- Kubernetes deployment-ready YAML files
- Postman-compatible API testing
- CI/CD pipeline support (planned)
Each pod is responsible for a specific service and interacts with S3 to load or update models. The model update job runs on a scheduled CronJob and logs metrics via ClearML.
| Component | Tool/Framework |
|---|---|
| Web Framework | FastAPI |
| Containerization | Docker |
| Orchestration | Kubernetes (Minikube) |
| Storage | AWS S3 |
| Experiment Tracking | ClearML |
| Automation | Kubernetes CronJob |
| Development | Python, Jupyter |
Uses matrix factorization techniques (like ALS or SVD) to learn latent user and item features based on past user-item interactions. This approach helps personalize recommendations by identifying similar behavior patterns across users.
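As an illustrative sketch (not this project's actual training code), a truncated SVD over a small user-item interaction matrix shows the idea: factor the matrix into latent features, reconstruct predicted scores, and recommend the highest-scoring unseen items. The matrix values below are made up for demonstration.

```python
import numpy as np

# Toy user-item interaction matrix (rows: users, cols: items); nonzero
# entries are observed interactions. Values are made up for illustration.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

# Truncated SVD: keep k latent dimensions of users and items.
k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # low-rank score reconstruction

def recommend(user_idx: int, n: int = 1) -> list[int]:
    """Return indices of the top-n unseen items, ranked by predicted score."""
    scores = R_hat[user_idx].copy()
    scores[R[user_idx] > 0] = -np.inf  # mask items the user already has
    return np.argsort(scores)[::-1][:n].tolist()

print(recommend(0))  # → [2]  (item 2 is user 0's only unseen item)
```

ALS reaches a similar factorization iteratively and scales better to sparse implicit-feedback data; the SVD here just keeps the sketch short.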
Combines collaborative filtering with content-based features like category, brand, and item metadata. This balances personalization with coverage and is especially helpful in cold-start scenarios.
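One common way to hybridize, sketched here with made-up scores and a hypothetical `alpha` weight (not taken from this codebase), is a weighted blend of collaborative and content-based scores per item:

```python
def hybrid_score(collab: dict, content: dict, alpha: float = 0.7) -> dict:
    """Blend collaborative and content-based scores per item.

    alpha weights the collaborative signal; an item missing from one
    source simply contributes 0 from that source, so content-based
    scores can still surface cold-start items.
    """
    items = set(collab) | set(content)
    return {i: alpha * collab.get(i, 0.0) + (1 - alpha) * content.get(i, 0.0)
            for i in items}

# Illustrative scores (made up):
collab_scores = {"itemA": 0.9, "itemB": 0.2}
content_scores = {"itemB": 0.8, "itemC": 0.6}

blended = hybrid_score(collab_scores, content_scores)
best = max(blended, key=blended.get)
print(best)  # → itemA  (0.7*0.9 = 0.63 beats itemB at 0.38 and itemC at 0.18)
```

Lowering `alpha` for users with little history shifts weight toward content features, which is how a blend like this softens the cold-start problem.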
Provides fallback recommendations for new or anonymous users by ranking items globally based on popularity (e.g., top-selling products). This module ensures a response is always available even without user history.
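The fallback logic amounts to a global ranking by interaction count. A minimal sketch, with a made-up interaction log:

```python
from collections import Counter

def top_n_popular(events: list[tuple[str, str]], n: int = 3) -> list[str]:
    """Rank items globally by interaction count (e.g., purchases)."""
    counts = Counter(item for _, item in events)
    return [item for item, _ in counts.most_common(n)]

# Made-up (user, item) interaction log:
events = [("u1", "phone"), ("u2", "phone"), ("u3", "case"),
          ("u1", "case"), ("u4", "phone"), ("u5", "charger")]

print(top_n_popular(events, n=2))  # → ['phone', 'case']
```

Because the ranking needs no user history, it can always answer for a new or anonymous user.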
- `.gitignore`: Specifies files and directories to be ignored by Git.
- `data.ipynb`: A Jupyter notebook for data exploration or preprocessing.
- `model_query.py`: A Python script for querying models.
- `readme.md`: This file, providing an overview of the project.
- `Rohan_v2.py`: A Python script, likely related to a specific functionality or experiment.
- `s3_data.py`: A script for interacting with AWS S3.
- `setup.py`: A setup script for packaging the project.
Contains files related to collaborative filtering models.
- `api_query_example.py`: Example script for querying the API.
- `app/`: Contains the FastAPI application code for collaborative filtering.
- `build.sh`: A script to build and run the Docker container for the collaborative filtering app.
- `collab_model.ipynb`: Jupyter notebook for training or analyzing the collaborative filtering model.
- `collab_model_retrained.ipynb`: Jupyter notebook for retraining the collaborative filtering model.
- `Dockerfile`: Dockerfile for building the collaborative filtering app container.
- `fake.py`: A placeholder or utility script.
- `k8s/`: Kubernetes configuration files for deploying the collaborative filtering app.
- `requirements.txt`: Python dependencies for the collaborative filtering app.
Contains files for hybrid recommendation models.
- `app/`: Contains the FastAPI application code for hybrid models.
- `Dockerfile`: Dockerfile for building the hybrid model app container.
- `requirements.txt`: Python dependencies for the hybrid model app.
Contains files for input processing.
- `app/`: Contains the FastAPI application code for input processing.
- `Dockerfile`: Dockerfile for building the input processing app container.
- `requirements.txt`: Python dependencies for the input processing app.
Contains files for popularity-based recommendation models.
- `app/`: Contains the FastAPI application code for popularity-based recommendations.
- `Dockerfile`: Dockerfile for building the popularity-based app container.
- `requirements.txt`: Python dependencies for the popularity-based app.
Contains files for updating models.
- `app/`: Contains the application code for model updates.
- `build.sh`: A script to build and run the Docker container for the model update app.
- `Dockerfile`: Dockerfile for building the model update app container.
Contains Kubernetes deployment configurations.
- `colab-deployment.yaml`: Deployment configuration for the collaborative filtering app.
- `hybrid-deployment.yaml`: Deployment configuration for the hybrid model app.
- `input-deployment.yaml`: Deployment configuration for the input processing app.
- `popularity-deployment.yaml`: Deployment configuration for the popularity-based app.
- `update-deployment.yaml`: Deployment configuration for the model update app.
For services that include a `build.sh` script:

```bash
cd colab/
./build.sh
```

Or build and run manually using Docker:

```bash
docker build -t service-name .
docker run -p 8000:8000 service-name
```

Replace `service-name` with the appropriate service folder (e.g., `colab`, `hybrid`, etc.).
Make sure your Kubernetes cluster (e.g., Minikube) is running. Then apply the deployment YAML files:
```bash
kubectl apply -f k8s/colab-deployment.yaml
kubectl apply -f k8s/hybrid-deployment.yaml
kubectl apply -f k8s/input-deployment.yaml
kubectl apply -f k8s/popularity-deployment.yaml
kubectl apply -f k8s/update-deployment.yaml
```

Each service will be exposed within the cluster and available for internal routing or testing via tools like Postman.
Each microservice has a `requirements.txt` file. To install dependencies:

```bash
pip install -r requirements.txt
```

You may want to use a virtual environment for dependency isolation.
After running any service (e.g., on port 8000), open:
http://localhost:8000/docs
to view and test the API endpoints interactively.
Model retraining is scheduled via Kubernetes CronJobs, but you can manually trigger it by re-applying the job file or running:
```bash
kubectl create job --from=cronjob/model-update-job model-update-job-manual
```

This will:
- Fetch new data
- Retrain the collaborative filtering model
- Upload the model to S3
- Log the run in ClearML
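In outline, the update job chains the steps above. The function names and stubbed bodies below are hypothetical stand-ins, not this repository's code; a real job would pull data from its source, call boto3 to upload the model artifact to S3, and report metrics through the ClearML SDK.

```python
def fetch_new_data() -> list[tuple[str, str]]:
    """Stub: in production this would pull fresh interaction data."""
    return [("u1", "itemA"), ("u2", "itemB"), ("u1", "itemB")]

def retrain_model(data: list) -> dict:
    """Stub: stands in for refitting the collaborative filtering model."""
    return {"n_interactions": len(data), "version": 2}

def upload_to_s3(model: dict) -> str:
    """Stub: a real job would upload the serialized model to an S3 bucket."""
    return f"s3://models/collab-v{model['version']}.pkl"

def log_to_clearml(model: dict, artifact_uri: str) -> dict:
    """Stub: a real job would log metrics and artifacts via ClearML."""
    return {"artifact": artifact_uri, "interactions": model["n_interactions"]}

def run_update_job() -> dict:
    """Chain the four retraining steps end to end."""
    data = fetch_new_data()
    model = retrain_model(data)
    uri = upload_to_s3(model)
    return log_to_clearml(model, uri)

print(run_update_job())
```

Keeping each step a separate function mirrors the CronJob's pipeline shape and makes individual steps easy to test or swap.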
- Fast API response time: ~186ms average latency
- Improved category diversity and recommendation quality post-retraining
- Reliable performance thanks to container orchestration and automation
- Cold-start problem for new users remains a challenge
- No frontend interface yet — future work includes web-based UI
- CI/CD integration (e.g., GitHub Actions) planned for automation
- Load testing under high concurrency is yet to be done
- Rohan Jain
- Hsing-Hao Wang
- Madhur Lakshmanan
- Joshua Liu