This project is intended to provide an example of MLOps architecture. It uses the code of a Kaggle Notebook as its use case; the original code has been edited to adapt it to this project.
You can find the documentation on how each technology is used in the doc folder.
This project preferably uses Free Software (the exceptions are Google Cloud Build and Google Cloud Functions).
The technologies used in this project are listed below; see the detailed documentation for each one to learn how it is used here.
- MLflow - Tracks experiment logs and model versions, and stores the models in a Model Registry
- Kubeflow - Orchestrates the ML workflow
- BentoML - Used as the model serving framework
- Google Cloud Platform
  - Google Cloud Build - Used to build the CI pipeline
  - Google Cloud Functions - Used to run the Kubeflow pipeline whenever a file is added to or updated in the bucket that holds the training dataset.
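The Cloud Functions trigger described above could be sketched as follows. This is a minimal, stdlib-only sketch, not the project's actual function: `make_trigger`, the `submit_run` callable, the `dataset_uri` run parameter and the `.csv` filter are all hypothetical. It assumes a 1st-gen background function that receives a Cloud Storage "object finalize" event, whose payload includes the `bucket` and `name` fields. That event fires both when a new object is created and when an existing one is overwritten, which matches "added or updated".

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical assumption: only CSV files in the bucket count as training data.
TRAINING_FILE_SUFFIX = ".csv"


def make_trigger(
    submit_run: Callable[[Dict[str, str]], str],
) -> Callable[[Dict[str, Any], Any], Optional[str]]:
    """Build a background-function handler that starts a pipeline run.

    `submit_run` is a hypothetical callable that submits a Kubeflow pipeline
    run with the given arguments and returns a run identifier.
    """

    def trigger(event: Dict[str, Any], context: Any = None) -> Optional[str]:
        bucket = event.get("bucket")
        name = event.get("name", "")
        # Ignore events that do not refer to a training data file.
        if not bucket or not name.endswith(TRAINING_FILE_SUFFIX):
            return None
        # Pass the location of the new or updated file to the pipeline.
        arguments = {"dataset_uri": f"gs://{bucket}/{name}"}
        return submit_run(arguments)

    return trigger
```

In a real deployment, `submit_run` would wrap a Kubeflow Pipelines client call (for example `kfp.Client().create_run_from_pipeline_package(...)`) and the deployed entry point would be the function returned by `make_trigger`. Injecting the submitter keeps the event-handling logic testable without a cluster.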
MLOps can improve the ML lifecycle in many ways, but not all of the possible benefits are achieved in this project or highlighted here.
This project demonstrates the following advantages of MLOps and the challenges it helps you cope with. Each item in the list is followed by the name (or the logo) of the technologies that address that challenge.
- Treat ML as a process instead of only as a product
- Reproduce the whole pipeline
- Reproduce the model building
- Automate the whole workflow
- Automatic retraining
- Validate the model and the data (as steps of the pipeline)
- Detect data drift
- Increase collaboration between teams
- Track the parameters used for model training, the metrics, and the model itself
- Version your model
- CI/CD + CT (Continuous Training)
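For the data validation and data drift items, one common check (not necessarily the one this project uses) is the two-sample Kolmogorov-Smirnov test, which compares the empirical distribution of a feature in the training data with the same feature in newly arrived data. The sketch below is pure Python; `ks_statistic` and `has_drifted` are hypothetical names, and the rejection threshold uses the standard large-sample approximation c(α)·sqrt((n + m) / (n·m)) with c(α) = sqrt(-ln(α/2) / 2).

```python
import math
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of the two samples."""
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(ref[i], cur[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < n and ref[i] <= x:
            i += 1
        while j < m and cur[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted its CDF is 1, so the gap can only shrink.
    return d


def has_drifted(
    reference: Sequence[float], current: Sequence[float], alpha: float = 0.05
) -> bool:
    """Return True if the KS test rejects "same distribution" at level alpha."""
    n, m = len(reference), len(current)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # Asymptotic critical value for the two-sample KS test.
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical
```

Inside a pipeline, a `True` result could fail the data validation step or, for automatic retraining, start a new training run on the updated data.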
This project is licensed under the GPLv3 Licence - see the LICENSE file for details. Any comments, feedback or suggestions will be appreciated.