API Docs: http://0.0.0.0:8080/docs
The above is an overview of the system design for online inferencing in a simple text-classification system. Note that this system does not handle the text pre-processing and feature-generation pipeline itself.
Here, the system is divided into the following parts:
- Model Training
    - Local Jupyter Notebook
    - AWS SageMaker Training Job
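A SageMaker training job is launched through the `CreateTrainingJob` API. As an illustrative sketch (not this repo's actual job definition), the request payload can be assembled like this — the image URI, role ARN, and S3 paths are placeholders:

```python
# Sketch: assembling a SageMaker CreateTrainingJob request payload (boto3-style).
# All ARNs, image URIs, and S3 paths are placeholders, not values from this repo.

def build_training_job_request(job_name: str, image_uri: str, role_arn: str,
                               s3_input: str, s3_output: str) -> dict:
    """Build the request payload for sagemaker_client.create_training_job()."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,   # custom training container from ECR
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_input,
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": s3_output},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }

request = build_training_job_request(
    job_name="text-clf-train-001",
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<repo>:latest",
    role_arn="arn:aws:iam::<account>:role/<sagemaker_role>",
    s3_input="s3://<bucket>/train/",
    s3_output="s3://<bucket>/output/",
)
# boto3.client("sagemaker").create_training_job(**request) would launch the job.
```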
- Model Monitoring and Model Registry
    - Accuracy metrics are monitored both during development and in production
    - Hyper-parameter optimization is also taken care of (Hyperopt, which is natively supported by Flair)
    - Weights & Biases is used for model monitoring and the model registry, since it provides a hosted solution
    - Alternatively, the open-source, self-managed MLflow would serve the same purpose with the same functionality
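In the actual pipeline, Hyperopt drives the hyper-parameter search. As a dependency-free illustration of the idea only — a toy random search over a made-up search space, with a stub standing in for model evaluation:

```python
import random

# Toy illustration of hyper-parameter search; the real pipeline uses Hyperopt.
# The search space and objective below are made up for demonstration.

SEARCH_SPACE = {
    "learning_rate": [0.1, 0.05, 0.01],
    "hidden_size": [64, 128, 256],
}

def toy_objective(params: dict) -> float:
    # Stand-in for training a model and returning its validation loss.
    return abs(params["learning_rate"] - 0.05) + abs(params["hidden_size"] - 128) / 256

def random_search(n_trials: int = 50, seed: int = 0) -> dict:
    """Sample configurations and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        loss = toy_objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params

best = random_search()
```

Hyperopt replaces the uniform sampling above with a smarter (TPE-based) proposal strategy, but the evaluate-and-keep-the-best loop is the same shape.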
- Model CI/CD
    - AWS CodePipeline is used for orchestrating model deployment
    - AWS ECR is used as the Docker registry, as the deployment target is either AWS EC2 or AWS ECS
- Model Inferencing
    - For online model inferencing, the model is containerised and pushed to AWS ECR for easier container management
    - The primary deployment target is either bare-metal AWS EC2 or AWS ECS
    - The basic implementation uses the `async` notation for asynchronous inferencing
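The asynchronous handler can be sketched as follows. The classifier is a stub standing in for the real model, and the HTTP wiring (e.g. the framework serving the `/docs` page above) is omitted:

```python
import asyncio

# Sketch of an async inference handler. stub_classify is a placeholder for
# the actual text-classification model; the web-framework endpoint wiring
# around it is omitted.

def stub_classify(text: str) -> str:
    # Placeholder for the real model's forward pass.
    return "positive" if "good" in text.lower() else "negative"

async def predict(text: str) -> dict:
    # Run the (CPU-bound) model call in a worker thread so the event loop
    # stays free to accept other requests while inference runs.
    label = await asyncio.to_thread(stub_classify, text)
    return {"text": text, "label": label}

async def main():
    # Two requests inferenced concurrently.
    return await asyncio.gather(
        predict("This product is good"),
        predict("This product is bad"),
    )

results = asyncio.run(main())
```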
- Notification
    - The first part of notification happens during development
    - The second part is post-production, for monitoring
    - Two notification channels are considered: a) MS Teams, b) Email; both via webhook
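An MS Teams incoming webhook accepts a JSON "MessageCard" payload. A minimal standard-library sketch of building and posting one — the webhook URL is a placeholder, and the Email channel would follow the same pattern:

```python
import json
import urllib.request

# Sketch: building an MS Teams incoming-webhook notification.
# The webhook URL is a placeholder; error handling is omitted.

def build_teams_payload(title: str, text: str) -> dict:
    """Build a 'MessageCard' payload, the legacy format Teams webhooks accept."""
    return {
        "@type": "MessageCard",
        "@context": "https://schema.org/extensions",
        "summary": title,
        "title": title,
        "text": text,
    }

def send_notification(webhook_url: str, payload: dict) -> None:
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # fire-and-forget POST to the webhook

payload = build_teams_payload(
    title="Model accuracy alert",
    text="Validation accuracy dropped below the configured threshold",
)
# send_notification("https://<tenant>.webhook.office.com/...", payload)
```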
There are multiple pre-checks that are not considered here, such as:
- Bias in the data; checking for data integrity
- A data pre-processing pipeline for model inferencing
- Feature store integration
- Vector database integration during inferencing
Build the Docker image from the root directory:
`docker build -t <image_name> .`
Or pull the Docker image from AWS ECR. Then run the container:
`docker run --name <container_name> <image_name>`