A web application to serve models based on the Fastai v1 library. The application is built on the Molten framework due to its built-in support for modern tooling: Swagger, OpenAPI, Prometheus, etc.
I wanted an application which allows me to serve ML models in real production settings. The application should provide:
- Versioned APIs
- Metrics (prometheus)
- Ability to run different versions of models without unnecessary code changes
- User friendly API documentation
- Input data validation
- Authentication
- Inference device flexibility (CPU or GPU)
- Scale up or down instances based on incoming traffic (Kubernetes)
This lets me focus on improving my ML model and not worry about how to serve it in production.
To test the current model, run the following commands to set things up. You will need Python 3.6 or above and the git-lfs package installed.
git clone --depth 1 https://github.com/gurvindersingh/mlapp.git
cd mlapp
virtualenv .ve
source .ve/bin/activate
pip install -r requirements.txt
pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
gunicorn app:app
Now you can visit localhost:8000 to see the Swagger UI and test the APIs out, or you can run curl as follows:
curl -X POST "http://localhost:8000/v1/predict" -F "file=@loadtest/keeshond.jpg"
You can change config.json based on your own model, classes, and architecture to see predictions from your own model (see the config reference below). Make sure your .pth file is placed with the correct name under the models directory.
To run the prebuilt Docker images instead:
CPU: docker run -ti --rm -p 8000:8000 gurvin/mlapp:v0.1.0-cpu
GPU & CPU: docker run -ti --rm -p 8000:8000 gurvin/mlapp:v0.1.0
The application uses config.json to provide various options.
{
"model_name": "v1", -> Specify the model from multiple models in your config as which one you want to run with this instance
"token": "", -> To enable authenticaiton to you APIs
"v1": { -> Model name and its corresponding config. These can be different based on your model
"device": "cpu", -> Device on which to run inference (cpu, cuda)
"url": "", -> URL to fetch the weights file .pth from. If empty app assumes local file under models dir
"arch": "resnet34", -> Model Architecutre
"size": 224, -> Image size
"classes": [] -> Classes used during training from which prediciton will happen
}
}
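To give a sense of how these options are consumed, here is a minimal sketch, assuming the keys shown above, of reading config.json and selecting the active model's settings; the variable names and loading logic are illustrative and not necessarily how the application implements it.

```python
# Illustrative sketch: read config.json and pick the active model's settings.
# Only the keys come from the config shown above; the rest is an assumption.
import json

with open("config.json") as f:
    cfg = json.load(f)

model_cfg = cfg[cfg["model_name"]]  # e.g. the "v1" block
device = model_cfg["device"]        # "cpu" or "cuda"
arch = model_cfg["arch"]            # e.g. "resnet34"
size = model_cfg["size"]            # image size used during training
classes = model_cfg["classes"]      # class labels to predict from
```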
The application design is simple. app.py
sets up the routes and required methods to get data in and out. The following routes are exposed by the application (a minimal sketch of the route wiring follows this list):
/ -> Returns health information and serves as the liveness check for the application
/_docs -> Serves Swagger UI to provide API docs and user friendly API testing
/_schema -> JSON-formatted API spec in the OpenAPI standard
/metrics -> Endpoint to scrape metrics by Prometheus
/v1/predict -> Receives end user data to make prediction on
/v1/feedback -> User can provide feedback on a prediction if there were any errors
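Below is a minimal sketch of how these routes could be wired up with Molten; the handler names and response payloads are illustrative assumptions and may differ from the actual app.py.

```python
# Minimal sketch of route wiring with Molten; handler names and payloads are
# illustrative assumptions, not the project's actual app.py.
from molten import App, Route, UploadedFile, schema


@schema
class ModelData:
    file: UploadedFile


def index() -> dict:
    # Liveness check returning basic health information
    return {"status": "ok"}


def predict(data: ModelData) -> dict:
    # Molten validates the multipart upload against ModelData before this runs;
    # the response payload here is a placeholder
    return {"prediction": "..."}


app = App(routes=[
    Route("/", index),
    Route("/v1/predict", predict, method="POST"),
])
```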
The API is versioned, which makes evolution of our application possible without introducing breaking changes for downstream applications. app.py
validates data according to the schema described in the model.py
file using the Molten framework. The current application serves a simple CNN model, as described in Lesson 1 of the Fastai v1 course, using the schema shown below for the predict
endpoint. For more details see model.py
or the Molten docs.
from molten import UploadedFile, schema

@schema
class ModelData:
    file: UploadedFile
The idea is that, based on a given model, you can update the schema in this class and let Molten take care of validation for you while providing user friendly docs and a test interface, as shown.
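For example, a hypothetical model that also accepts a probability threshold could extend the schema as below; the threshold field and its default are purely illustrative and not part of this repository.

```python
# Hypothetical extension of the predict schema; the `threshold` field and its
# default are illustrative assumptions, not part of this repository.
from typing import Optional

from molten import UploadedFile, field, schema


@schema
class ModelData:
    file: UploadedFile
    threshold: Optional[float] = field(default=0.5)
```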
To test the API, either curl or the Swagger UI can be used.
The application uses Molten's support for Prometheus to export request count, latency, GC time, and a few other metrics.
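A minimal sketch of that wiring, assuming the helpers from molten.contrib.prometheus (which require the prometheus_client package), is shown below; the actual setup in app.py may differ.

```python
# Sketch of exposing Prometheus metrics via Molten's contrib helpers; the
# exact middleware and routes in app.py may differ from this.
from molten import App, ResponseRendererMiddleware, Route
from molten.contrib.prometheus import expose_metrics, prometheus_middleware


def index() -> dict:
    return {"status": "ok"}


app = App(
    middleware=[prometheus_middleware, ResponseRendererMiddleware()],
    routes=[
        Route("/", index),                  # instrumented application route
        Route("/metrics", expose_metrics),  # scraped by Prometheus
    ],
)
```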
A Kubernetes deployment file is provided in the deployment
folder, which also has support for a HorizontalPodAutoscaler to scale the application up based on CPU usage and scale it back down when no longer needed.
- Add support for ONNX