The main objective of this project is to build a machine learning model and operationalize it using Azure Container Instances. We are provided with a banking dataset. The main steps of the project are:
1) Authentication
2) Automated ML Experiment
3) Deploy the best model
4) Enable logging
5) Swagger Documentation
6) Consume model endpoints
7) Create and publish a pipeline
8) Documentation
An architectural diagram of the project and an introduction to each step are given below.
Authentication is the vital first step to ensure secure access. It is required to create the Service Principal account and associate it with a specific workspace.
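A minimal sketch of Service Principal authentication with the Azure ML Python SDK; the tenant, client, and workspace values are placeholders for your own.

```python
from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

# Authenticate with the Service Principal (placeholder credentials).
sp_auth = ServicePrincipalAuthentication(
    tenant_id="<tenant-id>",                       # Azure AD tenant
    service_principal_id="<client-id>",            # Service Principal application ID
    service_principal_password="<client-secret>",  # Service Principal secret
)

# Associate the Service Principal with a specific workspace.
ws = Workspace.get(
    name="<workspace-name>",
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
    auth=sp_auth,
)
print(ws.name, ws.location)
```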
We create a new Automated ML experiment and upload the Bank Marketing dataset. We run the experiment on a newly configured compute cluster, using Classification as the task, and ensure that best-model explanation is enabled.
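A minimal sketch of this AutoML setup; the dataset name, cluster name, experiment name, and the `y` label column are assumptions for illustration.

```python
from azureml.core import Workspace, Experiment, Dataset
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()

# Registered Bank Marketing dataset (assumed registration name).
dataset = Dataset.get_by_name(ws, name="bankmarketing-dataset")

# Provision a Standard_DS12_v2 compute cluster for the run.
compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS12_V2", max_nodes=5)
compute_target = ComputeTarget.create(ws, "automl-cluster", compute_config)
compute_target.wait_for_completion(show_output=True)

# Classification task with best-model explanation enabled (label column assumed to be "y").
automl_config = AutoMLConfig(
    task="classification",
    primary_metric="accuracy",
    training_data=dataset,
    label_column_name="y",
    compute_target=compute_target,
    model_explainability=True,
    enable_early_stopping=True,
)

run = Experiment(ws, "automl-bankmarketing").submit(automl_config, show_output=True)
run.wait_for_completion()
best_run, fitted_model = run.get_output()  # best model from the AutoML run
```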
After the AutoML run completes, we get our best model. We then deploy that model using Azure Container Instances (ACI) and enable authentication to prevent unauthorized access.
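A minimal sketch of the ACI deployment with authentication enabled; the registered model name, endpoint name, `score.py` entry script, and curated environment are assumptions.

```python
from azureml.core import Workspace, Model, Environment
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = Model(ws, name="bankmarketing-best-model")  # registered best model (assumed name)

inference_config = InferenceConfig(
    entry_script="score.py",                          # scoring script (assumed)
    environment=Environment.get(ws, "AzureML-AutoML"),  # curated environment (assumed)
)

deployment_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    auth_enabled=True,  # key-based authentication on the endpoint
)

service = Model.deploy(ws, "bankmarketing-endpoint", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(service.state)  # should report "Healthy"
```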
After the deployment, we enable Application Insights for the deployed model. This lets us produce logging output with the Python SDK, which plays a vital role in debugging problems in production environments.
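A minimal sketch of enabling Application Insights and retrieving logs through the Python SDK; the endpoint name is a placeholder.

```python
from azureml.core import Workspace
from azureml.core.webservice import Webservice

ws = Workspace.from_config()
service = Webservice(workspace=ws, name="bankmarketing-endpoint")  # deployed endpoint (assumed name)

service.update(enable_app_insights=True)  # turn on Application Insights for the service
print(service.get_logs())                 # logging output useful for debugging in production
```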
Swagger helps us build, document, and consume RESTful web services. It also describes what types of requests the API can consume, such as POST and GET.
We then consume the deployed service by sending it data through HTTP requests. This helps us validate the data and identify anything that is malformed or incorrect.
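A minimal sketch of consuming the deployed endpoint over HTTP; the scoring URI, key, and feature values are placeholders for the real ones shown in the workspace.

```python
import json
import requests

scoring_uri = "http://<aci-endpoint>.azurecontainer.io/score"  # from the endpoint's Consume tab
key = "<primary-key>"                                          # authentication key

# Sample payload (placeholder Bank Marketing features).
data = {
    "data": [
        {
            "age": 35,
            "job": "technician",
            "marital": "married",
            # ... remaining Bank Marketing features ...
        }
    ]
}

headers = {"Content-Type": "application/json", "Authorization": f"Bearer {key}"}
response = requests.post(scoring_uri, data=json.dumps(data), headers=headers)
print(response.json())  # predicted label(s), e.g. ["yes"] or ["no"]
```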
The last and most vital step is to make the model publicly available. This is done by creating a pipeline and then publishing it. It is synonymous with automation, as the published pipeline gives other services a way to interact with it through an HTTP endpoint.
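A minimal sketch of publishing a pipeline run and triggering it over its REST endpoint; the run ID, pipeline name, and experiment name are assumptions.

```python
import requests
from azureml.core import Workspace, Experiment
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.pipeline.core import PipelineRun

ws = Workspace.from_config()
experiment = Experiment(ws, "automl-bankmarketing")
pipeline_run = PipelineRun(experiment, run_id="<pipeline-run-id>")  # completed pipeline run

# Publish the pipeline so it gets a public REST endpoint.
published = pipeline_run.publish_pipeline(
    name="bankmarketing-train-pipeline",
    description="Training pipeline for the Bank Marketing model",
    version="1.0",
)
print(published.endpoint)  # HTTP endpoint other services can call

# Trigger the published pipeline over HTTP.
auth_header = InteractiveLoginAuthentication().get_authentication_header()
response = requests.post(
    published.endpoint,
    headers=auth_header,
    json={"ExperimentName": "automl-bankmarketing"},
)
print(response.json().get("Id"))  # run ID of the newly started pipeline
```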
- We have to first register the dataset from the local files.
- We have to build a compute cluster with VM size Standard_DS12_v2 for running the AutoML run.
- Both the maximum and minimum number of nodes are set to 5.
- We have to run an AutoML experiment using the same registered dataset.
- We have to specify the same compute target that we built earlier.
- After the AutoML run completes, we need to pick the best model from the various models trained.
- Here we got a Voting Ensemble model, which combines the predictions of several runs by voting to produce the best result. The base model is XGBoost with MaxAbs scaling, with an accuracy of 91%.
- After the experiment run completes, a summary of all the models and their metrics is shown, including explanations. The images below show the explanation of the best-performing model.
- Once we have the best model, it's time to deploy it. We can use Azure Kubernetes Service or Azure Container Instances for the deployment.
- We need to choose an authentication method during deployment. Once the deployment succeeds, an endpoint is created, and its status shows as Healthy in the workspace.
- Once the model is deployed, we need to enable logging by setting appinsights = True in the experiment logging script, adding the experiment name.
- Once logging is enabled, we should see statistics in Application Insights such as failed requests, timed-out requests, etc.
- We can consume this endpoint using the REST API or by running the Azure ML Python SDK.
- Swagger is one of the API testing platforms available.
- Once the model is deployed, we get a swagger.json file from the endpoint, which needs to be downloaded and placed in the folder containing the Swagger files serve.py and swagger.sh.
- After that, we need to launch a local web server using the serve.py script and launch Swagger in a Docker container by running swagger.sh.
We can schedule the pipelines using the ScheduleRecurrence parameter, reducing manual effort.
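A minimal sketch of scheduling the published pipeline with ScheduleRecurrence; the schedule name, pipeline ID, and experiment name are placeholders.

```python
from azureml.core import Workspace
from azureml.pipeline.core.schedule import Schedule, ScheduleRecurrence

ws = Workspace.from_config()

recurrence = ScheduleRecurrence(frequency="Day", interval=1)  # run the pipeline once a day
schedule = Schedule.create(
    ws,
    name="bankmarketing-daily",
    pipeline_id="<published-pipeline-id>",
    experiment_name="automl-bankmarketing",
    recurrence=recurrence,
)
print(schedule.id)
```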
Check out our CONTRIBUTING GUIDELINES
See project in action HERE🖼️
- Collecting more data can definitely help in improving accuracy.
- We can try scoring batch data on a schedule and observe the performance.
- We can try implementing various new algorithms, along with running the AutoML experiment for a longer time period.
- We can try new values for number of nodes, concurrency etc.
- We can use GPUs instead of CPUs to improve performance. CPUs might reduce costs, but in terms of performance and accuracy GPUs outperform CPUs.
This project is licensed under the MIT LICENSE