Model evaluation is the process through which we quantify the quality of a system's predictions. To do this, we measure the performance of the newly trained model on a new, independent dataset, comparing the model's predictions against the labelled data. Model evaluation metrics tell us:
- How well the model is performing
- Whether the model is accurate enough to put into production
- Whether a larger training set would improve the model's performance
- Whether the model is under-fitting or over-fitting
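As a minimal illustration of this workflow (a sketch using scikit-learn and a toy dataset, not tied to this project's code):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Hold out an independent test set that the model never sees during training
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=10000).fit(X_train, y_train)

# Compare the model's predictions against the labelled test data
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("F1-Score:", f1_score(y_test, y_pred, average="weighted"))
```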
The goal of the project is to provide a solution with context-based visualisation capabilities that helps data scientists easily compare relevant metrics across machine learning models and evaluate their performance on unseen data.
This project focuses on evaluating machine learning models, given a model and a testing dataset. It also provides the capability to compare model evaluations on the basis of different metrics. However, comparison of evaluations is only possible in the following two scenarios:
- Evaluation reports for one model against different validation datasets (having the same schema)
- Evaluation reports for multiple models (2 or more) generated against the same validation dataset
Model schema:

Column | Data Type |
---|---|
Model ID | integer
Name | string
Metadata | object
Model path | string
Date created | date

Dataset schema:

Column | Data Type |
---|---|
Dataset ID | integer
Name | string
Metadata | object
Dataset path | string
Date created | date

Evaluation schema:

Column | Data Type |
---|---|
Evaluation ID | integer
Name | string
Metadata | object
Model ID | integer
Dataset ID | integer
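For illustration, records in these tables might look roughly as follows; the field values are hypothetical, only the column names come from the schemas above:

```python
# Hypothetical example records matching the schemas above
model_record = {
    "model_id": 1,
    "name": "LogisticRegression-iris",
    "metadata": {"framework": "scikit-learn"},
    "model_path": "models/finalized_model.sav",
    "date_created": "2021-07-01",
}

dataset_record = {
    "dataset_id": 1,
    "name": "iris-test-split",
    "metadata": {"rows": 45},
    "dataset_path": "datasets/test_data.csv",
    "date_created": "2021-07-01",
}

evaluation_record = {
    "evaluation_id": 1,
    "name": "baseline-evaluation",
    "metadata": {},  # metric values get attached once the evaluation is run
    "model_id": 1,
    "dataset_id": 1,
}
```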
To run the application using method 1, make sure the proxy is set to "http://localhost:5000/" in the client/package.json file.
To run the application using method 2, make sure the proxy is set to "http://api:5000/" in the client/package.json file.
- Clone the repository: `git clone https://github.com/aks2507/Model-Evaluation-and-Diagnosis-Display.git`
- Change directory into the cloned repo: `cd .\Model-Evaluation-and-Diagnosis-Display\`
- Change directory to the client folder: `cd client`
- Install the necessary dependencies: `yarn install`
- Run the Flask server at port 5000: `yarn start-api`
- Open another terminal/cmd window and run the client side at port 3000: `yarn start`
Your application is now up and running; to see it in action, head over to http://localhost:3000
The application is dockerized, so follow the steps below to run it using Docker.
First, download Docker Desktop and follow its instructions to install it. This allows us to start using Docker containers.
Create a local copy of this repository and run
docker-compose build
This spins up Compose and builds a local development environment according to our specifications in docker-compose.yml.
After the containers have been built (this may take a few minutes), run
docker-compose up
This one command boots up a local server for Flask (on port 5000) and React (on port 3000). Head over to
http://localhost:3000/
to view the React web page, which triggers REST API calls to our Flask server.
The API endpoints can be tweaked easily in api/app.py, and the front-end logic for consuming the API is contained in client/src/. The code in these files simply demonstrates how our front-end might consume our back-end API.
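As a rough sketch of what an endpoint in api/app.py might look like (the route name and response below are illustrative only, not the project's actual API):

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Illustrative endpoint only; see api/app.py for the real routes
@app.route("/api/ping")
def ping():
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    app.run(port=5000)
```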
Finally, to gracefully stop running our local servers, you can run
docker-compose down
in a separate terminal window or press control + C.
- Clone the repo, install the dependencies, and start the Flask server:
  - `git clone https://github.com/aks2507/Model-Evaluation-and-Diagnosis-Display.git`
  - `cd .\Model-Evaluation-and-Diagnosis-Display\`
  - `cd client`
  - `yarn install`
  - `yarn start-api`
- Then go to http://127.0.0.1:5000/swagger
Alternatively, you can just visit https://ksrrock.github.io/swagger-ui/ to visualise the REST endpoints.
Train your model in a Jupyter Notebook, Kaggle, or Google Colab and obtain the serialised model and test dataset as follows:
```python
import pickle
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Split the data (X and y come from your own notebook)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Saving the testing data used for evaluation
df1 = pd.DataFrame(X_test)
df2 = pd.DataFrame(y_test)
result = pd.concat([df1, df2], axis=1)
result.to_csv('test_data_LogisticRegression.csv', index=False)

# Training the model
LR = LogisticRegression(solver='lbfgs', max_iter=10000)
LR.fit(X_train, y_train)

# Obtaining the serialised model
filename = 'finalized_model.sav'
pickle.dump(LR, open(filename, 'wb'))
```
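To sanity-check the serialised artefacts before registering them, you can load them back and score the model on the saved test data (a quick sketch that assumes the label is the last column of the CSV, as written above):

```python
import pickle

import pandas as pd

# Load the serialised model and the saved test data
loaded_model = pickle.load(open('finalized_model.sav', 'rb'))
test_data = pd.read_csv('test_data_LogisticRegression.csv')

# The label is the last column, as written by the snippet above
X_eval = test_data.iloc[:, :-1]
y_eval = test_data.iloc[:, -1]
print("Test accuracy:", loaded_model.score(X_eval, y_eval))
```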
Clone the repo and run `yarn install` to install the necessary dependencies.
Run the code on your local machine (you can either run the application via Docker, or run the client side and server side individually).
If you want to run the application via Docker, make sure the proxy set in client/package.json is http://api:5000/.
If you want to run the client side and server side individually, change the proxy to http://localhost:5000 and follow the instructions ahead.
The api folder is included in this repository itself. To spin up the server at http://localhost:5000, open a terminal, navigate to the client folder inside the folder where you have cloned this repository, and run:
`yarn start-api`
In the project directory, go inside the client folder and, to spin up the client at http://localhost:3000, run:
`yarn start`
As soon as both the server and the client are up and running, you will be able to browse the site and call API endpoints. The React front-end has a proxy set up to port 5000, so URLs that are not recognised on port 3000 get redirected to port 5000, invoking the corresponding endpoints if they have been defined in the backend.
While the server side is running, populate your models and datasets using Postman by triggering the model and dataset registration endpoints (see the Swagger documentation above for the exact routes).
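If you prefer a script over Postman, something along these lines works with the requests library; the route paths and payload fields below are assumptions based on the schemas above, so check the Swagger documentation for the exact contract:

```python
import requests

BASE_URL = "http://localhost:5000"

# Hypothetical registration payloads and routes; verify them against Swagger
model_payload = {
    "name": "LogisticRegression-iris",
    "metadata": {"framework": "scikit-learn"},
    "model_path": "models/finalized_model.sav",
}
dataset_payload = {
    "name": "iris-test-split",
    "metadata": {},
    "dataset_path": "datasets/test_data.csv",
}

print(requests.post(f"{BASE_URL}/models", json=model_payload).status_code)
print(requests.post(f"{BASE_URL}/datasets", json=dataset_payload).status_code)
```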
- There are three types of tables in the project:
  - Interactive:
    - Can toggle padding
    - Change the number of rows per page
    - Navigate between table pages
    - Checkboxes to select one or multiple rows
    - Delete button to delete selected row(s)
    - Sort according to column
  - Non-Interactive: plain, simple table to display data
  - Semi-Interactive: same as Interactive, except the following are removed:
    - Delete option
    - Checkboxes to select rows
- Only plots that are shown in a single evaluation have a slider. They have not been added in comparisons, considering their lack of utility in such a case.
- Datasets and models are currently registered using either Postman or Swagger UI. Evaluations can be registered using the UI.
- Comparison is only possible in two cases:
  - Multiple models trained on the same dataset
  - The same model tested on multiple datasets
- Only models provided by the scikit-learn library are supported
- Model files are unpickled and used, so only one extension, .sav, is supported
The Homepage consists of a table where all the evaluations are listed. The table is fully interactive. By clicking on the Evaluation ID, the user can see the visualizations related to it; this also triggers the model evaluation if the metrics are not there already. The Compare button can be used to compare two or more evaluations. Refer to the General Information section above to see how these work. Each row has the following information:
- Evaluation ID
- Evaluation name
- Model Type
- Model
- Dataset
- Date Created
This page contains the form that helps the user to register an evaluation. The user enters the following information here:
- Evaluation name
- Model Type (selection)
- Dataset (selection)
- Model (selection)
- Description (optional)
On submitting, the evaluation gets stored in the table, without the metadata.
The evaluation metrics for a single model can be visualized by clicking on the button encircling the Evaluation ID in the table on the Homepage. This essentially sends a GET request for the evaluation and, based on the received payload, renders the visualisations as follows:
Note: All tables rendered in this scenario are semi-interactive, except the table for feature importance.
The following evaluation metrics are visible to the user in tabular, bar chart, and line chart format (a sketch of how they can be computed with scikit-learn follows the table):
Classification | Regression |
---|---|
Accuracy | MAE |
Precision | MSE |
Recall | RMSE |
F1-Score | RMSLE |
Log-Loss | R-squared |
---- | Adjusted R-squared |
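For reference, these metrics can be reproduced with scikit-learn roughly as follows (a sketch on toy data, not the application's exact implementation):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, log_loss, mean_absolute_error,
                             mean_squared_error, mean_squared_log_error,
                             r2_score)

# Toy binary-classification outputs
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])
y_proba = np.array([0.2, 0.9, 0.4, 0.1, 0.8])  # predicted probability of class 1
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-Score :", f1_score(y_true, y_pred))
print("Log-Loss :", log_loss(y_true, y_proba))

# Toy regression outputs
y_obs = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.8, 5.4, 2.9, 6.6])
mse = mean_squared_error(y_obs, y_hat)
print("MAE  :", mean_absolute_error(y_obs, y_hat))
print("MSE  :", mse)
print("RMSE :", np.sqrt(mse))
print("RMSLE:", np.sqrt(mean_squared_log_error(y_obs, y_hat)))
print("R2   :", r2_score(y_obs, y_hat))
```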
The following curves and charts will be shown to the user when they select this option:
Classification | Regression |
---|---|
ROC | Residual vs Observed |
Precision-Recall | Observed vs Predicted |
Confusion Matrix | Residual vs Predicted |
Gain and Lift Charts | --- |
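These curves are all derived from the model's predicted probabilities; a minimal scikit-learn sketch of the underlying computations (plot rendering omitted):

```python
import numpy as np
from sklearn.metrics import roc_curve, precision_recall_curve, confusion_matrix

# Toy binary-classification outputs
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 0])
y_proba = np.array([0.1, 0.8, 0.6, 0.3, 0.9, 0.4, 0.7, 0.2])

# ROC curve: false-positive rate vs true-positive rate over all cutoffs
fpr, tpr, roc_thresholds = roc_curve(y_true, y_proba)

# Precision-Recall curve
precision, recall, pr_thresholds = precision_recall_curve(y_true, y_proba)

# Confusion matrix at a fixed cutoff of 0.5
y_pred = (y_proba >= 0.5).astype(int)
print(confusion_matrix(y_true, y_pred))
```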
Along with the plots, there are also several ways to interact with them:
- Zoom in and out by scrolling
- Select
- Lasso select
- Slider for the cutoff value
- Button to reset to the initial state
- Drag and move
- Save as PNG
The following data is shown about the test dataset used by the model for prediction:
- Dataset statistics are shown in tabular format as well as in line chart format. The following statistics are displayed for each column of the dataset (a sketch of how they can be computed follows this list):
- Mean
- Standard deviation
- Minimum value
- Maximum value
- First Quartile
- Second Quartile
- Third Quartile
- IQR
- Number of missing values
- Feature Importances are shown in bar chart as well as tabular format
- Class Imbalance is shown in pie chart format
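These column statistics map closely onto pandas' built-in descriptors; a sketch of how they could be computed (the CSV path here is a placeholder for a registered test dataset, and the label is assumed to be the last column):

```python
import pandas as pd

df = pd.read_csv("test_data.csv")  # placeholder path to a registered test dataset

stats = pd.DataFrame({
    "mean": df.mean(numeric_only=True),
    "std": df.std(numeric_only=True),
    "min": df.min(numeric_only=True),
    "max": df.max(numeric_only=True),
    "q1": df.quantile(0.25, numeric_only=True),
    "q2": df.quantile(0.50, numeric_only=True),  # the median
    "q3": df.quantile(0.75, numeric_only=True),
    "missing": df.isna().sum(),
})
stats["iqr"] = stats["q3"] - stats["q1"]
print(stats)

# Class imbalance: relative frequency of each label (assumed to be the last column)
print(df.iloc[:, -1].value_counts(normalize=True))
```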
This section gives a tabular view of the parameters and attributes that are associated with the trained model.
Each of the above tabs has a Details tab that gives general information about the evaluation, along with some information about the dataset and the model used in the evaluation.
Here, the user can persist additional metrics calculated externally in the database. Clicking on the '+' button at the bottom right opens a dialog box with a form for adding key-value pairs of custom metrics. These are shown side by side with the principal metrics in a table.
For both regression and classification, there are five types of components rendered:
- Metrics
- Dataset Information
- Model information
- Curves
- Details (part of each tab panel)
A semi-interactive table, along with both bar and line charts, is rendered in this tab. The metrics are the same as those in the Single Evaluation section above. The table can be sorted by metrics to compare the models.
Since the evaluations being considered in this case must have the same dataset, the same component that was used for single evaluation use case has been used.
Multiple tables listing out parameters and attributes of each model are rendered.
The plots mentioned in the above section are rendered, with the traces of other models in the same graph, with the exception of Gain and Lift charts in case of Binary Classification.
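If the charts are rendered with Plotly (the interaction list above matches its standard toolbar), a comparison plot of this kind amounts to adding one trace per model to a single figure; the ROC points below are made up purely for illustration:

```python
import plotly.graph_objects as go

# Hypothetical ROC points for two models being compared
fig = go.Figure()
fig.add_trace(go.Scatter(x=[0, 0.2, 0.5, 1], y=[0, 0.6, 0.85, 1],
                         mode="lines", name="Model A"))
fig.add_trace(go.Scatter(x=[0, 0.3, 0.6, 1], y=[0, 0.5, 0.75, 1],
                         mode="lines", name="Model B"))
fig.add_trace(go.Scatter(x=[0, 1], y=[0, 1], mode="lines",
                         name="Chance", line=dict(dash="dash")))
fig.update_layout(xaxis_title="False Positive Rate", yaxis_title="True Positive Rate")
fig.show()
```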
Every tab panel has one; it is the same as for single evaluation, except that it now has tabs for all the evaluations that were selected by the user.
For both regression and classification, there are five types of components rendered:
- Metrics
- Dataset Information
- Model information
- Curves
- Details (part of each tab panel)
A semi-interactive table, along with both bar and line charts, is rendered in this tab. The metrics are the same as those in the Single Evaluation section above. The table can be sorted by metrics to compare the datasets.
The dataset statistics for all datasets are shown tab-wise. The user can switch between statistics and compare the datasets based on them, and can also switch between the tabular view and the line chart view. Along with this, it also contains information about the feature importances of the datasets, in chart and tabular format, and the class imbalance.
Since the model being used is the same, there is a single table listing all the parameters and attributes of the trained model.
The plots mentioned in the 'Single Model Evaluation' section are rendered, with the traces of other datasets in the same graph, or in subplots, with the exception of Gain and Lift charts in case of Binary Classification.
Every tab panel has one; it is the same as for single evaluation, except that it now has tabs for all the evaluations that were selected by the user.