- About project
- Technologies
- Software and account requirements
- Setup
- Project Architecture
- Project Pipeline
The purpose of this exercise is to explore different features and observe their relationships, and to fit a multiple linear regression on several features of an individual, such as age, physical/family condition and location, against their existing medical expenses. The model is then used to predict the future medical expenses of individuals, which helps medical insurers decide what premium to charge.
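As a rough illustration of what a multiple linear regression does, here is a minimal sketch in pure Python that fits the coefficients with the normal equations. It is not the project's actual training code: the feature set (age, BMI, smoker flag), the data and the helper names `solve_linear_system`, `fit_linear_regression` and `predict` are all assumptions made up for this example.

```python
# Minimal multiple linear regression via the normal equations (pure Python).
# All names and data here are hypothetical and only illustrate the technique.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so each row operation also updates b.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix; features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_regression(rows, targets):
    """Return [intercept, coef_1, ...] that minimise the squared error."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, row):
    """Apply the fitted weights to one feature row."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Synthetic rows: (age, bmi, smoker as 0/1). The expenses are generated from a
# made-up formula so that the fit can be checked against known coefficients.
rows = [(25, 22.0, 0), (40, 30.5, 1), (33, 27.1, 0),
        (58, 24.3, 1), (47, 35.0, 0), (19, 28.8, 1)]
targets = [1000 + 200 * age + 50 * bmi + 15000 * smoker
           for age, bmi, smoker in rows]

weights = fit_linear_regression(rows, targets)
print("intercept and coefficients:", weights)
print("predicted expense:", predict(weights, (30, 25.0, 0)))
```

Because the synthetic expenses are an exact linear function of the features, the fitted weights should come out close to the coefficients used to generate them. Real data is noisy, so the fit there is only approximate.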
This project is built with the following resources:
- Python: 3.7
- Machine Learning
- Jupyter Notebook
- Docker
- Git
- CI/CD Pipeline
- Azure
To install the packages listed in the requirements file
pip install -r requirements.txt
- Add files to git
git add .
or
git add <file_name>
- To check the git status
git status
- To check all versions maintained by git
git log
- To create a version (commit all changes) with git
git commit -m "message"
- To send the version/changes to GitHub
git push -u origin main
- Data Ingestion
- Data Validation
- Data Transformation
- Model Training
- Model Evaluation
- Model Deployment
- Data ingestion is the process in which raw data is extracted from one or multiple sources and then prepared for training machine learning models.
- Data validation is an integral part of an ML pipeline. It checks the quality of the source data before training a new model.
- It focuses on checking that the statistics of the new data are as expected (e.g. feature distribution, number of categories, etc).
- Data transformation is the process of converting raw data into a format or structure that would be more suitable for model building.
- It is an essential step in feature engineering that facilitates discovering insights.
- Model training in machine learning is the process in which a machine learning (ML) algorithm is fed with sufficient training data to learn from.
- Model evaluation is the process of using different evaluation metrics to understand a machine learning model’s performance, as well as its strengths and weaknesses.
- Model evaluation is important to assess the efficacy of a model during initial research phases, and it also plays a role in model monitoring.
- Deployment is the method by which we integrate a machine learning model into a production environment to make practical business decisions based on data.
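The stages described above can be sketched end to end in a few small functions. This is only an illustrative outline under stated assumptions, not the project's implementation: the column names (`age`, `bmi`, `children`, `smoker`, `region`, `expenses`), the sample rows and every function name are hypothetical. The "model" is a simple mean baseline standing in for the real regression, and deployment is left out.

```python
import csv
import io
import math

# Hypothetical schema; the real dataset's column names may differ.
REQUIRED_COLUMNS = ["age", "bmi", "children", "smoker", "region", "expenses"]
ALLOWED_SMOKER = {"yes", "no"}

# Made-up sample rows, used only to exercise the functions below.
SAMPLE_CSV = """age,bmi,children,smoker,region,expenses
25,22.0,0,no,north,3000
40,30.5,2,yes,south,21000
33,27.1,1,no,north,5000
58,24.3,3,yes,east,27000
"""


def ingest(text):
    """Data ingestion: read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def validate(rows):
    """Data validation: return a list of problems (an empty list means OK)."""
    problems = []
    for i, row in enumerate(rows):
        missing = [c for c in REQUIRED_COLUMNS if not row.get(c)]
        if missing:
            problems.append(f"row {i}: missing {missing}")
            continue
        if not 0 < float(row["age"]) < 120:
            problems.append(f"row {i}: age out of range")
        if row["smoker"] not in ALLOWED_SMOKER:
            problems.append(f"row {i}: unexpected smoker value {row['smoker']!r}")
    return problems


def transform(rows):
    """Data transformation: numeric features plus one-hot encoded region."""
    regions = sorted({r["region"] for r in rows})
    features, targets = [], []
    for r in rows:
        vec = [float(r["age"]), float(r["bmi"]), float(r["children"]),
               1.0 if r["smoker"] == "yes" else 0.0]
        vec += [1.0 if r["region"] == name else 0.0 for name in regions]
        features.append(vec)
        targets.append(float(r["expenses"]))
    return features, targets


def train_baseline(targets):
    """Model training stand-in: always predict the mean training target."""
    mean = sum(targets) / len(targets)
    return lambda feature_vector: mean


def evaluate(model, features, targets):
    """Model evaluation: root mean squared error and R^2."""
    preds = [model(f) for f in features]
    sse = sum((p - t) ** 2 for p, t in zip(preds, targets))
    mean = sum(targets) / len(targets)
    sst = sum((t - mean) ** 2 for t in targets)
    return {"rmse": math.sqrt(sse / len(targets)), "r2": 1 - sse / sst}


rows = ingest(SAMPLE_CSV)
problems = validate(rows)
features, targets = transform(rows)
model = train_baseline(targets)
metrics = evaluate(model, features, targets)
print(problems, metrics)
```

A mean baseline is a useful reference point: a trained regression should reach a lower RMSE and a higher R^2 than this before it is worth deploying.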