Welcome to this MLOps project, which demonstrates a robust pipeline for managing vehicle insurance data. It is built to show recruiters and visitors the tools, techniques, services, and features that go into building and deploying a machine learning pipeline for real-world data management. Follow along to learn about project setup, data processing, model deployment, and CI/CD automation!
Start by executing the template.py file to create the initial project template, which includes the required folder structure and placeholder files.
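For reference, a minimal sketch of what such a scaffolding script typically does (the file paths below are illustrative, not the project's exact layout):

```python
# template.py -- scaffolding sketch (file paths are illustrative)
from pathlib import Path

FILES = [
    "src/__init__.py",
    "src/components/data_ingestion.py",
    "src/configuration/mongo_db_connections.py",
    "src/entity/config_entity.py",
    "src/entity/artifact_entity.py",
    "requirements.txt",
    "setup.py",
]

for filepath in map(Path, FILES):
    filepath.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    filepath.touch(exist_ok=True)                        # create empty placeholder file
```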
Configure local package imports in the setup.py and pyproject.toml files so the project's modules can be installed and imported as a package.
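A minimal packaging sketch, assuming the local code lives in folders containing `__init__.py` files (pyproject.toml then only needs to declare the setuptools build backend); the project name below is a placeholder:

```python
# setup.py -- minimal sketch so `pip install -e .` installs the local packages
from setuptools import setup, find_packages

setup(
    name="vehicle-insurance-mlops",  # placeholder project name
    version="0.0.1",
    packages=find_packages(),        # discovers every folder containing __init__.py
)
```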
Create a virtual environment and install required dependencies from requirements.txt:
```bash
conda create -n vehicle python=3.10 -y
conda activate vehicle
pip install -r requirements.txt
```
Verify the local packages by running:
```bash
pip list
```
Sign up for MongoDB Atlas and create a new project. Set up a free M0 cluster, configure the username and password, and allow access from any IP address (0.0.0.0/0). Retrieve the MongoDB connection string for Python and save it (replace the `<password>` placeholder with your database user's password).
Create a folder named notebook, add the dataset, and create a notebook file mongoDB_demo.ipynb. Use the notebook to push data to the MongoDB database. Verify the data in MongoDB Atlas under Database > Browse Collections.
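A notebook-style sketch of the push step using pymongo and pandas; the file, database, and collection names are placeholders:

```python
# Notebook cell sketch: push the dataset to MongoDB Atlas (names are placeholders)
import pandas as pd
from pymongo import MongoClient

MONGODB_URL = "mongodb+srv://<username>:<password>@..."  # your connection string
DATABASE_NAME = "Proj1"            # placeholder database name
COLLECTION_NAME = "Proj1-Data"     # placeholder collection name

df = pd.read_csv("notebook/data.csv")     # dataset added to the notebook folder
records = df.to_dict(orient="records")    # one document per row

client = MongoClient(MONGODB_URL)
client[DATABASE_NAME][COLLECTION_NAME].insert_many(records)
print(f"Inserted {len(records)} documents")
```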
Create logging and exception handling modules. Test them on a demo file demo.py.
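A rough sketch of the two modules and the demo file, combined here for brevity; the log format and exception wrapper are illustrative, not the project's exact code:

```python
# logger.py (sketch): write timestamped logs to a logs/ folder
import logging
import os
import sys
from datetime import datetime

LOG_FILE = f"{datetime.now():%m_%d_%Y_%H_%M_%S}.log"
os.makedirs("logs", exist_ok=True)
logging.basicConfig(
    filename=os.path.join("logs", LOG_FILE),
    format="[%(asctime)s] %(name)s - %(levelname)s - %(message)s",
    level=logging.INFO,
)

# exception.py (sketch): attach file name and line number to the original error
class AppException(Exception):
    def __init__(self, error: Exception):
        _, _, tb = sys.exc_info()
        location = f"{tb.tb_frame.f_code.co_filename}:{tb.tb_lineno}" if tb else "unknown"
        super().__init__(f"{error} (raised at {location})")

# demo.py (sketch): exercise both modules
try:
    1 / 0
except Exception as e:
    logging.exception("Demo failure")
    raise AppException(e) from e
```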
Perform exploratory data analysis and feature engineering in the EDA and Feature Engg notebook; the findings guide the transformation steps later in the pipeline.
Define MongoDB connection functions in configuration.mongo_db_connections.py.
Develop data ingestion components in the data_access and components.data_ingestion.py files to fetch and transform data.
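A hedged sketch of the connection helper and the export step built on top of it, assuming the connection string is read from the MONGODB_URL environment variable; the class, database, and function names are placeholders:

```python
# configuration/mongo_db_connections.py -- connection helper (sketch)
import os
import pandas as pd
from pymongo import MongoClient

class MongoDBClient:
    """Creates a single shared client from the MONGODB_URL environment variable."""
    client = None

    def __init__(self, database_name: str = "Proj1"):  # placeholder database name
        if MongoDBClient.client is None:
            MongoDBClient.client = MongoClient(os.environ["MONGODB_URL"])
        self.database = MongoDBClient.client[database_name]

# data_access -- fetch a collection as a DataFrame for ingestion (sketch)
def export_collection_as_dataframe(collection_name: str) -> pd.DataFrame:
    collection = MongoDBClient().database[collection_name]
    df = pd.DataFrame(list(collection.find()))
    return df.drop(columns=["_id"], errors="ignore")  # drop Mongo's internal id column
```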
Update entity/config_entity.py and entity/artifact_entity.py with relevant ingestion configurations.
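For illustration, the ingestion config and artifact entities are typically small dataclasses; the fields and paths below are assumptions, not the project's exact definitions:

```python
# entity/config_entity.py and entity/artifact_entity.py -- sketch (fields are illustrative)
from dataclasses import dataclass

@dataclass
class DataIngestionConfig:
    collection_name: str = "Proj1-Data"                   # placeholder collection
    feature_store_file_path: str = "artifact/data.csv"    # raw dump location
    train_file_path: str = "artifact/train.csv"
    test_file_path: str = "artifact/test.csv"
    train_test_split_ratio: float = 0.25

@dataclass
class DataIngestionArtifact:
    trained_file_path: str
    test_file_path: str
```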
Run demo.py after setting the MongoDB connection string as an environment variable.
Set MongoDB URL:
```bash
# For Bash
export MONGODB_URL="mongodb+srv://<username>:<password>...."
```
```powershell
# For Powershell
$env:MONGODB_URL = "mongodb+srv://<username>:<password>...."
```
Define schema in config.schema.yaml and implement data validation functions in utils.main_utils.py.
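A minimal validation sketch, assuming schema.yaml declares a `columns` list; the key layout and function names are assumptions:

```python
# utils/main_utils.py -- schema-based validation sketch (schema layout is assumed)
import yaml
import pandas as pd

def read_yaml_file(file_path: str) -> dict:
    with open(file_path) as f:
        return yaml.safe_load(f)

def validate_number_of_columns(df: pd.DataFrame, schema_path: str = "config/schema.yaml") -> bool:
    """Returns True if the DataFrame has exactly the columns declared in the schema."""
    schema = read_yaml_file(schema_path)
    expected = {list(col.keys())[0] for col in schema["columns"]}  # assumes a list of {name: dtype} entries
    return expected == set(df.columns)
```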
Implement data transformation logic in components.data_transformation.py and create estimator.py in the entity folder.
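A sketch of the transformation object, assuming scikit-learn preprocessing; the column names below are illustrative placeholders from the vehicle insurance domain:

```python
# components/data_transformation.py -- preprocessing sketch (column names are placeholders)
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder

NUMERIC_COLUMNS = ["Age", "Annual_Premium", "Vintage"]             # illustrative
CATEGORICAL_COLUMNS = ["Gender", "Vehicle_Age", "Vehicle_Damage"]  # illustrative

def get_data_transformer() -> Pipeline:
    """Builds the preprocessing object that estimator.py bundles with the trained model."""
    preprocessor = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC_COLUMNS),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLUMNS),
    ])
    return Pipeline([("preprocessor", preprocessor)])
```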
Define and implement model training steps in components.model_trainer.py using code from estimator.py.
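A minimal training sketch; the model choice (random forest) and metric (F1) are assumptions, not necessarily what the project uses:

```python
# components/model_trainer.py -- training sketch (model and metric are assumptions)
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

def train_model(x_train, y_train, x_test, y_test):
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(x_train, y_train)
    score = f1_score(y_test, model.predict(x_test))
    return model, score  # the estimator.py wrapper pairs this model with the preprocessor
```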
Log in to the AWS console, create an IAM user, and grant AdministratorAccess.
Set AWS credentials as environment variables.
```bash
# For Bash
export AWS_ACCESS_KEY_ID="YOUR_AWS_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="YOUR_AWS_SECRET_ACCESS_KEY"
```
Configure the S3 bucket details and reference the AWS access keys (via the environment variables above) in constants.__init__.py.
Create an S3 bucket named my-model-mlproj in the us-east-1 region.
Develop code to push/pull models to/from the S3 bucket in src.aws_storage and entity/s3_estimator.py.
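A boto3-based sketch of the push/pull helpers; the object key is a placeholder, and credentials are read from the environment variables set earlier:

```python
# src/aws_storage -- push/pull the trained model to S3 (sketch; object key is a placeholder)
import boto3

BUCKET_NAME = "my-model-mlproj"           # bucket created above
MODEL_KEY = "model-registry/model.pkl"    # placeholder object key

s3 = boto3.client("s3")  # uses AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment

def upload_model(local_path: str) -> None:
    s3.upload_file(local_path, BUCKET_NAME, MODEL_KEY)

def download_model(local_path: str) -> None:
    s3.download_file(BUCKET_NAME, MODEL_KEY, local_path)
```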
Implement model evaluation and deployment components.
Create the prediction pipeline and set up app.py to serve predictions through an API.
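A minimal serving sketch, assuming FastAPI with uvicorn (a Flask app would work the same way); the routes are illustrative, and port 5080 matches the port opened on EC2 later:

```python
# app.py -- minimal serving sketch (framework choice and routes are assumptions)
from fastapi import FastAPI
import uvicorn

app = FastAPI()

@app.get("/")
def index():
    return {"status": "vehicle insurance prediction service is up"}

@app.post("/predict")
def predict(payload: dict):
    # In the real pipeline this would load the s3_estimator model,
    # transform the payload, and return the model's prediction.
    return {"prediction": "placeholder"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=5080)
```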
Add static and template directories for web UI.
Create Dockerfile and .dockerignore.
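A minimal Dockerfile sketch; the base image, exposed port, and start command are assumptions:

```dockerfile
# Dockerfile -- minimal sketch (base image, port, and entrypoint are assumptions)
FROM python:3.10-slim
WORKDIR /app
COPY . /app
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 5080
CMD ["python", "app.py"]
```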
Set up GitHub Actions with AWS authentication by creating secrets in GitHub for:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_DEFAULT_REGION
- ECR_REPO
Set up an EC2 instance for deployment. Install Docker on the EC2 machine. Connect EC2 as a self-hosted runner on GitHub.
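A hedged sketch of the workflow file, assuming a build job on a GitHub-hosted runner and a deploy job on the self-hosted EC2 runner; action versions, job names, and the image reference are assumptions:

```yaml
# .github/workflows/aws.yaml -- CI/CD sketch (names and versions are illustrative)
name: Deploy to EC2 via ECR

on:
  push:
    branches: [main]

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_DEFAULT_REGION }}
      - id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2
      - run: |
          docker build -t ${{ steps.login-ecr.outputs.registry }}/${{ secrets.ECR_REPO }}:latest .
          docker push ${{ steps.login-ecr.outputs.registry }}/${{ secrets.ECR_REPO }}:latest

  deploy:
    needs: build-and-push
    runs-on: self-hosted   # the EC2 instance registered as a runner
    steps:
      # The runner also needs ECR credentials; <registry>/<repo> is a placeholder image reference.
      - run: |
          docker pull <registry>/<repo>:latest
          docker run -d -p 5080:5080 <registry>/<repo>:latest
```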
Open port 5080 in the EC2 instance's security group (inbound rule). Access the deployed app at http://<public_ip>:5080.
GitHub Secrets: Manage secrets for secure CI/CD pipelines.
Data Ingestion ➔ Data Validation ➔ Data Transformation ➔ Model Training ➔ Model Evaluation ➔ Model Deployment

CI/CD Automation with GitHub Actions, Docker, AWS EC2, and ECR
This README provides a structured walkthrough of the MLOps project, showcasing the end-to-end pipeline, cloud integration, CI/CD setup, and robust data handling capabilities.