This repository showcases a comprehensive Machine Learning (ML) pipeline deployment leveraging cloud-native and DevOps tools. The architecture integrates AWS, Docker, MLflow, Jenkins, Grafana, Snowflake, and more to deliver an end-to-end ML workflow.
Scan the QR code above to access the deployed application instantly.
The pipeline automates the entire ML lifecycle, from data ingestion to model deployment and monitoring. Key technologies include:
- Containerization with Docker
- Continuous Integration/Continuous Deployment (CI/CD) using Jenkins
- Experiment Tracking via MLflow
- Monitoring through Grafana
| Component | Description |
|---|---|
| AWS | Hosts various services, including: • Amazon S3: storage for datasets and ML models • Snowflake: data warehouse for structured data |
| MLflow | Manages the ML lifecycle: • experiment tracking • reproducibility • model deployment |
| Snowflake | Data warehouse facilitating seamless data access and storage within the pipeline. |
| VS Code (Remote) | Development environment for writing and testing code, with remote access to Snowflake and MLflow. |
| Jenkins | Automates CI/CD processes: • runs tests • builds Docker containers • orchestrates model deployment |
| Docker | Ensures consistent environments across development, testing, and production stages. |
| Grafana | Monitors pipeline health and visualizes metrics like model performance, build status, and system health. |
| Evidently | Provides tools for monitoring model performance and diagnosing issues in machine learning systems, with features for data drift, model quality, and target drift analysis (see the example after this table). |
| Railway | Deploys the final ML model and application, accessible via the QR code or the live demo. |
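As an illustration of the Evidently checks mentioned in the table, a drift report might be generated like this (a minimal sketch: the dataset paths are placeholders, and the import paths follow the pre-0.7 Evidently API, so they may differ in newer releases):

```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset

# Reference (training-time) and current (production) data; the paths are placeholders.
reference = pd.read_parquet("reference.parquet")
current = pd.read_parquet("current.parquet")

report = Report(metrics=[DataDriftPreset(), TargetDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("drift_report.html")  # HTML report with drift and quality checks
```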
### 📥 Data Access & Preprocessing

- Sources: Raw data is fetched from Snowflake and Amazon S3.
- Process: Data is preprocessed and feature engineering is performed to prepare it for model training, as sketched below.
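A minimal sketch of this step, assuming the Snowflake and AWS credentials from the `.env` file described later in this README (the table name, bucket, and object key are hypothetical placeholders, not the repository's actual code):

```python
import os

import boto3
import pandas as pd
import snowflake.connector

# Structured data from Snowflake; credentials come from the environment (.env file).
conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
    database=os.environ["SNOWFLAKE_DATABASE"],
    schema=os.environ["SNOWFLAKE_SCHEMA"],
)
df = pd.read_sql("SELECT * FROM FACE_FEATURES", conn)  # hypothetical table name

# Raw files (e.g. images) from Amazon S3; bucket and key are placeholders.
s3 = boto3.client("s3")
s3.download_file("your_s3_bucket_name", "raw/images.zip", "/tmp/images.zip")

# Illustrative preprocessing / feature-engineering step.
df = df.dropna()
```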
### 🏋️ Model Training

- Tracking: MLflow tracks experiment metadata, including parameters, metrics, and artifacts (a minimal example follows this list).
- Execution: Models are trained using the preprocessed data.
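A sketch of how a training run might be tracked against the MLflow server in this stack (the experiment name and the scikit-learn model are illustrative stand-ins, not the repository's actual training code):

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_tracking_uri("http://localhost:5000")  # MLflow service started by docker-compose
mlflow.set_experiment("face-recognition")         # hypothetical experiment name

# Stand-in data; the real pipeline uses the preprocessed features from the previous step.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")  # artifact lands in the configured S3 bucket
```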
### 💾 Model Saving

- The trained model artifacts are stored back into Amazon S3 for future reference and deployment; see the sketch below.
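If an artifact needs to be pushed to S3 directly, outside of MLflow's own artifact logging, a boto3 sketch could look like this (bucket name and object key are placeholders):

```python
import boto3

# Credentials are read from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY in the environment.
s3 = boto3.client("s3")

# Bucket name and key are placeholders; use the bucket configured in S3_MLFLOW_BUCKET.
s3.upload_file("model.pkl", "your_s3_bucket_name", "models/face_model/model.pkl")
```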
### 🔄 Continuous Integration & Deployment

Jenkins automates the CI/CD pipeline:

- Testing: Runs automated tests to ensure code quality.
- Building: Containerizes the application using Docker.
- Deployment: Orchestrates the deployment of the model to production environments.
### 📈 Monitoring & Logging

Grafana provides real-time monitoring of:

- Model performance
- Jenkins build statuses
- Docker container health

This enables quick identification and resolution of issues.
### 🌐 Deployment on Railway

- The final model is deployed as a service via the Railway app.
- Users can interact with the ML model through a user-friendly interface; a quick reachability check is sketched below.
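As a quick sanity check, the deployed service can be pinged from Python (a minimal sketch using the live-demo URL listed later in this README; the application's actual routes and request format are not documented here):

```python
import requests

# Live demo URL from this README; the app's individual routes are not documented here.
BASE_URL = "https://facerecognitionmachinelearning-production-f097.up.railway.app"

response = requests.get(BASE_URL, timeout=10)
print(response.status_code)  # 200 indicates the Railway deployment is reachable
```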
## ⚙️ Setup

### 1. Clone the Repository

```bash
git clone https://github.com/HuseynA28/awsMlopsFaceApp.git
cd awsMlopsFaceApp
```

### 2. Configure Environment Variables

Create a `.env` file in the root directory and populate it with your credentials.
#### 📄 .env File
```
# 🔒 Docker and Services
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_DEFAULT_REGION=your_aws_region
MYSQL_USER=mlflowuser
MYSQL_PASSWORD=mlflowpassword
MYSQL_DATABASE=mlflowdb
MYSQL_ROOT_PASSWORD=rootpassword
PGADMIN_DEFAULT_EMAIL=admin@example.com
PGADMIN_DEFAULT_PASSWORD=adminpassword
MLFLOW_S3_ENDPOINT_URL=your_mlflow_s3_endpoint
S3_MLFLOW_BUCKET=your_s3_bucket_name
POSTGRES_PASSWORD=postgrespassword

# 📓 Notebooks
aws_access_key_id=your_aws_access_key
aws_secret_access_key=your_aws_secret_key
SNOWFLAKE_ACCOUNT=your_snowflake_account
SNOWFLAKE_USER=your_snowflake_user
SNOWFLAKE_PASSWORD=your_snowflake_password
SNOWFLAKE_SCHEMA=your_snowflake_schema
SNOWFLAKE_DATABASE=your_snowflake_database
SNOWFLAKE_ROLE=your_snowflake_role
SNOWFLAKE_WAREHOUSE=your_snowflake_warehouse
```
#### 📋 Explanation

| Section | Description |
|---|---|
| Docker and Services | Credentials for AWS, MySQL, PGAdmin, MLflow S3, and PostgreSQL services used in the Docker containers. |
| Notebooks | Credentials for accessing Snowflake and AWS services within Jupyter notebooks or other development environments (see the snippet below). |
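Inside a notebook, the variables from the Notebooks section can be loaded with python-dotenv, for example (a minimal sketch; it assumes `python-dotenv` is installed, and the repository's notebooks may wire this up differently):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads the .env file from the project root

snowflake_settings = {
    "account": os.getenv("SNOWFLAKE_ACCOUNT"),
    "user": os.getenv("SNOWFLAKE_USER"),
    "warehouse": os.getenv("SNOWFLAKE_WAREHOUSE"),
    "database": os.getenv("SNOWFLAKE_DATABASE"),
    "schema": os.getenv("SNOWFLAKE_SCHEMA"),
    "role": os.getenv("SNOWFLAKE_ROLE"),
}

# Confirm the variables are present without printing any secrets.
print({key: value is not None for key, value in snowflake_settings.items()})
```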
### 3. Build and Run Containers

Ensure Docker is installed and running on your machine, then start the stack:

```bash
docker-compose up --build
```
### 4. Access Services

| Service | URL |
|---|---|
| Jenkins | http://localhost:8080 |
| MLflow | http://localhost:5000 |
| Grafana | http://localhost:3000 |
| VS Code | http://localhost:8081 |
| Postgres | localhost:5433 |
| Adminer | http://localhost:8080 |
## 🖥️ Development Environment

Use VS Code with the Remote Development extensions to interact with the pipeline seamlessly.
### 🔌 Connecting to Remote Development

1. Open VS Code.
2. Install the Remote Development extension pack.
3. Connect to your remote environment where Snowflake and MLflow are accessible.

## 📈 Monitoring and Visualization

Grafana provides insightful dashboards to monitor the pipeline's health and performance.
### 📊 Key Metrics

- Model Performance: Track accuracy, precision, recall, and other relevant metrics.
- Build Status: Monitor the status of Jenkins builds and deployments.
- System Health: Observe CPU, memory usage, and other system metrics.
## 🛡️ Continuous Integration & Deployment

Jenkins automates the CI/CD pipeline, ensuring efficient and reliable deployments.
### 🔌 Service Ports

| Service | Port |
|---|---|
| Jenkins | 8080 |
| MLflow | 5000 |
| Grafana | 3000 |
### 🚀 CI/CD Workflow

1. Code Commit: Push changes to the GitHub repository.
2. Jenkins Trigger: A new build is triggered automatically.
3. Testing: Automated tests validate the changes.
4. Docker Build: New Docker images are created for the updated application.
5. Deployment: The new containers are deployed to the production environment.
6. Monitoring: Grafana tracks the deployment's impact on system health and model performance.

## 📚 Experiment Tracking with MLflow

MLflow manages the ML lifecycle, ensuring experiments are reproducible and models are deployable.
### 🔍 MLflow Features

- Experiment Tracking: Log parameters, metrics, and artifacts.
- Model Registry: Manage model versions and stages (see the sketch after this section).
- Deployment: Seamlessly deploy models to various platforms.

## ☁️ Cloud Services Integration

### 🌩️ AWS Services

- Amazon S3: Central storage for datasets and model artifacts.
- Snowflake: Scalable data warehousing solution integrated with the pipeline.

### 🐳 Docker

Containerizes the entire pipeline, ensuring consistency across development, testing, and production environments.
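The Model Registry feature listed above can be exercised roughly like this (a minimal sketch: the model name and run ID are placeholders, and registry APIs differ slightly across MLflow versions):

```python
import mlflow
from mlflow.tracking import MlflowClient

mlflow.set_tracking_uri("http://localhost:5000")

# Register a model logged by an earlier run (run ID and model name are placeholders).
result = mlflow.register_model("runs:/<run_id>/model", "face-recognition-model")

# Promote the new version to Staging for review.
client = MlflowClient()
client.transition_model_version_stage(
    name="face-recognition-model", version=result.version, stage="Staging"
)

# Later, load whichever version is currently in Production for serving.
model = mlflow.pyfunc.load_model("models:/face-recognition-model/Production")
```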
## 📦 Deployment on Railway

Deploy the trained model as a service using the Railway app for easy accessibility.
### 🌐 Access the Deployed Application

- Live Demo: [facerecognitionmachinelearning-production-f097.up.railway.app](https://facerecognitionmachinelearning-production-f097.up.railway.app)
- QR Code:
## 📖 License

This project is licensed under the MIT License.
## 🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
## 📧 Contact

For any questions or feedback, feel free to reach out:

- LinkedIn: Your LinkedIn Profile

[🔗 Back to Top](#-ml-model-pipeline-deployment)
## 🛠️ Technologies Used

- AWS (Amazon S3)
- Snowflake
- MLflow
- Jenkins
- Docker
- Grafana
- Evidently
- Railway
- VS Code (Remote)
## 📝 Notes

- Image URLs: Ensure that the image URLs (e.g., pipeline overview, QR codes, Grafana dashboard) are correct and publicly accessible. Replace the placeholder URLs (`https://github.com/user-attachments/assets/...`) with the actual paths to your images.
- Back to Top Link: The link `[Back to Top](#-ml-model-pipeline-deployment)` assumes that the main header has the ID `#🧠-ml-model-pipeline-deployment`. If this doesn't work as expected, adjust the link to match the ID generated by your Markdown renderer, or link to the top of the document with `[Back to Top](#ml-model-pipeline-deployment)` or simply `[Back to Top](#)`.
- Environment Variables: Never commit your `.env` file or expose sensitive credentials in your repository. Use `.gitignore` to exclude the `.env` file from version control.
- Live Demo Link: Verify that the live demo link is correct and accessible. The current link points to a Railway app; ensure it is deployed and operational.
- Contact Information: Update the placeholder email and LinkedIn profile links with your actual contact information.

Feel free to further customize the documentation to better fit your project's specific needs.