A comprehensive MLOps pipeline for managing the complete lifecycle of machine learning models, including data processing, model training, monitoring, and deployment.
flowchart TD
%% Orchestration Layer
subgraph "Orchestration Layer"
direction TB
MainEntrypoint["Main Entrypoint"]:::cli
CLI["CLI Orchestrator"]:::cli
MainEntrypoint -->|"invokes"| CLI
RunData["Data Pipeline Launcher"]:::cli
RunModel["Model Pipeline Launcher"]:::cli
RunMonitor["Monitoring Pipeline Launcher"]:::cli
RunAPI["API Server Launcher"]:::cli
CLI -->|"process-data config.json"| RunData
CLI -->|"train-model config.json"| RunModel
CLI -->|"monitor-model config.json"| RunMonitor
CLI -->|"serve-model config.json"| RunAPI
end
%% Pipelines
subgraph "Pipelines"
direction TB
DataPipeline["Data Pipeline"]:::pipeline
ModelPipeline["Model Pipeline"]:::pipeline
MonitoringPipeline["Monitoring Pipeline"]:::pipeline
RunData -->|"runs"| DataPipeline
RunModel -->|"runs"| ModelPipeline
RunMonitor -->|"runs"| MonitoringPipeline
end
%% Serving Layer
subgraph "Serving Layer"
direction TB
APIApp["Model Serving API"]:::api
RunAPI -->|"launches"| APIApp
end
%% External Services
subgraph "External Services"
direction TB
DVC[(DVC Remote)]:::external
MLflow[(MLflow Server)]:::external
Clients{"Clients"}:::external
DataPipeline -->|"data artifacts"| DVC
ModelPipeline -->|"model artifacts"| DVC
ModelPipeline -->|"experiment logs"| MLflow
MonitoringPipeline -->|"drift metrics"| MLflow
APIApp -->|"loads model"| MLflow
APIApp -->|"fetches data/models"| DVC
Clients -->|"HTTP requests"| APIApp
end
%% CI/CD
subgraph "CI/CD"
direction TB
GHActions["GitHub Actions"]:::ci
Tests["Tests (pipelines & API)"]:::ci
GHActions -->|"runs"| Tests
GHActions -->|"deploys"| RunData
GHActions -->|"deploys"| RunModel
GHActions -->|"deploys"| APIApp
end
%% Click Events
click CLI "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/cli.py"
click MainEntrypoint "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/main.py"
click RunAPI "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/run_api.py"
click RunModel "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/run_model_pipeline.py"
click RunData "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/run_model_pipeline.py"
click RunMonitor "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/run_monitoring.py"
click DataPipeline "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/pipelines/data_pipeline.py"
click ModelPipeline "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/pipelines/model_pipeline.py"
click MonitoringPipeline "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/monitoring/data_drift.py"
click APIApp "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/api/app.py"
click MLflowUtils "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/pipelines/mlflow_utils.py"
click APIEndpoints "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/api/model_api.py"
click ConfigExamples "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/examples/data_config.json"
click ConfigExamples "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/examples/model_config.json"
click ConfigExamples "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/examples/monitor_config.json"
click ConfigExamples "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/examples/api_config.json"
click GHActions "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/.github/workflows/ci_cd.yml"
click Tests "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/tests/test_pipelines.py"
click Requirements "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/requirements.txt"
click Setup "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/setup.py"
%% Styles
classDef pipeline fill:#E0F7FA,stroke:#0288D1,stroke-width:1px
classDef api fill:#E8F5E9,stroke:#388E3C,stroke-width:1px
classDef external fill:#FFF3E0,stroke:#F57C00,stroke-width:1px
classDef ci fill:#FCE4EC,stroke:#C2185B,stroke-width:1px
classDef cli fill:#E0E0E0,stroke:#757575,stroke-width:1px
- Data processing and feature engineering pipeline
- Model training and evaluation pipeline
- Model performance monitoring and drift detection
- Model serving via REST API
- DVC integration for data and model versioning
- MLflow integration for experiment tracking
- Automated CI/CD pipeline
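The drift detection feature listed above can be illustrated with a small, dependency-free sketch. This README does not say which drift metric the monitoring pipeline computes, so the Population Stability Index (PSI) used here is an assumption, chosen only because it is a common choice; the function name `psi` and its parameters are hypothetical and not taken from the repository.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Assumption: equal-width bins derived from the reference sample.
    The project's own drift metric may be different.
    """
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 when the reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each proportion at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))
```

Identical samples give a PSI of zero, and the value grows as the current distribution moves away from the reference. A value like this could be one of the drift metrics logged to MLflow, although the threshold for raising an alert is a project decision.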
.
├── api/ # Model serving API
├── examples/ # Usage examples
├── monitoring/ # Model monitoring components
├── notebooks/ # Jupyter notebooks for exploration
├── pipelines/ # Core pipeline components
├── tests/ # Unit and integration tests
├── .dvc/ # DVC configuration
├── .github/ # GitHub Actions workflows
└── cli.py # Command-line interface
- Clone the repository:
git clone https://github.com/Pewpenguin/mlops-continuous-training-pipeline
cd mlops-continuous-training-pipeline
- Install dependencies:
pip install -r requirements.txt
- Configure DVC:
dvc remote modify s3remote url s3://your-bucket-name
dvc remote modify s3remote endpointurl https://your-endpoint.com
python cli.py process-data configs/data_config.json
python cli.py train-model configs/model_config.json
python cli.py monitor-model configs/monitor_config.json
python cli.py serve-model configs/api_config.json
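Each command above follows the same shape: a subcommand name followed by the path to a JSON configuration file. The sketch below shows one way such a command line could be parsed with `argparse`. It is not the project's actual cli.py, which may use a different library or layout; only the four subcommand names come from this README, and `build_parser` and `parse` are hypothetical helpers.

```python
import argparse

# Subcommand names are taken from the usage lines above; the rest is assumed.
COMMANDS = ("process-data", "train-model", "monitor-model", "serve-model")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per pipeline launcher."""
    parser = argparse.ArgumentParser(prog="cli.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in COMMANDS:
        sub = subparsers.add_parser(name)
        sub.add_argument("config", help="path to a JSON configuration file")
    return parser


def parse(argv):
    """Return (command, config_path) for a list of command-line arguments."""
    args = build_parser().parse_args(argv)
    return args.command, args.config
```

A dispatcher would then map the returned command name to the matching launcher, for example the data pipeline launcher for `process-data`.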
All pipeline components are configured using JSON configuration files. Example configurations can be found in the examples/ directory.
{
"data_source": "data/raw/dataset.csv",
"features": ["feature1", "feature2"],
"target_column": "target",
"test_size": 0.2
}
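A data configuration like the one above can be checked before the pipeline starts. The following is a minimal sketch under stated assumptions: the required keys are inferred from the example only, the validation rules are illustrative, and `load_data_config` and `split_sizes` are hypothetical names, not functions from the repository.

```python
import json

# Keys inferred from the example configuration; the real schema may differ.
REQUIRED_KEYS = {"data_source", "features", "target_column", "test_size"}


def load_data_config(text: str) -> dict:
    """Parse a data configuration from JSON text and apply basic checks."""
    cfg = json.loads(text)
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not 0.0 < cfg["test_size"] < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    # Assumed rule: the target should not also be used as an input feature.
    if cfg["target_column"] in cfg["features"]:
        raise ValueError("target_column must not be listed in features")
    return cfg


def split_sizes(n_rows: int, test_size: float) -> tuple:
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test share is rounded to the nearest whole row.
    """
    n_test = round(n_rows * test_size)
    return n_rows - n_test, n_test
```

With `test_size` set to 0.2, a dataset of 1000 rows would be split into 800 training rows and 200 test rows under this rounding rule.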
{
"trainer": {
"model_type": "random_forest",
"task_type": "classification",
"model_params": {
"n_estimators": 100,
"max_depth": 10
}
},
"evaluator": {
"metrics": ["accuracy", "f1", "precision", "recall"]
},
"registry": {
"experiment_name": "model_experiment",
"model_name": "production_model"
}
}
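The model configuration above has three sections: trainer, evaluator, and registry. One way to read them into typed objects is sketched below. The class and function names (`TrainerConfig`, `ModelPipelineConfig`, `parse_model_config`) are hypothetical and the field layout is inferred from the example alone, so the model pipeline may consume the file differently.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TrainerConfig:
    model_type: str
    task_type: str
    # Hyperparameters are passed through unchanged, e.g. n_estimators.
    model_params: dict = field(default_factory=dict)


@dataclass
class ModelPipelineConfig:
    trainer: TrainerConfig
    metrics: list
    experiment_name: str
    model_name: str


def parse_model_config(text: str) -> ModelPipelineConfig:
    """Parse JSON text shaped like the example into typed sections."""
    raw = json.loads(text)
    return ModelPipelineConfig(
        trainer=TrainerConfig(**raw["trainer"]),
        metrics=list(raw["evaluator"]["metrics"]),
        experiment_name=raw["registry"]["experiment_name"],
        model_name=raw["registry"]["model_name"],
    )
```

Parsing into dataclasses surfaces a missing section or an unexpected trainer key as an error at load time rather than partway through training.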
pytest tests/
- Create a new branch
- Implement the feature
- Add tests
- Submit a pull request
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.