MLOps Pipeline

An MLOps pipeline for managing the lifecycle of machine learning models: data processing, model training, monitoring, and deployment.

flowchart TD
    %% Orchestration Layer
    subgraph "Orchestration Layer"
        direction TB
        MainEntrypoint["Main Entrypoint"]:::cli
        CLI["CLI Orchestrator"]:::cli
        MainEntrypoint -->|"invokes"| CLI

        RunData["Data Pipeline Launcher"]:::cli
        RunModel["Model Pipeline Launcher"]:::cli
        RunMonitor["Monitoring Pipeline Launcher"]:::cli
        RunAPI["API Server Launcher"]:::cli

        CLI -->|"process-data config.json"| RunData
        CLI -->|"train-model config.json"| RunModel
        CLI -->|"monitor-model config.json"| RunMonitor
        CLI -->|"serve-model config.json"| RunAPI
    end

    %% Pipelines
    subgraph "Pipelines" 
        direction TB
        DataPipeline["Data Pipeline"]:::pipeline
        ModelPipeline["Model Pipeline"]:::pipeline
        MonitoringPipeline["Monitoring Pipeline"]:::pipeline

        RunData -->|"runs"| DataPipeline
        RunModel -->|"runs"| ModelPipeline
        RunMonitor -->|"runs"| MonitoringPipeline
    end

    %% Serving Layer
    subgraph "Serving Layer"
        direction TB
        APIApp["Model Serving API"]:::api
        RunAPI -->|"launches"| APIApp
    end

    %% External Services
    subgraph "External Services"
        direction TB
        DVC[(DVC Remote)]:::external
        MLflow[(MLflow Server)]:::external
        Clients{"Clients"}:::external

        DataPipeline -->|"data artifacts"| DVC
        ModelPipeline -->|"model artifacts"| DVC
        ModelPipeline -->|"experiment logs"| MLflow
        MonitoringPipeline -->|"drift metrics"| MLflow
        APIApp -->|"loads model"| MLflow
        APIApp -->|"fetches data/models"| DVC
        Clients -->|"HTTP requests"| APIApp
    end

    %% CI/CD
    subgraph "CI/CD"
        direction TB
        GHActions["GitHub Actions"]:::ci
        Tests["Tests (pipelines & API)"]:::ci

        GHActions -->|"runs"| Tests
        GHActions -->|"deploys"| RunData
        GHActions -->|"deploys"| RunModel
        GHActions -->|"deploys"| APIApp
    end

    %% Click Events
    click CLI "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/cli.py"
    click MainEntrypoint "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/main.py"
    click RunAPI "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/run_api.py"
    click RunModel "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/run_model_pipeline.py"
    click RunData "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/run_model_pipeline.py"
    click RunMonitor "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/run_monitoring.py"
    click DataPipeline "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/pipelines/data_pipeline.py"
    click ModelPipeline "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/pipelines/model_pipeline.py"
    click MonitoringPipeline "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/monitoring/data_drift.py"
    click APIApp "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/api/app.py"
    click MLflowUtils "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/pipelines/mlflow_utils.py"
    click APIEndpoints "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/api/model_api.py"
    click ConfigExamples "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/examples/data_config.json"
    click ConfigExamples "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/examples/model_config.json"
    click ConfigExamples "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/examples/monitor_config.json"
    click ConfigExamples "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/examples/api_config.json"
    click GHActions "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/.github/workflows/ci_cd.yml"
    click Tests "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/tests/test_pipelines.py"
    click Requirements "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/requirements.txt"
    click Setup "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/setup.py"

    %% Styles
    classDef pipeline fill:#E0F7FA,stroke:#0288D1,stroke-width:1px
    classDef api fill:#E8F5E9,stroke:#388E3C,stroke-width:1px
    classDef external fill:#FFF3E0,stroke:#F57C00,stroke-width:1px
    classDef ci fill:#FCE4EC,stroke:#C2185B,stroke-width:1px
    classDef cli fill:#E0E0E0,stroke:#757575,stroke-width:1px
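The CLI Orchestrator in the diagram routes four subcommands, each taking a JSON config path, to a launcher. The following is a minimal sketch of that dispatch using argparse subparsers. It is not the project's cli.py, and the run_* handler stubs are hypothetical placeholders for the real launchers.

```python
import argparse
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional


def load_config(path: str) -> dict:
    """Read a JSON config file into a dict."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical launcher stubs; the real project wires these to its pipelines.
def run_data(config: dict) -> str:
    return "data pipeline"


def run_model(config: dict) -> str:
    return "model pipeline"


def run_monitor(config: dict) -> str:
    return "monitoring pipeline"


def run_api(config: dict) -> str:
    return "api server"


# One table drives both the parser and the dispatch.
COMMANDS: Dict[str, Callable[[dict], str]] = {
    "process-data": run_data,
    "train-model": run_model,
    "monitor-model": run_monitor,
    "serve-model": run_api,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cli.py", description="MLOps pipeline CLI")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in COMMANDS:
        cmd = sub.add_parser(name)
        cmd.add_argument("config", help="path to a JSON config file")
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    """Parse arguments, load the config file and run the matching launcher."""
    args = build_parser().parse_args(argv)
    config = load_config(args.config)
    return COMMANDS[args.command](config)
```

Calling main(["train-model", "examples/model_config.json"]) would load that file and pass the resulting dict to run_model. Keeping the command names in one dictionary means the parser and the dispatch table cannot drift apart.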

Features

Data processing and feature engineering pipeline
Model training and evaluation pipeline
Model performance monitoring and drift detection
Model serving via REST API
DVC integration for data and model versioning
MLflow integration for experiment tracking
Automated CI/CD pipeline

Project Structure

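.
├── api/                     # Model serving API
├── examples/                # Example JSON configurations
├── monitoring/              # Model monitoring components
├── notebooks/               # Jupyter notebooks for exploration
├── pipelines/               # Core pipeline components
├── tests/                   # Unit and integration tests
├── .dvc/                    # DVC configuration
├── .github/                 # GitHub Actions workflows
├── cli.py                   # Command-line interface
├── main.py                  # Main entrypoint
├── run_api.py               # API server launcher
├── run_model_pipeline.py    # Model pipeline launcher
├── run_monitoring.py        # Monitoring pipeline launcher
├── requirements.txt         # Python dependencies
└── setup.py                 # Package setup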

Setup

Clone the repository:

git clone https://github.com/Pewpenguin/mlops-continuous-training-pipeline
cd mlops-continuous-training-pipeline

Install dependencies:

pip install -r requirements.txt

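Configure the DVC remote. The commands below modify a remote named s3remote; if .dvc/config does not define it yet, add it first:

dvc remote add -d s3remote s3://your-bucket-name

Then set the bucket URL and, for S3-compatible storage outside AWS, the endpoint:

dvc remote modify s3remote url s3://your-bucket-name
dvc remote modify s3remote endpointurl https://your-endpoint.com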

Usage

Data Pipeline

python cli.py process-data examples/data_config.json

Model Training

python cli.py train-model examples/model_config.json

Model Monitoring

python cli.py monitor-model examples/monitor_config.json
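The diagram shows the monitoring pipeline logging drift metrics to MLflow, but this README does not say which statistic monitoring/data_drift.py computes. As an illustration only, here is a stdlib sketch of the Population Stability Index (PSI), one common per-feature drift score. The function names, the equal-width binning, the eps floor and the 0.1/0.25 bands are assumptions.

```python
import bisect
import math
from typing import List, Sequence


def _bin_edges(reference: Sequence[float], bins: int) -> List[float]:
    """Interior edges of equal-width bins spanning the reference sample."""
    lo, hi = min(reference), max(reference)
    if lo == hi:
        hi = lo + 1.0  # constant feature: avoid zero-width bins
    step = (hi - lo) / bins
    return [lo + i * step for i in range(1, bins)]


def _proportions(values: Sequence[float], edges: List[float], eps: float) -> List[float]:
    """Share of values per bin; values outside the reference range land in the end bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    # Floor empty bins at eps so the log term stays finite.
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(reference, current, bins: int = 10, eps: float = 1e-6) -> float:
    """PSI = sum over bins of (cur - ref) * ln(cur / ref); 0 means identical binned shares."""
    edges = _bin_edges(reference, bins)
    ref_p = _proportions(reference, edges, eps)
    cur_p = _proportions(current, edges, eps)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def drift_level(psi: float) -> str:
    """Rule-of-thumb bands often quoted for PSI; not settings from this project."""
    if psi < 0.1:
        return "none"
    if psi < 0.25:
        return "moderate"
    return "significant"
```

Each bin term is non-negative, so PSI only grows as the current distribution moves away from the reference one. Bins are derived from the reference (training-time) sample so that scores from successive monitoring runs stay comparable.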

Model Serving

python cli.py serve-model examples/api_config.json
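Clients reach the serving API over HTTP. The sketch below builds a JSON POST request with the standard library. The URL, the /predict route and the "instances" payload key are hypothetical, so check api/app.py and the API config for the real values before using it.

```python
import json
import urllib.request
from typing import Dict, List

# Hypothetical default; the real host, port and route come from the API config.
DEFAULT_URL = "http://localhost:8000/predict"


def build_prediction_request(
    rows: List[Dict[str, float]], url: str = DEFAULT_URL
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the serving API."""
    body = json.dumps({"instances": rows}).encode("utf-8")  # hypothetical payload schema
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, send(build_prediction_request([{"feature1": 1.0, "feature2": 2.0}])) would post one row and return the decoded JSON reply. Building and sending are split so the request can be inspected or tested without a live server.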

Configuration

All pipeline components are configured with JSON files. Example configurations (data_config.json, model_config.json, monitor_config.json and api_config.json) are in the examples/ directory.

Data Pipeline Configuration

{
    "data_source": "data/raw/dataset.csv",
    "features": ["feature1", "feature2"],
    "target_column": "target",
    "test_size": 0.2
}
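The sketch below shows one way these four keys could be loaded and validated, and what test_size means for the hold-out split (0.2 keeps 20% of the rows for testing). The DataConfig dataclass, its validation rules and the seeded shuffle in split_rows are assumptions, not the project's implementation.

```python
import json
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class DataConfig:
    # Field names mirror the keys of the JSON example above.
    data_source: str
    features: List[str]
    target_column: str
    test_size: float = 0.2

    def __post_init__(self) -> None:
        if not self.features:
            raise ValueError("features must list at least one column")
        if self.target_column in self.features:
            raise ValueError("target_column must not also be a feature")
        if not 0.0 < self.test_size < 1.0:
            raise ValueError("test_size must be between 0 and 1 (exclusive)")


def load_data_config(text: str) -> DataConfig:
    """Parse a JSON string into a validated DataConfig."""
    return DataConfig(**json.loads(text))


def split_rows(rows: Sequence[T], test_size: float, seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed, then hold out a test_size fraction as (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]
```

Fixing the seed makes the split reproducible across runs, which matters when DVC versions the resulting data artifacts.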

Model Pipeline Configuration

{
    "trainer": {
        "model_type": "random_forest",
        "task_type": "classification",
        "model_params": {
            "n_estimators": 100,
            "max_depth": 10
        }
    },
    "evaluator": {
        "metrics": ["accuracy", "f1", "precision", "recall"]
    },
    "registry": {
        "experiment_name": "model_experiment",
        "model_name": "production_model"
    }
}
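The evaluator.metrics list names four standard classification metrics. For reference, here is a pure-Python sketch of what they compute for binary labels. The project itself probably delegates to a library, and the function names here are hypothetical. For multi-class targets, per-class scores are usually averaged (scikit-learn's precision_score, for example, exposes an average parameter).

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def evaluate(config: dict, y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return only the metrics requested under evaluator.metrics, in config order."""
    scores = binary_metrics(y_true, y_pred)
    return {name: scores[name] for name in config["evaluator"]["metrics"]}
```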

Development

Running Tests

pytest tests/

Adding New Features

Create a new branch
Implement the feature
Add tests
Submit a pull request

Contributing

Contributions are welcome! Please feel free to submit a pull request.

License

This project is licensed under the MIT License; see the LICENSE file for details.
