MLLoop is a modular continuous training pipeline for automating model retraining, evaluation, and deployment

Pewpenguin/MLLoop

MLOps Pipeline

A comprehensive MLOps pipeline for managing the complete lifecycle of machine learning models, including data processing, model training, monitoring, and deployment.

flowchart TD
    %% Orchestration Layer
    subgraph "Orchestration Layer"
        direction TB
        MainEntrypoint["Main Entrypoint"]:::cli
        CLI["CLI Orchestrator"]:::cli
        MainEntrypoint -->|"invokes"| CLI

        RunData["Data Pipeline Launcher"]:::cli
        RunModel["Model Pipeline Launcher"]:::cli
        RunMonitor["Monitoring Pipeline Launcher"]:::cli
        RunAPI["API Server Launcher"]:::cli

        CLI -->|"process-data config.json"| RunData
        CLI -->|"train-model config.json"| RunModel
        CLI -->|"monitor-model config.json"| RunMonitor
        CLI -->|"serve-model config.json"| RunAPI
    end

    %% Pipelines
    subgraph "Pipelines" 
        direction TB
        DataPipeline["Data Pipeline"]:::pipeline
        ModelPipeline["Model Pipeline"]:::pipeline
        MonitoringPipeline["Monitoring Pipeline"]:::pipeline

        RunData -->|"runs"| DataPipeline
        RunModel -->|"runs"| ModelPipeline
        RunMonitor -->|"runs"| MonitoringPipeline
    end

    %% Serving Layer
    subgraph "Serving Layer"
        direction TB
        APIApp["Model Serving API"]:::api
        RunAPI -->|"launches"| APIApp
    end

    %% External Services
    subgraph "External Services"
        direction TB
        DVC[(DVC Remote)]:::external
        MLflow[(MLflow Server)]:::external
        Clients{"Clients"}:::external

        DataPipeline -->|"data artifacts"| DVC
        ModelPipeline -->|"model artifacts"| DVC
        ModelPipeline -->|"experiment logs"| MLflow
        MonitoringPipeline -->|"drift metrics"| MLflow
        APIApp -->|"loads model"| MLflow
        APIApp -->|"fetches data/models"| DVC
        Clients -->|"HTTP requests"| APIApp
    end

    %% CI/CD
    subgraph "CI/CD"
        direction TB
        GHActions["GitHub Actions"]:::ci
        Tests["Tests (pipelines & API)"]:::ci

        GHActions -->|"runs"| Tests
        GHActions -->|"deploys"| RunData
        GHActions -->|"deploys"| RunModel
        GHActions -->|"deploys"| APIApp
    end

    %% Click Events
    click CLI "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/cli.py"
    click MainEntrypoint "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/main.py"
    click RunAPI "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/run_api.py"
    click RunModel "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/run_model_pipeline.py"
    click RunData "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/run_model_pipeline.py"
    click RunMonitor "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/run_monitoring.py"
    click DataPipeline "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/pipelines/data_pipeline.py"
    click ModelPipeline "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/pipelines/model_pipeline.py"
    click MonitoringPipeline "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/monitoring/data_drift.py"
    click APIApp "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/api/app.py"
    click MLflowUtils "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/pipelines/mlflow_utils.py"
    click APIEndpoints "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/api/model_api.py"
    click ConfigExamples "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/examples/data_config.json"
    click ConfigExamples "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/examples/model_config.json"
    click ConfigExamples "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/examples/monitor_config.json"
    click ConfigExamples "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/examples/api_config.json"
    click GHActions "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/.github/workflows/ci_cd.yml"
    click Tests "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/tests/test_pipelines.py"
    click Requirements "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/requirements.txt"
    click Setup "https://github.com/pewpenguin/mlops-continuous-training-pipeline/blob/main/setup.py"

    %% Styles
    classDef pipeline fill:#E0F7FA,stroke:#0288D1,stroke-width:1px
    classDef api fill:#E8F5E9,stroke:#388E3C,stroke-width:1px
    classDef external fill:#FFF3E0,stroke:#F57C00,stroke-width:1px
    classDef ci fill:#FCE4EC,stroke:#C2185B,stroke-width:1px
    classDef cli fill:#E0E0E0,stroke:#757575,stroke-width:1px

Features

  • Data processing and feature engineering pipeline
  • Model training and evaluation pipeline
  • Model performance monitoring and drift detection
  • Model serving via REST API
  • DVC integration for data and model versioning
  • MLflow integration for experiment tracking
  • Automated CI/CD pipeline

Project Structure

.
├── api/                 # Model serving API
├── examples/            # Usage examples
├── monitoring/          # Model monitoring components
├── notebooks/           # Jupyter notebooks for exploration
├── pipelines/           # Core pipeline components
├── tests/               # Unit and integration tests
├── .dvc/                # DVC configuration
├── .github/             # GitHub Actions workflows
└── cli.py               # Command-line interface

Setup

  1. Clone the repository:
git clone https://github.com/Pewpenguin/mlops-continuous-training-pipeline
cd mlops-continuous-training-pipeline
  2. Install dependencies:
pip install -r requirements.txt
  3. Configure DVC:
dvc remote modify s3remote url s3://your-bucket-name
dvc remote modify s3remote endpointurl https://your-endpoint.com

Usage

Data Pipeline

python cli.py process-data configs/data_config.json

Model Training

python cli.py train-model configs/model_config.json

Model Monitoring

python cli.py monitor-model configs/monitor_config.json
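
Drift detection compares the feature distribution seen at training time with the distribution seen in production. The sketch below shows one common way to score that difference, the Population Stability Index (PSI). It is an illustration only: the README does not say which drift statistic the monitoring pipeline uses, and the function name, the bin count, and the `1e-6` floor are all assumptions.

```python
import math


def population_stability_index(reference, current, bins=10):
    """Return the PSI between two numeric samples (hypothetical helper).

    Bin edges are derived from the reference sample; values in `current`
    that fall outside that range are clamped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 when the reference sample is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(count / len(values), 1e-6) for count in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))
```

Identical samples score 0, and the score grows as the two distributions move apart. Any alerting threshold would be a project-specific choice.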

Model Serving

python cli.py serve-model configs/api_config.json
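
All four commands share one shape: a subcommand name followed by the path to a JSON config. A minimal sketch of how such a dispatcher could be built with `argparse` follows. It is not the contents of `cli.py`; `COMMANDS`, `build_parser`, and `dispatch` are hypothetical names, and the help strings are illustrative.

```python
import argparse
import json
from pathlib import Path

# Subcommand names come from the usage examples; descriptions are illustrative.
COMMANDS = {
    "process-data": "Run the data pipeline",
    "train-model": "Run the model pipeline",
    "monitor-model": "Run the monitoring pipeline",
    "serve-model": "Start the model serving API",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per pipeline (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="cli.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, help_text in COMMANDS.items():
        sub = subparsers.add_parser(name, help=help_text)
        sub.add_argument("config", type=Path, help="Path to a JSON config file")
    return parser


def dispatch(argv):
    """Parse argv, load the JSON config, and return (command, config)."""
    args = build_parser().parse_args(argv)
    with args.config.open(encoding="utf-8") as handle:
        config = json.load(handle)
    # A real CLI would hand `config` to the matching pipeline launcher here.
    return args.command, config
```

Keeping the parsing separate from the pipeline calls makes the dispatcher easy to test without running a pipeline.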

Configuration

Each pipeline component reads its settings from a JSON configuration file. Example configurations are in the examples directory (data_config.json, model_config.json, monitor_config.json, and api_config.json).

Data Pipeline Configuration

{
    "data_source": "data/raw/dataset.csv",
    "features": ["feature1", "feature2"],
    "target_column": "target",
    "test_size": 0.2
}
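
To make the role of each key concrete, the sketch below validates a data config and uses `features`, `target_column`, and `test_size` to split in-memory rows into train and test sets. It assumes rows are already loaded as dictionaries and uses only the standard library. `validate_data_config`, `split_rows`, and the `seed` parameter are hypothetical, and the actual data pipeline may work differently.

```python
import random

REQUIRED_KEYS = ("data_source", "features", "target_column", "test_size")


def validate_data_config(config):
    """Raise ValueError if keys are missing or test_size is out of range."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    if not 0.0 < config["test_size"] < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")


def split_rows(rows, config, seed=0):
    """Shuffle rows and return (train, test) lists.

    Each item is a (feature_values, target_value) pair built from the
    columns named in the config.
    """
    validate_data_config(config)
    examples = [
        ([row[name] for name in config["features"]], row[config["target_column"]])
        for row in rows
    ]
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(examples)
    n_test = round(len(examples) * config["test_size"])
    return examples[n_test:], examples[:n_test]
```

With `"test_size": 0.2`, ten rows would be split into eight training examples and two test examples.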

Model Pipeline Configuration

{
    "trainer": {
        "model_type": "random_forest",
        "task_type": "classification",
        "model_params": {
            "n_estimators": 100,
            "max_depth": 10
        }
    },
    "evaluator": {
        "metrics": ["accuracy", "f1", "precision", "recall"]
    },
    "registry": {
        "experiment_name": "model_experiment",
        "model_name": "production_model"
    }
}
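
The `evaluator.metrics` list names the scores to report. As a rough illustration of what those four names mean for a binary classifier, the sketch below computes them in plain Python. `classification_metrics` and its `positive` parameter are hypothetical; the project's evaluator is not shown in this README and may rely on a library instead.

```python
def classification_metrics(y_true, y_pred, metrics, positive=1):
    """Return the requested binary-classification metrics as a dict."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Each ratio falls back to 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    available = {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

    unknown = [name for name in metrics if name not in available]
    if unknown:
        raise ValueError(f"unsupported metrics: {unknown}")
    return {name: available[name] for name in metrics}
```

Passing `config["evaluator"]["metrics"]` as the `metrics` argument would return exactly the scores requested in the config.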

Development

Running Tests

pytest tests/

Adding New Features

  1. Create a new branch
  2. Implement the feature
  3. Add tests
  4. Submit a pull request

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
