MLOps-CI-Pipeline

An end-to-end MLOps pipeline demonstrating automated retraining, evaluation-based promotion, versioned deployment, and data drift monitoring. Built as a bachelor-level engineering project.

Requirements

This project requires Python 3.12.x. Other versions are not officially supported.

Setup

1. Create virtual environment (Python 3.12 required)

macOS / Linux

python3.12 -m venv .venv
source .venv/bin/activate

Windows (PowerShell)

py -3.12 -m venv .venv
.\.venv\Scripts\Activate.ps1

2. Upgrade pip

python -m pip install --upgrade pip

3. Install project in editable mode

pip install -e .

How to Run

Run tests

python -m pytest tests/ -v --tb=short

Run pipeline

run-pipeline --config src/config/pipeline.yaml

First run: if a dataset has no dataset.yaml, the pipeline prompts interactively for the target column and task type. This happens only once; subsequent runs skip the prompt automatically.
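Since subsequent runs skip the prompt, the answers presumably end up in the dataset's dataset.yaml. A hypothetical example for a tabular dataset (the target/features/task_type keys follow the fields mentioned elsewhere in this README; everything else is a guess, so check a generated file for the real schema):

```yaml
# Hypothetical dataset.yaml for a tabular dataset (illustrative, not authoritative)
task_type: classification
target: label
features:
  - age
  - income
```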

Pipeline Stages

The pipeline executes the following stages in order:

Stage          Status       Description
preprocessing  Implemented  Selects feature and target columns from each split, writes to preprocessed/
training       Placeholder  Model training (not yet implemented)
evaluation     Placeholder  Model evaluation (not yet implemented)
deployment     Placeholder  Model deployment (not yet implemented)
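Conceptually, the runner just walks this stage list in order, passing a context along. A minimal sketch (function and variable names are illustrative, not the project's actual API):

```python
# Illustrative sketch of ordered stage execution; not the project's real API.
from typing import Callable

def preprocessing(ctx: dict) -> dict:
    # The real stage selects columns and writes preprocessed/ splits.
    ctx["preprocessed"] = True
    return ctx

def training(ctx: dict) -> dict:
    raise NotImplementedError("placeholder stage")

# Placeholder stages are omitted until they are implemented.
STAGES: list[Callable[[dict], dict]] = [preprocessing]

def run_pipeline(ctx: dict) -> dict:
    for stage in STAGES:
        ctx = stage(ctx)
    return ctx

print(run_pipeline({}))  # → {'preprocessed': True}
```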

Data Flow

Tabular datasets:

data/raw/<dataset>/data.csv
        ↓  ingestion + versioning
data/processed/<dataset>/<version_id>/data.csv  +  train/  val/  test/
        ↓  preprocessing
data/processed/<dataset>/<version_id>/preprocessed/  train.csv  val.csv  test.csv
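The preprocessing step for tabular data boils down to keeping only the feature and target columns of each split. A pure-Python sketch of that idea (the real implementation and its function names may differ):

```python
import csv
import io

def select_columns(csv_text: str, features: list[str], target: str) -> str:
    """Keep only the feature and target columns of a CSV (illustrative)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    keep = features + [target]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=keep)
    writer.writeheader()
    for row in reader:
        writer.writerow({k: row[k] for k in keep})
    return out.getvalue()

raw = "age,income,city,label\n34,50000,Oslo,1\n29,42000,Bergen,0\n"
print(select_columns(raw, ["age", "income"], "label"))
```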

Image datasets:

data/raw/<dataset>/images/{class}/...
        ↓  versioning + stratified splitting
data/processed/<dataset>/<version_id>/train/images/{class}/...
        ↓  preprocessing (resize, normalize, flatten)
data/processed/<dataset>/<version_id>/preprocessed/  train.npz  val.npz  test.npz
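The normalize-and-flatten step turns each split into flat float arrays saved as .npz. A numpy sketch with random data standing in for real images (array key names and shapes are illustrative):

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for 10 RGB images already resized to 32x32.
images = rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)
labels = rng.integers(0, 2, size=10)

X = images.astype(np.float32) / 255.0  # normalize pixel values to [0, 1]
X = X.reshape(len(X), -1)              # flatten each image: shape (10, 3072)

path = os.path.join(tempfile.mkdtemp(), "train.npz")
np.savez(path, X=X, y=labels)          # key names X/y are illustrative

loaded = np.load(path)
print(loaded["X"].shape)  # (10, 3072)
```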

Preprocessing reads column definitions (target, features) from the versioned dataset.yaml — no separate config file is needed.

Deployment

The prediction service runs as a Docker container that loads the current Production model from the MLflow registry at startup.

# 1. Make sure a Production model exists (run pipeline and approve)
run-pipeline --config src/config/pipeline.yaml

# 2. Copy environment template
cp .env.example .env

# 3. Build and start the container
docker compose -f docker/docker-compose.yml up --build

# 4. Verify
curl http://localhost:8000/health

See docs/deployment.md for full configuration, model governance details, and troubleshooting.
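The .env file presumably carries the registry and service settings the container reads at startup. A purely hypothetical fragment (every variable name here is a guess; .env.example is the authoritative template):

```shell
# Hypothetical values; copy .env.example and use its actual variable names
MLFLOW_TRACKING_URI=http://localhost:5000
PREDICTION_PORT=8000
```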

Image Classification

The pipeline supports image classification datasets using the ImageFolder convention. Place class-labeled images under data/raw/<name>/images/{class}/ with a dataset.yaml specifying task_type: image_classification.

# sklearn-based (flattened pixel vectors)
run-pipeline --config src/config/pipeline_image_classification.yaml

# CNN-based with PyTorch (spatial feature learning, recommended for drift analysis)
run-pipeline --config src/config/pipeline_image_cnn.yaml

See docs/image_datasets.md for folder structure, preprocessing configuration, augmentation, and limitations.

Adding Datasets

See data/raw/README.md for instructions on how to add new datasets.
