
📦 Batch Labeling Pipeline

A Task Submission for CloudFactory

This project implements a production-oriented batch image labeling pipeline.
It takes a set of input images, preprocesses them, runs inference against a hosted ML model, and writes predictions in COCO format, with full logging and automated infrastructure teardown.


🚀 Task Objectives

Given:

  • A machine learning model
  • A set of images to label

We must build a pipeline that:

  1. Deploys infrastructure to host the model for inference
  2. Processes raw images with augmentation (image preprocessing)
  3. Produces output predictions in COCO format
  4. Tears down infrastructure after task completion

🏗️ Architecture

Diagram

 +----------------------+       +---------------------------+       +----------------------+
 |                      |       |                           |       |                      |
 |  Input Images        | ----> |  Preprocessing            | ----> |  Model Server        |
 |  (data/sample_images)|       |  (pipeline/preprocess.py) |       |  (FastAPI + Model)   |
 |                      |       |                           |       |                      |
 +----------------------+       +---------------------------+       +----------------------+
                                                                               |
                                                                               v
                                                                +---------------------------+
                                                                |  COCO Export              |
                                                                |  (pipeline/coco_utils.py) |
                                                                +---------------------------+
                                                                               |
                                                                               v
                                                                +---------------------------+
                                                                |  Output                   |
                                                                |  predictions.json         |
                                                                |  run.log                  |
                                                                +---------------------------+

Components

  • Input Images — Folder with .jpg, .jpeg, .png images.
  • Preprocessing — Image augmentation & preprocessing with OpenCV. Supports resizing, RGB conversion, and configurable transforms (a minimal sketch follows this list).
  • Model Server — FastAPI app serving the ML model for inference. Dockerized for portability.
  • COCO Export — Converts predictions into COCO-style JSON (images, annotations, categories).
  • Output — Stores predictions (predictions.json) and logs (run.log).
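
For orientation, the preprocessing step can be pictured as a short OpenCV sketch. This is a minimal illustration, not the contents of pipeline/preprocess.py; the function name preprocess_image and the 224×224 target size are assumptions.

    import cv2
    import numpy as np

    # Hypothetical preprocessing sketch; the real logic lives in
    # pipeline/preprocess.py. Resize to a fixed shape and convert BGR -> RGB.
    def preprocess_image(path: str, size: tuple[int, int] = (224, 224)) -> np.ndarray:
        image = cv2.imread(path)  # OpenCV reads images as BGR
        if image is None:
            raise ValueError(f"Could not read image: {path}")
        image = cv2.resize(image, size)                # match the model's input shape
        return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # BGR -> RGB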

✨ Features

  • 🔧 Dockerized model server for easy deployment
  • 🖼️ OpenCV preprocessing with batch and augmentation support
  • 📑 COCO format output ready for ML pipelines
  • 📜 Detailed logging (output/run.log)
  • 🧪 Unit + Integration tests (pytest)
  • ⚙️ GitHub Actions CI for linting + tests
  • 🔄 Extensible — swap in real ML models (.pt, .onnx, .h5)

⚡ Advantages of this Architecture

  • Modular — Each step (preprocessing, server, batch runner, export) is independent
  • Scalable — Handles large datasets with configurable --batch_size
  • Reproducible — Docker ensures consistent runtime
  • Standardized — COCO format ensures downstream compatibility
  • Extensible — Replace dummy model with PyTorch, TensorFlow, or ONNX models
  • Robust — Logging + tests ensure reliability in production workflows

🛠️ Requirements

  • Python: 3.10+

  • Docker & Docker Compose (for containerized execution)

  • Dependencies:

    pip install -r requirements.txt
    pip install -r model_server/requirements.txt

Run with Docker

  1. Build & start server

    docker-compose up --build -d
  2. Run batch pipeline (a sketch of the underlying request appears after these steps)

    python -m pipeline.run_batch \
        --input_dir data/sample_images \
        --output_file output/predictions.json \
        --batch_size 4 \
        --server_url http://localhost:8000/predict/
  3. Check outputs

    less output/predictions.json   # COCO results
    less output/run.log            # Logs
  4. Tear down

    docker-compose down
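
For reference, each batch submitted in step 2 is ultimately a POST to the server's /predict/ endpoint. The sketch below shows one such request with the requests library; the multipart field name and the example filename are assumptions, since the actual contract is defined by model_server and pipeline/run_batch.py.

    import requests

    server_url = "http://localhost:8000/predict/"

    # Hypothetical single-image request; the field name "file" and the example
    # filename are illustrative, not taken from the repository.
    with open("data/sample_images/example.jpg", "rb") as f:
        response = requests.post(server_url, files={"file": ("example.jpg", f, "image/jpeg")})

    response.raise_for_status()
    print(response.json())  # raw predictions, later converted to COCO format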

Run Locally (Without Docker)

  1. Start server (a minimal server sketch appears after these steps)

    python -m model_server.main
  2. Run pipeline (new terminal)

    python -m pipeline.run_batch \
        --input_dir data/sample_images \
        --output_file output/predictions.json \
        --batch_size 1 \
        --server_url http://localhost:8000/predict/
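
As a rough picture of what step 1 starts, a FastAPI model server in this role might look like the minimal sketch below. It is illustrative only: the route shape, payload, and dummy detection are assumptions, not the contents of model_server/main.py.

    # Illustrative sketch only; not the actual model_server/main.py.
    from fastapi import FastAPI, File, UploadFile
    import uvicorn

    app = FastAPI()

    @app.post("/predict/")
    async def predict(file: UploadFile = File(...)):
        image_bytes = await file.read()  # raw image bytes
        # A real server would decode the image and run the ML model here;
        # this stub returns a fixed dummy detection instead.
        return {
            "filename": file.filename,
            "predictions": [{"category_id": 1, "bbox": [0, 0, 10, 10], "score": 0.5}],
        }

    if __name__ == "__main__":
        uvicorn.run(app, host="0.0.0.0", port=8000)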


🧪 Running Tests

    pytest -q

Covers:

  • ✅ Preprocessing
  • ✅ COCO export
  • ✅ Model server + pipeline integration
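
To give a flavor of what such tests check, a preprocessing unit test could be as small as the sketch below (illustrative only; it is not taken from the repository's test suite).

    import cv2
    import numpy as np

    # Illustrative test sketch: a resized, RGB-converted image should have
    # the expected shape and dtype. Uses pytest's tmp_path fixture.
    def test_preprocess_shape(tmp_path):
        path = tmp_path / "dummy.jpg"
        cv2.imwrite(str(path), np.zeros((64, 48, 3), dtype=np.uint8))

        image = cv2.imread(str(path))
        image = cv2.resize(image, (224, 224))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        assert image.shape == (224, 224, 3)
        assert image.dtype == np.uint8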

📂 Output Files

  • output/predictions.json — COCO predictions (images + annotations).
  • output/run.log — Detailed logs with timestamps.
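
Because the output follows the COCO convention, it can be inspected with a few lines of Python. A minimal sketch, assuming the standard COCO top-level keys (images, annotations, categories):

    import json

    # Quick summary of the COCO output; assumes standard COCO top-level keys.
    with open("output/predictions.json") as f:
        coco = json.load(f)

    print(f"images:      {len(coco['images'])}")
    print(f"annotations: {len(coco['annotations'])}")
    print(f"categories:  {[c['name'] for c in coco['categories']]}")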
