# A Task Submission for CloudFactory
This project implements a production-oriented batch image labeling pipeline.
It is designed to take a set of input images, preprocess them, run inference using a hosted ML model, and output predictions in COCO format, with complete logging and teardown steps.
Given:
- A Machine Learning model
- A set of images to label
We must build a pipeline that:
- Deploys infrastructure to host the model for inference
- Processes raw images with augmentation (image preprocessing)
- Produces output predictions in COCO format
- Tears down infrastructure after task completion
## Architecture

```
+--------------------+       +--------------------------+       +--------------------+
|                    |       |                          |       |                    |
|    Input Images    | ----> |      Preprocessing       | ----> |    Model Server    |
|(data/sample_images)|       | (pipeline/preprocess.py) |       |  (FastAPI + Model) |
|                    |       |                          |       |                    |
+--------------------+       +--------------------------+       +--------------------+
                                                                          |
                                                                          v
                                                            +--------------------------+
                                                            |       COCO Export        |
                                                            | (pipeline/coco_utils.py) |
                                                            +--------------------------+
                                                                          |
                                                                          v
                                                               +--------------------+
                                                               |       Output       |
                                                               |  predictions.json  |
                                                               |      run.log       |
                                                               +--------------------+
```
- Input Images — Folder with `.jpg`, `.jpeg`, `.png` images.
- Preprocessing — Image augmentation & preprocessing with OpenCV. Supports resizing, RGB conversion, and configurable transforms (see the sketch after this list).
- Model Server — FastAPI app serving the ML model for inference. Dockerized for portability.
- COCO Export — Converts predictions into COCO-style JSON (`images`, `annotations`, `categories`).
- Output — Stores predictions (`predictions.json`) and logs (`run.log`).
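To make the preprocessing step concrete, here is a minimal sketch of the kind of transform it applies. The function name `preprocess_image` and its signature are assumptions for illustration, not the module's actual API:

```python
import cv2

# Minimal sketch of the kind of transform pipeline/preprocess.py
# applies; the function name and signature here are assumptions,
# not the module's actual API.
def preprocess_image(path: str, size=(224, 224)):
    img = cv2.imread(path)  # OpenCV decodes to BGR by default
    if img is None:
        raise ValueError(f"Could not read image: {path}")
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB for the model
    return cv2.resize(img, size)  # resize to a fixed input resolution
```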
## Features

- 🔧 Dockerized model server for easy deployment
- 🖼️ OpenCV preprocessing with batch and augmentation support
- 📑 COCO format output ready for ML pipelines
- 📜 Detailed logging (`output/run.log`)
- 🧪 Unit + integration tests (pytest)
- ⚙️ GitHub Actions CI for linting + tests
- 🔄 Extensible — swap in real ML models (`.pt`, `.onnx`, `.h5`)
## Design Principles

- Modular — Each step (preprocessing, server, batch runner, export) is independent
- Scalable — Handles large datasets with a configurable `--batch_size` (see the batching sketch after this list)
- Reproducible — Docker ensures a consistent runtime
- Standardized — COCO format ensures downstream compatibility
- Extensible — Replace the dummy model with PyTorch, TensorFlow, or ONNX models
- Robust — Logging + tests ensure reliability in production workflows
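As a rough illustration of the `--batch_size` idea, a batching helper might look like the following. The name and behavior are assumptions for illustration; the real `pipeline/run_batch.py` may batch differently:

```python
from pathlib import Path

# Illustrative only: shows how a --batch_size flag can bound how many
# images are handled at once. Not the real pipeline/run_batch.py.
def iter_batches(input_dir: str, batch_size: int):
    exts = {".jpg", ".jpeg", ".png"}
    paths = sorted(p for p in Path(input_dir).iterdir()
                   if p.suffix.lower() in exts)
    for i in range(0, len(paths), batch_size):
        yield paths[i:i + batch_size]  # at most batch_size paths per chunk
```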
## Prerequisites

- Python 3.10+
- Docker & Docker Compose (for containerized execution)
- Dependencies:

```bash
pip install -r requirements.txt
pip install -r model_server/requirements.txt
```
## Quick Start (Docker)

1. Build & start the server:

```bash
docker-compose up --build -d
```

2. Run the batch pipeline (a single-request equivalent is sketched after these steps):

```bash
python -m pipeline.run_batch \
  --input_dir data/sample_images \
  --output_file output/predictions.json \
  --batch_size 4 \
  --server_url http://localhost:8000/predict/
```

3. Check outputs:

```bash
less output/predictions.json  # COCO results
less output/run.log           # Logs
```

4. Tear down:

```bash
docker-compose down
```
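For orientation, this is approximately the request the batch runner issues per image, written out by hand. The sample file name is made up, and the real client's payload shape may differ:

```python
import requests

# Hypothetical single request, approximating what pipeline/run_batch
# sends to the server for each image. "cat_01.jpg" is a made-up
# sample file name; the real field names may differ.
with open("data/sample_images/cat_01.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/predict/",
        files={"file": ("cat_01.jpg", f, "image/jpeg")},
        timeout=30,
    )
resp.raise_for_status()
print(resp.json())  # raw per-image predictions, before COCO export
```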
## Run Locally (Without Docker)

1. Start the server (a minimal sketch of such an endpoint follows these steps):

```bash
python -m model_server.main
```

2. Run the pipeline (in a new terminal):

```bash
python -m pipeline.run_batch --input_dir data/sample_images --output_file output/predictions.json --batch_size 1 --server_url http://localhost:8000/predict/
```
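For reference, a bare-bones `/predict/` endpoint in FastAPI could look like the sketch below. This is an assumption about the shape of `model_server/main.py`, not its actual code, and the response format here is invented:

```python
# Hypothetical sketch of a /predict/ endpoint; the real
# model_server/main.py may differ in structure and model loading.
import io

import uvicorn
from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()

@app.post("/predict/")
async def predict(file: UploadFile = File(...)):
    # Decode the uploaded image bytes into an RGB image.
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    width, height = image.size
    # A real server would run the model here; this dummy returns a
    # placeholder detection covering the whole image.
    return {
        "predictions": [
            {"category_id": 1, "bbox": [0, 0, width, height], "score": 0.5}
        ]
    }

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```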
## Testing

```bash
pytest -q
```

Covers (an illustrative test sketch follows):

- ✅ Preprocessing
- ✅ COCO export
- ✅ Model server + pipeline integration
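A unit test in this style might look as follows, assuming a helper like the `preprocess_image` sketched earlier lives in `pipeline/preprocess` (an assumption; the real suite may be organized differently):

```python
import cv2
import numpy as np

# Illustrative test only: assumes a helper like the preprocess_image
# sketched earlier exists in pipeline/preprocess.
from pipeline.preprocess import preprocess_image


def test_preprocess_resizes_to_target(tmp_path):
    # Write a small dummy image to disk for the helper to load.
    src = tmp_path / "dummy.jpg"
    cv2.imwrite(str(src), np.zeros((50, 60, 3), dtype=np.uint8))

    out = preprocess_image(str(src), size=(224, 224))
    assert out.shape == (224, 224, 3)  # RGB image at the target size
```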
## Outputs

- `output/predictions.json` — COCO predictions (images + annotations); see the illustrative structure below.
- `output/run.log` — Detailed logs with timestamps.
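To show what the COCO-style output looks like, here is an illustrative `predictions.json` payload with made-up values (the real file may carry additional fields):

```python
import json

# Illustrative COCO-style payload with made-up values, showing the
# three sections named above; the real pipeline/coco_utils.py may
# include additional fields.
coco = {
    "images": [
        {"id": 1, "file_name": "cat_01.jpg", "width": 224, "height": 224},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [12.0, 30.5, 150.0, 98.0],  # COCO bbox: [x, y, width, height]
            "score": 0.91,
        },
    ],
    "categories": [
        {"id": 1, "name": "cat"},
    ],
}

with open("output/predictions.json", "w") as f:
    json.dump(coco, f, indent=2)
```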