Time-travel debugging for ML predictions
Immutable prediction logging and historical replay for traceability.
Quickstart | API | Docker | Progress
In production ML systems, models, code, and data evolve independently. When a prediction looks wrong, it is hard to answer basic questions: Which model version produced it? What input did it see? Can the output be reproduced today? ChronoML focuses on capturing those answers at prediction time so debugging and auditing are possible later.
| Component | Purpose |
|---|---|
| app/ | FastAPI layer exposing /predict, /events, /replay |
| model/ | Training script that produces versioned artifacts |
| artifacts/ | Saved model files and metadata (immutable outputs) |
| db/ | SQLite schema and retention cleanup for prediction events |
| tests/ | Pytest coverage for prediction, replay, validation |
Prediction flow:
- /predict validates input with Pydantic.
- The active model artifact is loaded at startup.
- The model runs inference and latency is measured.
- A prediction_events row is written to SQLite with: event_id, timestamp, model_version, git_commit, data_version, input_json, output_json, latency_ms.
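A minimal sketch of that logging step, assuming the standard-library sqlite3 module; the helper name log_prediction_event and the database path are illustrative, not taken from the actual code:

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

DB_PATH = "db/chronoml.db"  # assumed location, matching the db/ component above

def log_prediction_event(model_version: str, git_commit: str, data_version: str,
                         input_payload: dict, output_payload: dict,
                         latency_ms: float) -> str:
    """Write one immutable prediction_events row and return its event_id."""
    event_id = str(uuid.uuid4())
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            "INSERT INTO prediction_events "
            "(event_id, timestamp, model_version, git_commit, data_version, "
            "input_json, output_json, latency_ms) "
            "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (
                event_id,
                datetime.now(timezone.utc).isoformat(),
                model_version,
                git_commit,
                data_version,
                json.dumps(input_payload),
                json.dumps(output_payload),
                latency_ms,
            ),
        )
    return event_id
```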
/replay/{event_id} fetches the stored event, loads the model artifact referenced by that event, and re-runs inference using the stored input JSON. The response includes the original output, the replayed output, and a boolean match flag.
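In outline, that replay path might look like the sketch below; the artifact naming scheme and the assumption that the stored input JSON preserves feature order are illustrative, not guarantees from the real code:

```python
import json
import sqlite3

import joblib  # assumes artifacts are serialized the same way the training script saves them

def replay_event(event_id: str, db_path: str = "db/chronoml.db") -> dict:
    """Re-run a stored prediction with the model artifact recorded on the event."""
    with sqlite3.connect(db_path) as conn:
        row = conn.execute(
            "SELECT model_version, input_json, output_json "
            "FROM prediction_events WHERE event_id = ?",
            (event_id,),
        ).fetchone()
    if row is None:
        raise KeyError(f"event {event_id} not found (it may have been removed by retention)")

    model_version, input_json, output_json = row
    model = joblib.load(f"artifacts/model_{model_version}.pkl")  # assumed naming scheme

    features = json.loads(input_json)  # assumes keys are stored in training feature order
    original = json.loads(output_json)
    replayed = {"prediction": int(model.predict([list(features.values())])[0])}

    return {"original": original, "replayed": replayed, "match": original == replayed}
```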
Why mismatches can occur:
- The wrong model version is loaded.
- Input schema changed or stored JSON is invalid.
- Non-determinism or different preprocessing in future versions.
- Retention removed the event, so replay is no longer possible.
Predict:
```bash
curl -X POST http://127.0.0.1:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"sepal_length":5.1,"sepal_width":3.5,"petal_length":1.4,"petal_width":0.2}'
```

Events:

```bash
curl "http://127.0.0.1:8000/events?limit=5&model_version=v1.0"
```

Replay:

```bash
curl "http://127.0.0.1:8000/replay/<EVENT_ID>"
```

Local:
```bash
pip install -r requirements.txt
uvicorn app.main:app --reload
```

Health:
GET /health -> {"status":"ok"}
Console UI (demo only):
GET / -> basic UI for health, predict, events, replay
Build:
```bash
docker build -t chronoml .
```

Run:

```bash
docker run --rm -p 8000:8000 -e MODEL_ACTIVE_VERSION=v1.0 chronoml
```

The SQLite database file is created at runtime inside the container if missing.

Tests:

```bash
pytest
```

Configuration:

- MODEL_ACTIVE_VERSION: choose the model version loaded at startup (default v1.0).
- MAX_REQUEST_BYTES: max request payload size (default 10000).
- RETENTION_DAYS: delete events older than N days (default 30).
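All three are plain environment variables read at startup; a minimal sketch of how they might be loaded with the documented defaults (the module layout is illustrative):

```python
import os

# Defaults mirror the values documented above.
MODEL_ACTIVE_VERSION = os.getenv("MODEL_ACTIVE_VERSION", "v1.0")
MAX_REQUEST_BYTES = int(os.getenv("MAX_REQUEST_BYTES", "10000"))
RETENTION_DAYS = int(os.getenv("RETENTION_DAYS", "30"))
```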
Learning goals:
- ML lifecycle management with versioned artifacts and metadata.
- Traceability by logging every prediction with code and model version.
- Reproducibility limits and replay behavior as systems evolve.
- Basic industrial practices: configuration via env vars, CI with GitHub Actions, and Dockerized deployment.
Progress:
- Created the project skeleton with app/, model/, artifacts/, db/, and tests/.
- Added a minimal FastAPI entrypoint in app/main.py.
- Exposed a /health endpoint returning a JSON status payload.
- Added requirements.txt with FastAPI + Uvicorn dependencies.
- Added a baseline training script in model/train_baseline.py.
- Uses scikit-learn on the Iris dataset to train a RandomForestClassifier.
- Saves the trained model to artifacts/model_v1.pkl.
- Writes metadata to artifacts/model_v1_meta.json with version, timestamp, features, and accuracy.
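A condensed sketch of what that training step does; the exact hyperparameters and metadata fields follow this description rather than the real script:

```python
import json
import os
from datetime import datetime, timezone

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Train a small RandomForest on Iris and keep a held-out split for an accuracy number.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Save the artifact and its metadata side by side, as the repo layout suggests.
os.makedirs("artifacts", exist_ok=True)
joblib.dump(model, "artifacts/model_v1.pkl")
meta = {
    "version": "v1",
    "trained_at": datetime.now(timezone.utc).isoformat(),
    "features": list(iris.feature_names),
    "accuracy": float(model.score(X_test, y_test)),
}
with open("artifacts/model_v1_meta.json", "w") as f:
    json.dump(meta, f, indent=2)
```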
- Added a SQLite initialization script in db/init_db.py.
- Creates the prediction_events table with the fields required for immutable logging.
- Writes the database file to db/chronoml.db.
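The schema behind that table, sketched to match the fields listed in the prediction flow above (column types are assumptions):

```python
import os
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS prediction_events (
    event_id      TEXT PRIMARY KEY,
    timestamp     TEXT NOT NULL,
    model_version TEXT NOT NULL,
    git_commit    TEXT,
    data_version  TEXT,
    input_json    TEXT NOT NULL,
    output_json   TEXT NOT NULL,
    latency_ms    REAL
);
"""

os.makedirs("db", exist_ok=True)
with sqlite3.connect("db/chronoml.db") as conn:
    conn.executescript(SCHEMA)
```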
- Added a /predict endpoint that validates input and runs model inference.
- Loads the model artifact once at startup for consistent performance.
- Logs each prediction to SQLite with event ID, timestamp, version, and latency.
- Added an /events endpoint to inspect recent prediction events.
- Supports a limit parameter and optional model-version filtering.
- Returns lightweight previews for quick debugging and demo use.
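That limit and model-version filtering can be expressed as a single parameterized query; a sketch, with column names taken from the logging fields described earlier:

```python
import sqlite3

def list_events(limit: int = 5, model_version: str | None = None,
                db_path: str = "db/chronoml.db") -> list[tuple]:
    """Return the most recent prediction events, optionally filtered by model version."""
    query = "SELECT event_id, timestamp, model_version, latency_ms FROM prediction_events"
    params: list = []
    if model_version is not None:
        query += " WHERE model_version = ?"
        params.append(model_version)
    query += " ORDER BY timestamp DESC LIMIT ?"
    params.append(limit)
    with sqlite3.connect(db_path) as conn:
        return conn.execute(query, params).fetchall()
```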
- Added model versioning support via the MODEL_ACTIVE_VERSION environment variable.
- Updated the training script to produce versioned artifacts (v1/v2).
- Added docs for training and switching the active model version.
- Added a /replay/{event_id} endpoint to re-run historical predictions.
- Loads the exact model artifact used at the time of the event.
- Returns the original vs. replayed output and a match flag.
- Added request size limits to prevent oversized payloads.
- Added a retention cleanup routine for old prediction events.
- Documented guardrails and their impact on replay.
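A sketch of such a retention sweep, assuming events store ISO-8601 UTC timestamps so that plain string comparison orders them correctly:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def purge_old_events(retention_days: int = 30, db_path: str = "db/chronoml.db") -> int:
    """Delete prediction events older than the retention window; returns the number removed."""
    cutoff = (datetime.now(timezone.utc) - timedelta(days=retention_days)).isoformat()
    with sqlite3.connect(db_path) as conn:
        cursor = conn.execute(
            "DELETE FROM prediction_events WHERE timestamp < ?", (cutoff,)
        )
        return cursor.rowcount
```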
- Added pytest coverage for /predict, /replay, and invalid request handling.
- Tests run against an isolated SQLite database.
- Validates replay uses the historical model artifact.
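A hedged example of what one such test might look like with FastAPI's TestClient; the response fields (event_id, match) are assumptions based on the endpoints described above, not the project's actual test code:

```python
from fastapi.testclient import TestClient

from app.main import app  # assumes the FastAPI instance is named `app`

client = TestClient(app)

def test_predict_then_replay_matches():
    payload = {"sepal_length": 5.1, "sepal_width": 3.5,
               "petal_length": 1.4, "petal_width": 0.2}
    predicted = client.post("/predict", json=payload)
    assert predicted.status_code == 200
    event_id = predicted.json()["event_id"]  # assumed response field

    replayed = client.get(f"/replay/{event_id}")
    assert replayed.status_code == 200
    assert replayed.json()["match"] is True
```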
- Added a GitHub Actions workflow to run pytest on pushes and pull requests.
- Pinned the CI Python version to 3.10 for consistent runs.
- Added a Dockerfile that runs the FastAPI app on 0.0.0.0:8000.
- Added a .dockerignore to keep local env, caches, and DB files out of images.
- Added a minimal console UI at / for quick manual demos.
- Rewrote the README to explain motivation, architecture, and learning goals.
- Documented prediction logging, replay flow, and mismatch causes.
- Added concise API examples plus CI, Docker, and configuration notes.