This project sets up a hands-on Site Reliability Engineering (SRE) lab that demonstrates how to monitor applications and infrastructure using a modern open-source observability stack.
It includes preconfigured services for metrics, logs, and dashboards, designed to mirror production-grade reliability workflows.
- Prometheus: collects and stores time-series metrics from monitored services
- Grafana: visualizes metrics and builds alert dashboards
- Loki & Promtail: centralized log aggregation and querying
- Docker Compose: orchestrates the multi-container setup locally
- Environment Variables (.env): configurable ports, data paths, and credentials
- Modular design: easily extendable to Kubernetes, Alertmanager, or Slack alerting

    ┌────────────┐
    │  Promtail  │───► Logs ───► Loki
    └────────────┘
          │
          ▼
    ┌────────────┐    ┌────────────┐    ┌────────────┐
    │ App (Flask)│───►│ Prometheus │───►│  Grafana   │
    └────────────┘    └────────────┘    └────────────┘

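
To see the log path in the diagram at work, logs shipped by Promtail can be read back from Loki over its HTTP API. The sketch below is illustrative and not part of the repository: the `loki_query` function name, the `LOKI_URL` variable, and the port 3100 (Loki's usual default) are assumptions, so adjust them to match your `.env`.

```shell
#!/usr/bin/env bash
# Illustrative helper, not part of the repository.
# Assumption: Loki listens on its usual default port 3100.
LOKI_URL="${LOKI_URL:-http://localhost:3100}"

# Run a LogQL query over Loki's default time window and print the raw JSON.
#   $1 = LogQL query, $2 = maximum number of log lines (defaults to 20)
loki_query() {
  curl -fsS --max-time 10 -G \
    --data-urlencode "query=$1" \
    --data-urlencode "limit=${2:-20}" \
    "${LOKI_URL}/loki/api/v1/query_range"
}
```

For example, `loki_query '{job="varlogs"}'` would return recent lines for a stream labeled `job="varlogs"`. That label is hypothetical; the labels actually available depend on the Promtail configuration.
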
Clone the repository:

    git clone https://github.com/pmoise1981/sre-lab.git
    cd sre-lab

Copy the example environment file:

    cp .env.example .env

Start the stack:

    docker compose up -d

Grafana will be available at http://localhost:3000 and Prometheus at http://localhost:9090.

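
Once the containers are up, a quick way to confirm that both services respond is to probe their health endpoints. The following is a minimal sketch, assuming the addresses listed above; the function names and the `GRAFANA_URL` / `PROMETHEUS_URL` variables are illustrative and not part of the repository.

```shell
#!/usr/bin/env bash
# Illustrative helper, not part of the repository.
# Base URLs default to the addresses listed above; override them if .env changes the ports.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
PROMETHEUS_URL="${PROMETHEUS_URL:-http://localhost:9090}"

# Print the health-check URL for a named service.
health_url() {
  case "$1" in
    grafana)    echo "${GRAFANA_URL}/api/health" ;;
    prometheus) echo "${PROMETHEUS_URL}/-/healthy" ;;
    *)          echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Probe one service; prints "<name>: ok" or "<name>: unreachable".
check_service() {
  url="$(health_url "$1")" || return 1
  if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
    echo "$1: ok"
  else
    echo "$1: unreachable"
    return 1
  fi
}
```

Running `check_service grafana` and `check_service prometheus` after startup reports whether each endpoint answered within five seconds.
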
- System Metrics Dashboard: CPU, memory, disk usage
- Container Health Dashboard: Uptime, restart count, latency
- Application Metrics (optional): Integrates with Flask or FastAPI exporters
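
The data behind these dashboards can also be inspected directly through the Prometheus HTTP API, which helps when a panel shows no data. This is a minimal sketch, assuming Prometheus is reachable at the address given earlier; the `prom_query` function and the `PROMETHEUS_URL` variable are illustrative names, not part of the repository.

```shell
#!/usr/bin/env bash
# Illustrative helper, not part of the repository.
PROMETHEUS_URL="${PROMETHEUS_URL:-http://localhost:9090}"

# Run an instant PromQL query and print the raw JSON response.
#   $1 = PromQL expression
prom_query() {
  curl -fsS --max-time 10 -G \
    --data-urlencode "query=$1" \
    "${PROMETHEUS_URL}/api/v1/query"
}
```

For example, `prom_query 'up'` returns one sample per scrape target, with value 1 when the last scrape succeeded and 0 when it failed.
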
Prometheus · Grafana · Loki · Promtail · Docker Compose · Linux · .env
- Add Alertmanager + Slack/Email alerts
- Add Service-Level Indicators (SLIs) and Service-Level Objectives (SLOs)
- Integrate OpenTelemetry exporters for tracing
- Add Kubernetes manifests for production-grade orchestration
Pierre Moise, Site Reliability & DevOps Engineer | Observability, CI/CD, Cloud Automation | GitHub