A comprehensive, Docker-based monitoring solution for modern infrastructure and applications. This stack provides full observability with metrics, logs, and alerts using industry-standard open-source tools.
This project delivers a complete monitoring infrastructure as code, allowing you to quickly deploy a production-ready monitoring solution. The stack includes:
- Metrics collection: Prometheus, Node Exporter, cAdvisor, Telegraf
- Log aggregation: Loki, Promtail
- Alerting: Alertmanager with email and Slack integration
- Visualization: Grafana with pre-configured dashboards
- Endpoint monitoring: Blackbox Exporter for HTTP/HTTPS/TCP checks
This monitoring solution is designed to provide immediate visibility into your infrastructure while remaining highly customizable to meet specific requirements.
- Zero-configuration deployment - Works out of the box with sensible defaults
- Environment-based configuration - Easily customize via the .env file
- Template-based configuration files - All configuration files use templates for easy customization
- Comprehensive metrics collection - From system metrics to container stats
- Centralized logging - Aggregate and search logs from all systems
- Multi-channel alerting - Email, Slack, and more
- Pre-built dashboards - Hit the ground running with ready-to-use dashboards
- Secure by default - Grafana requires a login; basic auth can be enabled for the other components
- Docker-compose deployment - Simple to deploy and manage
- Development-friendly - Includes MailHog for testing email alerts locally
- Docker Engine (19.03.0+)
- Docker Compose (1.27.0+)
- 2GB+ RAM recommended
- 10GB+ disk space
- Clone the repository
    git clone https://github.com/amirk1998/monitoring-stack.git
    cd monitoring-stack
- Configure your environment
    cp .env.example .env
    # Edit .env file with your preferred settings
- Generate configuration files
    ./setup-config.sh
- Launch the stack
    docker-compose up -d
- Access the dashboards
  - Grafana: http://localhost:3000 (default credentials: admin/ChangeMe123!)
  - Prometheus: http://localhost:9090
  - Alertmanager: http://localhost:9093
  - MailHog (development only): http://localhost:8025
Core services:

Component | Description | Port |
---|---|---|
Prometheus | Time-series database and metrics collector | 9090 |
Grafana | Visualization and dashboarding platform | 3000 |
Alertmanager | Alert handling and routing | 9093 |

Metrics exporters:

Component | Description | Port |
---|---|---|
Node Exporter | Host system metrics (CPU, memory, disk, network) | 9100 |
cAdvisor | Container metrics and resource usage | 8080 |
Blackbox Exporter | Probes endpoints over HTTP, HTTPS, DNS, TCP | 9115 |
Telegraf | Pluggable metrics collection agent | 9273 |

Log aggregation:

Component | Description | Port |
---|---|---|
Loki | Log aggregation system | 3100 |
Promtail | Log collector and forwarder | - |

Development tools:

Component | Description | Port |
---|---|---|
MailHog | SMTP testing server with web interface | 1025 (SMTP), 8025 (web UI) |
.
├── alertmanager/ # Alertmanager configuration
├── blackbox_exporter/ # Blackbox Exporter configuration
├── grafana/ # Grafana dashboards and datasources
├── loki/ # Loki configuration
├── prometheus/ # Prometheus configuration and rules
│ ├── alerts/ # Alert rules
│ └── ...
├── promtail/ # Promtail configuration
├── telegraf/ # Telegraf configuration
├── docker-compose.yml # Service definitions
├── .env.example # Example environment variables
├── setup-config.sh # Configuration generator script
└── README.md # This file
The .env file controls key aspects of the monitoring stack:
- Service ports
- Credentials
- Alerting channels
- Retention settings
- Resource limits
See .env.example for all available options.
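When .env.example gains new options after an update, an existing .env can silently fall behind. The following is a minimal sketch for spotting that, assuming both files use plain KEY=VALUE lines; the helper names env_keys and missing_env_keys are hypothetical and not part of this repository.

```shell
#!/bin/sh
# Sketch: list keys that appear in .env.example but not in .env.
# Assumes plain KEY=VALUE lines; comments and blank lines are ignored.

# env_keys FILE: print the sorted, de-duplicated KEY part of every KEY=VALUE line.
env_keys() {
  sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1" | sort -u
}

# missing_env_keys EXAMPLE ACTUAL: print keys defined in EXAMPLE but absent from ACTUAL.
missing_env_keys() {
  keys_example=$(mktemp)
  keys_actual=$(mktemp)
  env_keys "$1" > "$keys_example"
  env_keys "$2" > "$keys_actual"
  # comm -23 keeps only the lines unique to the first sorted input.
  comm -23 "$keys_example" "$keys_actual"
  rm -f "$keys_example" "$keys_actual"
}
```

Running `missing_env_keys .env.example .env` from the repository root prints one missing key per line, or nothing when .env covers every option.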
All configuration files use templates (.yml.template, .conf.template) that are processed during setup:
- Values from the .env file are substituted
- Final configuration files are generated
- Changes to templates require running setup-config.sh again
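The contents of setup-config.sh are not reproduced here. As an illustration of the substitution step only, the sketch below assumes templates use `${VAR}` placeholders and that .env holds unquoted KEY=VALUE lines; the function name render_template is hypothetical, and the real script may work differently.

```shell
#!/bin/sh
# Sketch only: replace ${VAR} placeholders in a template with values from an
# env file. This is an illustration, not the project's actual setup-config.sh.

# render_template ENV_FILE TEMPLATE: write the rendered template to stdout.
render_template() {
  env_file=$1
  template=$2
  sed_script=""
  # Build one sed substitution per KEY=VALUE line.
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      ''|\#*) continue ;;   # skip blank lines and comments
      *=*) ;;               # KEY=VALUE line: process it
      *) continue ;;        # anything else is ignored
    esac
    key=${line%%=*}
    value=${line#*=}
    # Escape \, & and the | delimiter so the value is inserted literally.
    escaped=$(printf '%s' "$value" | sed 's/[\\&|]/\\&/g')
    sed_script="${sed_script}s|\\\${${key}}|${escaped}|g;"
  done < "$env_file"
  sed "$sed_script" "$template"
}
```

For example, `render_template .env prometheus/prometheus.yml.template > prometheus/prometheus.yml` would produce a final configuration file. Placeholders with no matching key in .env are left untouched, which makes missing values easy to spot in the output.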
The stack comes with several pre-configured dashboards:
Dashboard | Description |
---|---|
Node Exporter Overview | Host-level metrics (CPU, memory, disk, network) |
Docker Containers | Container metrics from cAdvisor |
Prometheus Stats | Prometheus performance and health |
Alertmanager Overview | Alert status and history |
Loki Logs | Log exploration and search |
To add custom dashboards:
- Export dashboard JSON from Grafana
- Place in grafana/provisioning/dashboards/
- Update grafana/provisioning/dashboards/dashboard.yml if needed
- Restart Grafana:
    docker-compose restart grafana
- Email: Configure via SMTP settings in .env
- Slack: Configure via webhook URL in .env
- Other integrations: Can be added in alertmanager/alertmanager.yml.template
- Default rules are in prometheus/alerts/custom_alerts.yml
- Add new rules by creating files in prometheus/alerts/
- Rule files in this directory are loaded by Prometheus on startup or configuration reload
Example alert rule:
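groups:
  - name: host
    rules:
      - alert: HighCpuLoad
        # Percentage of non-idle CPU time, averaged per instance
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        # Fire only after the condition has held for 5 minutes
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: High CPU load (instance {{ $labels.instance }})
          # Double quotes so that \n is read as a line break
          description: "CPU load is > 80%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"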
- Add a new section to prometheus/prometheus.yml.template:
      - job_name: 'new-service'
        static_configs:
          - targets: ['new-service:9090']
- Run ./setup-config.sh to regenerate configurations
- Restart Prometheus:
    docker-compose restart prometheus
- Add the exporter to docker-compose.yml:
      custom-exporter:
        image: custom-exporter:latest
        ports:
          - '9999:9999'
        networks:
          - monitoring
- Add a scrape configuration to prometheus/prometheus.yml.template
- Run ./setup-config.sh
- Restart the stack:
    docker-compose up -d
- Grafana: Protected by username/password (configured in .env)
- Basic auth can be enabled for other components by editing their respective config templates
- Default configuration exposes ports to host
- For production, consider:
  - Using a reverse proxy with TLS
  - Implementing network isolation
  - Setting up firewall rules
- Change all default passwords
- Enable TLS for all connections
- Use Docker secrets or Kubernetes secrets for sensitive values
- Implement proper backup for data volumes
- Loki fails to start: Ensure schema and index type configuration match (see loki-config.yml)
- Prometheus can't scrape targets: Check network connectivity and firewall rules
- Grafana doesn't show data: Verify data source configuration and test connection
- Alerts not sending: Check SMTP or webhook configuration
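Before digging into individual services, it can help to see which ones respond at all. The sketch below probes the health endpoints that Prometheus, Alertmanager, Grafana, and Loki expose upstream, on the default ports listed in this README. It assumes curl is installed, that the ports were not changed in .env, and that no reverse proxy or extra authentication sits in front of the services; check_endpoint and check_stack are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: probe each service's HTTP health endpoint on the default ports.
# Adjust the URLs if you changed the ports in .env.

# check_endpoint NAME URL: print "ok" or "FAIL" for one endpoint.
check_endpoint() {
  name=$1
  url=$2
  # -f: treat HTTP status >= 400 as failure; -sS: quiet, but still print errors.
  if curl -fsS --max-time 5 -o /dev/null "$url"; then
    printf '%-14s ok    %s\n' "$name" "$url"
  else
    printf '%-14s FAIL  %s\n' "$name" "$url"
    return 1
  fi
}

# check_stack: probe all services; succeeds only if every probe succeeds.
check_stack() {
  failures=0
  check_endpoint prometheus   http://localhost:9090/-/healthy || failures=$((failures + 1))
  check_endpoint alertmanager http://localhost:9093/-/healthy || failures=$((failures + 1))
  check_endpoint grafana      http://localhost:3000/api/health || failures=$((failures + 1))
  check_endpoint loki         http://localhost:3100/ready      || failures=$((failures + 1))
  [ "$failures" -eq 0 ]
}
```

Source the file and run `check_stack`; any line marked FAIL points at the service whose logs are worth reading first.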
View logs for any service:
docker-compose logs -f [service_name]
Example:
docker-compose logs -f prometheus
docker-compose logs -f loki
To update the stack to the latest images:
docker-compose pull
docker-compose up -d
Back up configuration and data:
# Configuration
tar -czvf config-backup.tar.gz */*.yml */*.conf
# Data volumes
docker run --rm -v prometheus_data:/data -v "$(pwd)":/backup alpine tar -czvf /backup/prometheus-data.tar.gz /data
docker run --rm -v grafana_data:/data -v "$(pwd)":/backup alpine tar -czvf /backup/grafana-data.tar.gz /data
docker run --rm -v loki_data:/data -v "$(pwd)":/backup alpine tar -czvf /backup/loki-data.tar.gz /data
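Restoring is the reverse operation. The sketch below unpacks one archive back into its volume and assumes the archive was created by the backup commands above; restore_volume is a hypothetical helper, not a script shipped with this repository. Docker Compose commonly prefixes volume names with the project name, so confirm the real names with `docker volume ls` before backing up or restoring.

```shell
#!/bin/sh
# Sketch: restore one named volume from an archive made by the backup commands.

# restore_volume VOLUME ARCHIVE_NAME [BACKUP_DIR]
# BACKUP_DIR must be an absolute path; it defaults to the current directory.
restore_volume() {
  volume=$1
  archive=$2
  backup_dir=${3:-$(pwd)}
  if [ ! -f "$backup_dir/$archive" ]; then
    echo "archive not found: $backup_dir/$archive" >&2
    return 1
  fi
  # The backup stored paths as data/..., so extracting at / repopulates /data.
  docker run --rm \
    -v "$volume":/data \
    -v "$backup_dir":/backup:ro \
    alpine tar -xzf "/backup/$archive" -C /
}
```

Stop the affected service first (for example `docker-compose stop prometheus`), run `restore_volume prometheus_data prometheus-data.tar.gz`, then start the service again with `docker-compose start prometheus`.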
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Run the tests (if any)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.