This guide explains how to set up monitoring for masque-vpn using Prometheus and Grafana.
Prerequisites:
- Prometheus server
- Grafana server
- masque-vpn server with metrics enabled
The monitoring setup consists of:
- masque-vpn server - Exposes metrics on the /metrics endpoint
- Prometheus - Scrapes and stores metrics
- Grafana - Visualizes metrics with dashboards
Metrics are enabled by default in the server configuration:
[metrics]
enabled = true
listen_addr = "127.0.0.1:9090"

Note: In the current implementation, metrics are served on the same port as the API server (8080) at the /metrics endpoint.
Add the following to your prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'masque-vpn'
    static_configs:
      - targets: ['localhost:8080'] # API server port
    metrics_path: '/metrics'
    scrape_interval: 10s
    scrape_timeout: 5s

  # If running multiple servers
  - job_name: 'masque-vpn-cluster'
    static_configs:
      - targets:
          - 'server1:8080'
          - 'server2:8080'
          - 'server3:8080'
    metrics_path: '/metrics'

For easy deployment, use the provided docker-compose.yml:
version: '3.8'
services:
  masque-vpn-server:
    build: ./vpn_server
    ports:
      - "4433:4433" # MASQUE VPN
      - "8080:8080" # API + Metrics
    volumes:
      - ./cert:/app/cert:ro

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - ./monitoring/grafana/dashboards:/var/lib/grafana/dashboards:ro

- Open Grafana at http://localhost:3000
- Log in with admin/admin
- Go to "+" → Import
- Upload grafana/masque-vpn-dashboard.json
The dashboard includes:
- Connection Overview: Active connections, total connections
- Traffic Analysis: Bytes/packets sent and received
- Performance Metrics: Packet processing latency, MASQUE request duration
- Resource Utilization: IP pool usage, TUN interface status
- Error Monitoring: Error rates by type, packet drops
- MASQUE Protocol: Request success rates, QUIC stream statistics
Example queries for custom panels:
Active Connections:
masque_vpn_active_connections
Packet Processing Latency (95th percentile):
histogram_quantile(0.95, rate(masque_vpn_packet_processing_duration_seconds_bucket[5m]))
IP Pool Utilization:
(masque_vpn_ip_pool_used / masque_vpn_ip_pool_total) * 100
Error Rate:
rate(masque_vpn_errors_total[5m])
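The queries above can also be run outside Grafana through the Prometheus HTTP API. The following is a minimal sketch, assuming Prometheus is reachable at http://localhost:9090 (the port published in the docker-compose setup); the helper names build_query_url and instant_query are illustrative and not part of masque-vpn.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus listens on localhost:9090, as published in docker-compose.yml.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url, expr):
    """Build the URL for an instant query against /api/v1/query."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def instant_query(base_url, expr, timeout=5.0):
    """Run a PromQL instant query and return the list of result series."""
    with urllib.request.urlopen(build_query_url(base_url, expr), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


# Building the URL needs no running server, so it is safe to print here.
print(build_query_url(PROMETHEUS_URL, "masque_vpn_active_connections"))
```

With the stack running, instant_query(PROMETHEUS_URL, "masque_vpn_active_connections") should return one entry per series, each carrying its label set and a timestamp/value pair.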
The metrics endpoint is available at:
http://localhost:8080/metrics
Verify metrics are working:
# Check if metrics endpoint responds
curl http://localhost:8080/metrics
# Check specific metric
curl http://localhost:8080/metrics | grep masque_vpn_active_connections

Example output:
# HELP masque_vpn_active_connections Current number of active connections
# TYPE masque_vpn_active_connections gauge
masque_vpn_active_connections 3
# HELP masque_vpn_total_connections Total connections since startup
# TYPE masque_vpn_total_connections counter
masque_vpn_total_connections 15
# HELP masque_vpn_ip_pool_total Total IP addresses in pool
# TYPE masque_vpn_ip_pool_total gauge
masque_vpn_ip_pool_total 254
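The text format shown above is line-oriented, so a short script can pull values out of it without a client library. This is a minimal sketch, assuming unlabelled samples without trailing timestamps, like the ones in the example output; parse_metrics is a hypothetical helper, and in normal operation Prometheus does this parsing itself.

```python
SAMPLE = """\
# HELP masque_vpn_active_connections Current number of active connections
# TYPE masque_vpn_active_connections gauge
masque_vpn_active_connections 3
# HELP masque_vpn_total_connections Total connections since startup
# TYPE masque_vpn_total_connections counter
masque_vpn_total_connections 15
# HELP masque_vpn_ip_pool_total Total IP addresses in pool
# TYPE masque_vpn_ip_pool_total gauge
masque_vpn_ip_pool_total 254
"""


def parse_metrics(text):
    """Map each sample name (including any label block) to its float value."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated field on the line.
        name, _, raw = line.rpartition(" ")
        values[name] = float(raw)
    return values


metrics = parse_metrics(SAMPLE)
print(metrics["masque_vpn_active_connections"])  # 3.0
```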
Create alerts.yml and reference it under rule_files in prometheus.yml so that Prometheus loads the rules:
groups:
  - name: masque-vpn.rules
    rules:
      - alert: MASQUEVPNHighErrorRate
        expr: rate(masque_vpn_errors_total[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in MASQUE VPN server"
          description: "Error rate is {{ $value }} errors per second"

      - alert: MASQUEVPNIPPoolExhaustion
        expr: (masque_vpn_ip_pool_available / masque_vpn_ip_pool_total) < 0.1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MASQUE VPN IP pool nearly exhausted"
          # $value is a ratio (0 to 1), so format it as a percentage
          description: "Only {{ $value | humanizePercentage }} of IP addresses available"

      - alert: MASQUEVPNHighLatency
        expr: histogram_quantile(0.95, rate(masque_vpn_packet_processing_duration_seconds_bucket[5m])) > 0.1
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "High packet processing latency"
          description: "95th percentile latency is {{ $value }}s"

Configure alerts in Grafana:
- Go to Alerting → Alert Rules
- Create new rule
- Set query and conditions
- Configure notification channels (Slack, email, etc.)
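The for field in the Prometheus rules above delays firing until the expression has held continuously for that duration; until then the alert is only pending. The sketch below is a simplified model of that behaviour, not Prometheus's actual implementation; alert_state and the sample values are made up for illustration.

```python
def alert_state(samples, threshold, for_seconds):
    """Return 'inactive', 'pending', or 'firing' as of the last sample.

    samples is a list of (timestamp_seconds, value) pairs in time order,
    one per rule evaluation. The condition modelled is value > threshold.
    """
    active_since = None
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts  # condition just became true
        else:
            active_since = None  # any false evaluation resets the timer
    if active_since is None:
        return "inactive"
    last_ts = samples[-1][0]
    return "firing" if last_ts - active_since >= for_seconds else "pending"


# Hypothetical error rates evaluated every 15 seconds, checked against
# the MASQUEVPNHighErrorRate rule (> 0.1 for 2m).
samples = [(0, 0.05), (15, 0.20), (30, 0.30), (45, 0.25)]
print(alert_state(samples, threshold=0.1, for_seconds=120))  # pending
```

The rate has been above 0.1 for only 30 seconds here, so the alert stays pending; a short spike that drops back below the threshold never fires.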
If the metrics endpoint does not respond:
- Check if metrics are enabled in server config
- Verify server is running and accessible
- Check firewall rules for port 8080
- Review server logs for errors
If Prometheus is not scraping:
- Verify target configuration in prometheus.yml
- Check Prometheus logs: docker logs prometheus
- Verify network connectivity: curl http://server:8080/metrics
If Grafana shows no data:
- Check data source configuration
- Verify Prometheus is collecting metrics
- Check query syntax in panels
- Review Grafana logs for errors
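The first troubleshooting step, confirming that the endpoint answers and actually exposes masque-vpn metrics, can be scripted. This is a minimal sketch using only the Python standard library; check_metrics_endpoint is a hypothetical helper, and the masque_vpn_ prefix check is based on the metric names shown in this guide.

```python
import urllib.error
import urllib.request


def check_metrics_endpoint(url, timeout=5.0):
    """Return (ok, detail) describing whether the endpoint serves masque-vpn metrics."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError) as exc:
        # Covers refused connections, timeouts, DNS failures and HTTP error codes.
        return False, f"request failed: {exc}"
    if "masque_vpn_" not in body:
        return False, "endpoint responded but exposed no masque_vpn_ metrics"
    return True, "ok"
```

For example, check_metrics_endpoint("http://localhost:8080/metrics") distinguishes an unreachable server from one that responds without the expected metrics, which narrows down whether the problem is networking or server configuration.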
- Metrics collection has minimal performance impact
- Scrape interval of 10-15 seconds is recommended
- Avoid very frequent scraping (< 5 seconds)
- Prometheus storage grows with number of metrics and retention period
- Default Prometheus retention is 15 days (configurable with --storage.tsdb.retention.time)
- Consider using remote storage for long-term retention
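A rough disk estimate helps when choosing a retention period. The sketch below applies the sizing formula from the Prometheus storage documentation (retention seconds × ingested samples per second × bytes per sample). The figure of 500 active series is an assumption for illustration, not a measured masque-vpn value, and 2 bytes per sample is the upper end of the 1-2 bytes that documentation cites as typical.

```python
def estimate_disk_bytes(active_series, scrape_interval_s, retention_days,
                        bytes_per_sample=2.0):
    """Estimate Prometheus TSDB disk usage for a given load and retention."""
    # Each active series contributes one sample per scrape.
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


# Assumed load: 500 series scraped every 10 s, kept for the default 15 days.
estimate = estimate_disk_bytes(active_series=500, scrape_interval_s=10, retention_days=15)
print(f"{estimate / 1e6:.1f} MB")  # 129.6 MB
```

Halving the scrape interval doubles the estimate, which is one reason to avoid very frequent scraping.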
For production deployments:
- Run multiple Prometheus instances
- Use Prometheus federation
- Configure Grafana with multiple data sources
- Set up Alertmanager clustering
For research and educational purposes:
- Monitor protocol behavior under different conditions
- Analyze performance characteristics
- Study the impact of network conditions on VPN performance
- Compare metrics before and after configuration changes
See Metrics Reference for detailed metric descriptions.