[dont merge]: make dashboard more robust #56
wz1qqx
commented
Jan 25, 2026
Pull request overview
This pull request enhances the metrics and monitoring infrastructure by adding comprehensive Grafana dashboards and expanding Prometheus scrape configurations.
Changes:
- Added vLLM and Router-Int scrape targets to Prometheus configuration with 8 vLLM instances and 1 Router-Int instance
- Created comprehensive vLLM monitoring dashboard with 36 panels covering token throughput, request latency, cache metrics, and performance indicators
- Created Router-Int monitoring dashboard with 31 panels tracking HTTP, router, worker, circuit breaker, and retry metrics
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| examples/metric-prometheus/prometheus.yml | Expanded scrape configuration to include vLLM (8 instances on ports 9000-9007) and Router-Int (port 22000) services |
| examples/metric-prometheus/grafana/dashboards/vllm.json | New comprehensive dashboard monitoring vLLM inference server with metrics for tokens, latency, cache, requests, and performance |
| examples/metric-prometheus/grafana/dashboards/router-int.json | New dashboard for Router-Int service covering HTTP layer, router layer, worker layer, circuit breaker, and retry metrics |
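
For orientation, the scrape configuration described in the table might look roughly like the sketch below. It is pieced together from the fragments visible in this review and is not the actual file: the job names, the `scrape_interval` value, and the Router-Int host are assumptions, and comments are written in English as the review suggests.

```yaml
# Sketch only. Job names, scrape_interval, and the Router-Int host are
# assumptions for illustration; they do not appear in the reviewed diff.
global:
  scrape_interval: 5s        # assumed value
  evaluation_interval: 5s    # taken from the diff

scrape_configs:
  # Pegaflow service
  - job_name: 'pegaflow'     # hypothetical job name
    static_configs:
      - targets: ['host.docker.internal:9091']

  # vLLM service (8 instances, ports 9000-9007)
  - job_name: 'vllm'         # hypothetical job name
    static_configs:
      - targets:
          - 'host.docker.internal:9000'
          - 'host.docker.internal:9001'
          - 'host.docker.internal:9002'
          - 'host.docker.internal:9003'
          - 'host.docker.internal:9004'
          - 'host.docker.internal:9005'
          - 'host.docker.internal:9006'
          - 'host.docker.internal:9007'

  # Router-int service
  - job_name: 'router-int'   # hypothetical job name
    static_configs:
      - targets: ['host.docker.internal:22000']  # host assumed; only the port is stated
```

Listing all eight vLLM targets under one job keeps them distinguishable through the `instance` label that Prometheus attaches to each target, which a per-instance dashboard panel could then filter on.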
| evaluation_interval: 5s | ||
| scrape_configs: | ||
| # Pegaflow 服务 |
The comment contains Chinese text "Pegaflow 服务" (Pegaflow service). For consistency and accessibility, configuration comments should be in English. Consider changing this to "# Pegaflow service" to match the pattern used in other parts of the codebase.
| # Pegaflow 服务 | |
| # Pegaflow service |
| static_configs: | ||
| - targets: ['host.docker.internal:9091'] | ||
| # vLLM 服务 (8个实例,端口 9000-9007) |
The comment contains Chinese text "vLLM 服务 (8个实例,端口 9000-9007)" (vLLM service, 8 instances, ports 9000-9007). For consistency and accessibility, configuration comments should be in English. Consider changing this to "# vLLM service (8 instances, ports 9000-9007)".
| - 'host.docker.internal:9006' | ||
| - 'host.docker.internal:9007' | ||
| # Router-int 服务 |
The comment contains Chinese text "Router-int 服务" (Router-int service). For consistency and accessibility, configuration comments should be in English. Consider changing this to "# Router-int service".
| # Router-int 服务 | |
| # Router-int service |
| } | ||
| ], | ||
| "preload": false, | ||
| "refresh": "15m", |
The refresh interval is set to "15m", which is unusually long for a monitoring dashboard: panels will update only once every 15 minutes, too slowly to give timely insight into performance. Consider a shorter interval such as "5s", "10s", or "30s"; "5s" would also be consistent with the vLLM dashboard.
| "refresh": "15m", | |
| "refresh": "5s", |
| "type": "prometheus", | ||
| "uid": "Prometheus" | ||
| }, | ||
| "description": "분당 성공 요청 수", |
The description contains Korean text "분당 성공 요청 수" (Success requests per minute). For consistency and accessibility, dashboard descriptions should be in English. Consider changing this to "Success requests per minute" to match the panel title.
| "description": "분당 성공 요청 수", | |
| "description": "Success requests per minute", |
| "type": "prometheus", | ||
| "uid": "Prometheus" | ||
| }, | ||
| "description": "분당 전체 요청 수 (성공+실패 포함)", |
The description contains Korean text "분당 전체 요청 수 (성공+실패 포함)" (Total requests per minute including success and failures). For consistency and accessibility, dashboard descriptions should be in English. Consider changing this to "Total requests per minute (including success and failures)" to match the panel title.
| "description": "분당 전체 요청 수 (성공+실패 포함)", | |
| "description": "Total requests per minute (including success and failures)", |
| "type": "prometheus", | ||
| "uid": "Prometheus" | ||
| }, | ||
| "description": "누적 성공 요청 수", |
The description contains Korean text "누적 성공 요청 수" (Cumulative success requests). For consistency and accessibility, dashboard descriptions should be in English. Consider changing this to "Cumulative Success Requests" to match the panel title.
| "description": "누적 성공 요청 수", | |
| "description": "Cumulative Success Requests", |