[dont merge]: make dashboard more robust #56

Open
wz1qqx wants to merge 1 commit into novitalabs:master from wz1qqx:make-more-dashboard
wz1qqx (Contributor) commented Jan 25, 2026

image

Copilot AI left a comment

Pull request overview

This pull request enhances the metrics and monitoring infrastructure by adding comprehensive Grafana dashboards and expanding Prometheus scrape configurations.

Changes:

  • Added vLLM and Router-Int scrape targets to Prometheus configuration with 8 vLLM instances and 1 Router-Int instance
  • Created comprehensive vLLM monitoring dashboard with 36 panels covering token throughput, request latency, cache metrics, and performance indicators
  • Created Router-Int monitoring dashboard with 31 panels tracking HTTP, router, worker, circuit breaker, and retry metrics

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| examples/metric-prometheus/prometheus.yml | Expanded scrape configuration to include vLLM (8 instances on ports 9000-9007) and Router-Int (port 22000) services |
| examples/metric-prometheus/grafana/dashboards/vllm.json | New comprehensive dashboard monitoring vLLM inference server with metrics for tokens, latency, cache, requests, and performance |
| examples/metric-prometheus/grafana/dashboards/router-int.json | New dashboard for Router-Int service covering HTTP layer, router layer, worker layer, circuit breaker, and retry metrics |
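
As an illustration of the scrape targets summarized above, the following is a minimal Python sketch that builds the target lists for the 8 vLLM instances (ports 9000-9007) and the single Router-Int instance (port 22000). The helper `build_targets`, the job names `vllm` and `router-int`, and the use of `host.docker.internal` for Router-Int are assumptions for illustration; the PR itself may lay out the configuration differently.

```python
# Hypothetical helper: not part of the PR, only mirrors the summary above.
HOST = "host.docker.internal"  # host name that appears in the reviewed prometheus.yml


def build_targets(host: str, first_port: int, count: int) -> list[str]:
    """Return `count` consecutive "host:port" targets starting at first_port."""
    return [f"{host}:{port}" for port in range(first_port, first_port + count)]


vllm_targets = build_targets(HOST, 9000, 8)          # ports 9000-9007
router_int_targets = build_targets(HOST, 22000, 1)   # port 22000 (host is assumed)

# Shape follows Prometheus static_configs; the job names are assumptions.
scrape_configs = [
    {"job_name": "vllm", "static_configs": [{"targets": vllm_targets}]},
    {"job_name": "router-int", "static_configs": [{"targets": router_int_targets}]},
]
```

Generating the list this way avoids hand-typing eight nearly identical entries, at the cost of an extra build step before the YAML is written.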

evaluation_interval: 5s

scrape_configs:
# Pegaflow 服务

Copilot AI Jan 25, 2026

The comment contains Chinese text "Pegaflow 服务" (Pegaflow service). For consistency and accessibility, configuration comments should be in English. Consider changing this to "# Pegaflow service" to match the pattern used in other parts of the codebase.

Suggested change
- # Pegaflow 服务
+ # Pegaflow service

Copilot uses AI. Check for mistakes.
static_configs:
- targets: ['host.docker.internal:9091']

# vLLM 服务 (8个实例,端口 9000-9007)

Copilot AI Jan 25, 2026

The comment contains Chinese text "vLLM 服务 (8个实例,端口 9000-9007)" (vLLM service, 8 instances, ports 9000-9007). For consistency and accessibility, configuration comments should be in English. Consider changing this to "# vLLM service (8 instances, ports 9000-9007)".
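
Comments like the ones flagged here could be caught automatically before review. The following is a minimal Python sketch, assuming a hypothetical helper `find_non_ascii_comments` that scans YAML text for full-line comments containing non-ASCII characters. The sample text and the `pegaflow` job name are illustrative, and trailing comments after a value are deliberately not handled.

```python
def find_non_ascii_comments(text: str) -> list[tuple[int, str]]:
    """Return (line_number, comment) for full-line comments with non-ASCII characters."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Only whole-line comments are checked; inline trailing comments are skipped.
        if stripped.startswith("#") and not stripped.isascii():
            hits.append((number, stripped))
    return hits


# Illustrative input modeled on the reviewed file, not a copy of it.
sample = """scrape_configs:
  # Pegaflow 服务
  - job_name: pegaflow
    # vLLM service (8 instances, ports 9000-9007)
"""
hits = find_non_ascii_comments(sample)
```

Non-ASCII is only a proxy for "not English", so a check like this would also flag legitimate symbols or names and is best treated as a hint for a human reviewer.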

- 'host.docker.internal:9006'
- 'host.docker.internal:9007'

# Router-int 服务

Copilot AI Jan 25, 2026

The comment contains Chinese text "Router-int 服务" (Router-int service). For consistency and accessibility, configuration comments should be in English. Consider changing this to "# Router-int service".

Suggested change
- # Router-int 服务
+ # Router-int service

}
],
"preload": false,
"refresh": "15m",

Copilot AI Jan 25, 2026

The refresh interval is set to "15m", which is unusually long for a monitoring dashboard: the dashboard reloads only every 15 minutes, which may be too slow for timely performance monitoring. Consider a shorter interval such as "5s", "10s", or "30s", consistent with the vLLM dashboard, which uses "5s".

Suggested change
- "refresh": "15m",
+ "refresh": "5s",
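
To make the refresh-interval concern checkable, here is a minimal Python sketch that converts a simple interval string into seconds and flags dashboards that refresh too slowly. The helpers `refresh_seconds` and `is_refresh_too_slow` and the 30-second threshold are assumptions, with the threshold taken from the upper end of the reviewer's suggested range. Only plain `<number><unit>` values are handled; other values a dashboard might carry, such as an empty string, raise an error.

```python
import re

# Seconds per unit for the simple "<number><unit>" interval form.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def refresh_seconds(value: str) -> int:
    """Convert an interval such as "5s" or "15m" to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if match is None:
        raise ValueError(f"unsupported interval: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def is_refresh_too_slow(dashboard: dict, max_seconds: int = 30) -> bool:
    """Return True when the dashboard's "refresh" exceeds max_seconds (assumed limit)."""
    return refresh_seconds(dashboard["refresh"]) > max_seconds


router_int_dashboard = {"refresh": "15m"}  # value flagged in this review
vllm_dashboard = {"refresh": "5s"}         # value the reviewer cites for vLLM
```

With these inputs, `is_refresh_too_slow(router_int_dashboard)` is true and `is_refresh_too_slow(vllm_dashboard)` is false, matching the inconsistency the comment points out.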

"type": "prometheus",
"uid": "Prometheus"
},
"description": "분당 성공 요청 수",

Copilot AI Jan 25, 2026

The description contains Korean text "분당 성공 요청 수" (Success requests per minute). For consistency and accessibility, dashboard descriptions should be in English. Consider changing this to "Success requests per minute" to match the panel title.

Suggested change
- "description": "분당 성공 요청 수",
+ "description": "Success requests per minute",

"type": "prometheus",
"uid": "Prometheus"
},
"description": "분당 전체 요청 수 (성공+실패 포함)",

Copilot AI Jan 25, 2026

The description contains Korean text "분당 전체 요청 수 (성공+실패 포함)" (Total requests per minute including success and failures). For consistency and accessibility, dashboard descriptions should be in English. Consider changing this to "Total requests per minute (including success and failures)" to match the panel title.

Suggested change
- "description": "분당 전체 요청 수 (성공+실패 포함)",
+ "description": "Total requests per minute (including success and failures)",

"type": "prometheus",
"uid": "Prometheus"
},
"description": "누적 성공 요청 수",

Copilot AI Jan 25, 2026

The description contains Korean text "누적 성공 요청 수" (Cumulative success requests). For consistency and accessibility, dashboard descriptions should be in English. Consider changing this to "Cumulative success requests" to match the panel title.

Suggested change
- "description": "누적 성공 요청 수",
+ "description": "Cumulative Success Requests",


jimmy-evo changed the title from "metrics:make dashboard more robust" to "feat(metrics): make dashboard more robust" on Jan 26, 2026
wz1qqx changed the title from "feat(metrics): make dashboard more robust" to "[dont merge]: make dashboard more robust" on Feb 2, 2026