-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature/add-flux-dashboards #619
Conversation
WalkthroughThis pull request adds two new Grafana dashboard configurations for monitoring Flux, specifically "Flux Control Plane" and "Flux Cluster Stats". In addition, the download script has been updated to include these new JSON files, and the dashboards list now includes entries for them. The changes cover annotations, panels for various metrics, and templating options for dynamic queries. The overall modifications integrate the new dashboards into the existing monitoring system without altering other workflows. Changes
Sequence Diagram(s)sequenceDiagram
participant S as Download Script
participant U1 as Flux-Control-Plane URL
participant U2 as Flux-Stats URL
participant FS as Filesystem
S->>U1: Request flux-control-plane.json
U1-->>S: Return dashboard JSON
S->>FS: Save flux-control-plane.json
S->>U2: Request flux-stats.json
U2-->>S: Return dashboard JSON
S->>FS: Save flux-stats.json
Suggested labels
Suggested reviewers
Poem
Tip 🌐 Web search-backed reviews and chat
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/system/monitoring-agents/templates/etcd-proxy-scrape.yaml (1)
119-121
: Typographical Error in Resource NameThe VMPodScrape resource is named
"etcd-managment-scrape"
. Consider correcting it to"etcd-management-scrape"
to eliminate the typographical error and maintain naming consistency.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
dashboards/flux/flux-control-plane.json
(1 hunks)dashboards/flux/flux-stats.json
(1 hunks)hack/download-dashboards.sh
(1 hunks)packages/core/installer/values.yaml
(1 hunks)packages/core/platform/bundles/distro-full.yaml
(1 hunks)packages/core/platform/bundles/distro-hosted.yaml
(1 hunks)packages/core/platform/bundles/paas-full.yaml
(1 hunks)packages/core/platform/bundles/paas-hosted.yaml
(1 hunks)packages/extra/monitoring/dashboards.list
(1 hunks)packages/system/monitoring-agents/templates/etcd-proxy-scrape.yaml
(1 hunks)packages/system/monitoring-agents/templates/etcd-scrape.yaml
(1 hunks)packages/system/monitoring-agents/values.yaml
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- packages/core/installer/values.yaml
🧰 Additional context used
🪛 YAMLlint (1.35.1)
packages/system/monitoring-agents/templates/etcd-scrape.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
packages/system/monitoring-agents/templates/etcd-proxy-scrape.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
[warning] 68-68: wrong indentation: expected 0 but found 2
(indentation)
[warning] 123-123: wrong indentation: expected 2 but found 4
(indentation)
🔇 Additional comments (14)
packages/extra/monitoring/dashboards.list (1)
35-36
: LGTM! The new Flux dashboard entries are properly formatted.The new entries follow the existing pattern and are logically placed at the end of the list.
hack/download-dashboards.sh (1)
81-82
: LGTM! The new dashboard download entries are properly configured.The paths are correctly structured and placed in the appropriate section with other module dashboards.
dashboards/flux/flux-stats.json (1)
1-1391
: LGTM! The Flux Stats dashboard is well-structured and comprehensive.The dashboard configuration:
- Uses appropriate metrics for monitoring Flux cluster state
- Has well-organized panels with clear titles and descriptions
- Implements proper thresholds and visualizations
- Sets reasonable refresh intervals
dashboards/flux/flux-control-plane.json (1)
1-1725
: LGTM! The Flux Control Plane dashboard is well-designed and comprehensive.The dashboard configuration:
- Provides thorough monitoring of control plane components
- Uses appropriate metrics for resource usage and operations
- Has well-organized panels with clear titles and descriptions
- Sets appropriate refresh intervals for control plane monitoring
packages/system/monitoring-agents/templates/etcd-scrape.yaml (2)
1-1
: Conditional Template Inclusion and YAML Linting NoteThe configuration block is now wrapped in an
{{- if .Values.scrapeRules.etcd.enabled }}
clause to render the resource only when etcd scraping is enabled. This is appropriate for making the behavior configurable.
Note: Helm templating syntax can sometimes trigger YAML lint errors (e.g. “expected the node content, but found '-'”) even though it is valid in a Helm context. Verify that your linter is configured to handle Helm templates or that you suppress such false positives.🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
30-30
: Protocol Update – Verify HTTP UsageThe
scheme
field is changed fromhttps
tohttp
. Please double-check that this protocol change aligns with your backend service requirements and that no encryption is needed for these internal metrics endpoints.packages/system/monitoring-agents/templates/etcd-proxy-scrape.yaml (3)
1-1
: Conditional Template for Disabled etcd ScrapeThe resource definitions in this file are now conditionally rendered when
.Values.scrapeRules.etcd.enabled
is false. This ensures that the kube‑rbac‑proxy deployment and associated RBAC resources are only created when etcd scraping is disabled, which helps avoid resource conflicts.🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
67-70
: YAML Indentation Review in ClusterRoleBinding SubjectsStatic analysis flagged a potential indentation issue for the list element under
subjects
(lines 68–70). Please verify that the indentation conforms to Kubernetes YAML standards. If the current structure is correct in your templating context (or if the linter is misinterpreting the Helm template), you may need to adjust the linter settings or document this as a known false positive.🧰 Tools
🪛 YAMLlint (1.35.1)
[warning] 68-68: wrong indentation: expected 0 but found 2
(indentation)
122-125
: Verify Indentation for podMetricsEndpointsA static analysis warning indicates a potential indentation mismatch for the list under
podMetricsEndpoints
(specifically at line 123). Please double-check that this list is correctly nested within the resource definition as per the expected Kubernetes schema.🧰 Tools
🪛 YAMLlint (1.35.1)
[warning] 123-123: wrong indentation: expected 2 but found 4
(indentation)
packages/core/platform/bundles/distro-hosted.yaml (1)
61-65
: Enable etcd ScrapeRules for Monitoring AgentsA new
values
block has been added under themonitoring-agents
release, which enables etcd scraping viascrapeRules.etcd.enabled: true
. This change should integrate well with the broader monitoring configuration. Verify that this override is consistent with the default inpackages/system/monitoring-agents/values.yaml
and with similar settings in other bundled configurations.packages/core/platform/bundles/distro-full.yaml (1)
73-77
: Enable etcd ScrapeRules in Distro-Full ConfigurationThis diff introduces a new configuration block for the
monitoring-agents
release that setsscrapeRules.etcd.enabled: true
. The change is consistent with similar updates in other bundle files and enhances the monitoring setup by ensuring etcd metrics are scraped as intended.packages/core/platform/bundles/paas-hosted.yaml (1)
73-77
: Enable etcd ScrapeRules in Paas-Hosted ConfigurationThe new
values
block for themonitoring-agents
release now enables etcd scraping by settingscrapeRules.etcd.enabled: true
. This aligns with the changes in the distro-hosted and distro-full bundles. Please ensure that this configuration integrates seamlessly with your overall monitoring strategy and that defaults in related value files are set appropriately.packages/core/platform/bundles/paas-full.yaml (1)
100-104
: Etcd Scrape Rules Enabled by Default
The new values block under themonitoring-agents
release now enables etcd scrape rules (enabled: true
). This aligns with similar changes in other bundles (e.g., distro-full.yaml, paas-hosted.yaml). Please verify that enabling etcd scraping by default is the intended behavior for all deployment scenarios, and consider adding an in-line comment for future maintainers describing why this default was chosen.packages/system/monitoring-agents/values.yaml (1)
363-366
: Etcd Scrape Rules Default Set to False
A newscrapeRules
section has been added withetcd.enabled
set tofalse
. This default contrasts with the configuration in the paas-full bundle where etcd scraping is enabled. Please double-check that this discrepancy is intentional—for example, if this values file is meant for environments where etcd scraping should be disabled by default. Adding a clarifying comment here would help avoid confusion.
3838f2c
to
8e78010
Compare
8e78010
to
4d76763
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
dashboards/flux/flux-stats.json (1)
1371-1392
: Overall JSON Schema and IntegrationThe JSON adheres to schemaVersion 40 and includes all necessary sections for a Grafana dashboard. Before deployment, it would be worthwhile to validate this file with Grafana’s JSON model viewer or a linting tool to catch any subtle schema mismatches that might arise.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
dashboards/flux/flux-control-plane.json
(1 hunks)dashboards/flux/flux-stats.json
(1 hunks)hack/download-dashboards.sh
(1 hunks)packages/extra/monitoring/dashboards.list
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/extra/monitoring/dashboards.list
- hack/download-dashboards.sh
🔇 Additional comments (8)
dashboards/flux/flux-stats.json (4)
1-32
: Annotation Consistency CheckThe annotations block defines two annotation objects for alerts and Flux events. However, note that the first annotation uses a string value (
"-- Grafana --"
) for its datasource while the second uses an object with"type"
and"uid"
. For consistency (and easier maintenance), consider standardizing the datasource reference format across annotations.
33-37
: Dashboard Global SettingsThe basic dashboard settings (such as
"editable"
,"fiscalYearStartMonth"
,"graphTooltip"
,"id"
, and"links"
) are set as expected. No issues are detected here.
38-1299
: Panels Configuration OverviewThis dashboard includes a wide variety of panels (stat, bargauge, table, timeseries) that cover key Flux-related metrics such as:
- Reconciler Counts (e.g. Cluster Reconcilers, Failing Reconcilers),
- Resource Sources (e.g. Kubernetes Manifests Sources, Failing Sources),
- Operation Durations (e.g. Reconciler ops avg. duration, Source ops avg. duration), and
- Readiness Tables.
Please verify that each Prometheus query (which uses variables like
$namespace
) returns the intended data. Also, double-check that grid positions and panel dimensions are balanced for your typical Grafana display resolutions.
1301-1370
: Templating and Refresh ConfigurationThe global settings following the panels include the refresh rate (set to
"30s"
) and the templating list that defines dynamic variables (e.g."operator_namespace"
,"namespace"
, and"DS_PROMETHEUS"
). Confirm that the queries and any regular expressions in the templating (if used) yield the correct and expected values in your environment.dashboards/flux/flux-control-plane.json (4)
1-41
: Annotations Block ReviewThe annotations section defines two entries – one for "Annotations & Alerts" and one for "flux events" – both using a datasource object. This is consistent within this file. Please verify that the tag filtering (e.g. using
"tags": [ "flux" ]
) meets your operational needs.
42-47
: Dashboard Metadata ValidationThe initial metadata (including
"editable"
,"fiscalYearStartMonth"
,"graphTooltip"
,"id"
, and"links"
) is correctly provided. No concerns here.
48-1438
: Panels and Visualization SetupThis dashboard defines a rich set of panels for visualizing the Flux control plane:
- Controllers Panel with query
sum(go_info{pod=~".*-controller-.*"})
for tracking controller pods.- Max Work Queue, Memory, and API Requests Panels that display key operational metrics.
- Multiple timeseries and table panels capturing resource usage, CPU/Memory usage, reconciliation durations, and repository operations (for Git, OCI, Helm, Buckets, etc.).
Each panel’s Prometheus query and threshold configuration appear well thought out. Be sure to test that the variable substitutions (e.g.
${DS_PROMETHEUS}
and$namespace
) inject the correct values and that the queries return data as expected.
1439-1725
: Global Settings and TemplatingThe final section of the file includes the time settings, refresh intervals (set here to
"10s"
), and a templating list that defines variables such as"DS_PROMETHEUS"
and"namespace"
. The configuration seems complete and consistent with the overall monitoring system. Verify that the regular expression in the"namespace"
variable (if applicable) correctly captures the desired data from your metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Summary by CodeRabbit