feature/add-flux-dashboards #619

Merged: 10 commits, Feb 6, 2025

Conversation

Contributor

@klinch0 klinch0 commented Feb 6, 2025

Summary by CodeRabbit

  • New Features
    • Introduced a new dashboard for Flux Control Plane monitoring that visualizes key performance metrics like CPU, memory, API requests, and more.
    • Added a second dashboard for Flux Cluster Stats to display resource reconciliation, operation durations, and readiness indicators.
    • Integrated both dashboards into the monitoring workflow, with templated variables for dynamic queries and automatic periodic refresh.

@klinch0 klinch0 requested a review from kvaps as a code owner February 6, 2025 10:20
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Feb 6, 2025
Contributor

coderabbitai bot commented Feb 6, 2025

Walkthrough

This pull request adds two new Grafana dashboard configurations for monitoring Flux, specifically "Flux Control Plane" and "Flux Cluster Stats". In addition, the download script has been updated to include these new JSON files, and the dashboards list now includes entries for them. The changes cover annotations, panels for various metrics, and templating options for dynamic queries. The overall modifications integrate the new dashboards into the existing monitoring system without altering other workflows.

Changes

| File(s) | Change Summary |
|---|---|
| dashboards/flux/flux-control-plane.json, dashboards/flux/flux-stats.json | Added new dashboard configurations for Flux, including panels, annotations, and templating options for diverse metrics related to control plane and cluster stats. |
| hack/download-dashboards.sh | Updated to process the newly added dashboard JSON files for Flux Control Plane and Flux Cluster Stats within the existing download loop. |
| packages/extra/monitoring/dashboards.list | Added new entries for flux/flux-control-plane and flux/flux-stats to the list of monitored dashboards. |

Sequence Diagram(s)

sequenceDiagram
    participant S as Download Script
    participant U1 as Flux-Control-Plane URL
    participant U2 as Flux-Stats URL
    participant FS as Filesystem

    S->>U1: Request flux-control-plane.json
    U1-->>S: Return dashboard JSON
    S->>FS: Save flux-control-plane.json
    S->>U2: Request flux-stats.json
    U2-->>S: Return dashboard JSON
    S->>FS: Save flux-stats.json
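
The request/save flow in the diagram can be sketched in Python. This is only an illustration of the sequence, not the contents of hack/download-dashboards.sh (which is a shell script); the function names, the `fetch` parameter, and any URLs passed in are assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List
from urllib.request import urlopen


def fetch_url(url: str) -> bytes:
    """Fetch raw bytes over HTTP(S) using only the standard library."""
    with urlopen(url) as response:
        return response.read()


def download_dashboards(
    sources: Dict[str, str],
    dest_root: Path,
    fetch: Callable[[str], bytes] = fetch_url,
) -> List[Path]:
    """Request each dashboard, check that it parses as JSON, then save it.

    `sources` maps a relative target path (hypothetical layout, e.g.
    "flux/flux-stats.json") to the URL it should be downloaded from.
    """
    saved = []
    for rel_path, url in sources.items():
        payload = fetch(url)
        json.loads(payload)  # raises ValueError if the response is not valid JSON
        target = dest_root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)
        saved.append(target)
    return saved
```

Passing `fetch` as a parameter keeps the sketch testable without network access; a real run would rely on the default `fetch_url`.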

Suggested labels

size:M, lgtm

Suggested reviewers

  • kvaps

Poem

Oh, I’m a bunny full of cheer,
New dashboards hop in, bright and clear.
Metrics and panels, data so sweet,
Hopping through code with nimble feet.
Celebrating flux in every byte—hip, hip, hooray!
🐇💻✨

@dosubot dosubot bot added the enhancement New feature or request label Feb 6, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/system/monitoring-agents/templates/etcd-proxy-scrape.yaml (1)

119-121: Typographical Error in Resource Name

The VMPodScrape resource is named "etcd-managment-scrape". Consider correcting it to "etcd-management-scrape" to eliminate the typographical error and maintain naming consistency.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 861e6c4 and 3838f2c.

📒 Files selected for processing (12)
  • dashboards/flux/flux-control-plane.json (1 hunks)
  • dashboards/flux/flux-stats.json (1 hunks)
  • hack/download-dashboards.sh (1 hunks)
  • packages/core/installer/values.yaml (1 hunks)
  • packages/core/platform/bundles/distro-full.yaml (1 hunks)
  • packages/core/platform/bundles/distro-hosted.yaml (1 hunks)
  • packages/core/platform/bundles/paas-full.yaml (1 hunks)
  • packages/core/platform/bundles/paas-hosted.yaml (1 hunks)
  • packages/extra/monitoring/dashboards.list (1 hunks)
  • packages/system/monitoring-agents/templates/etcd-proxy-scrape.yaml (1 hunks)
  • packages/system/monitoring-agents/templates/etcd-scrape.yaml (1 hunks)
  • packages/system/monitoring-agents/values.yaml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • packages/core/installer/values.yaml
🧰 Additional context used
🪛 YAMLlint (1.35.1)
packages/system/monitoring-agents/templates/etcd-scrape.yaml

[error] 1-1: syntax error: expected the node content, but found '-'

(syntax)

packages/system/monitoring-agents/templates/etcd-proxy-scrape.yaml

[error] 1-1: syntax error: expected the node content, but found '-'

(syntax)


[warning] 68-68: wrong indentation: expected 0 but found 2

(indentation)


[warning] 123-123: wrong indentation: expected 2 but found 4

(indentation)

🔇 Additional comments (14)
packages/extra/monitoring/dashboards.list (1)

35-36: LGTM! The new Flux dashboard entries are properly formatted.

The new entries follow the existing pattern and are logically placed at the end of the list.

hack/download-dashboards.sh (1)

81-82: LGTM! The new dashboard download entries are properly configured.

The paths are correctly structured and placed in the appropriate section with other module dashboards.

dashboards/flux/flux-stats.json (1)

1-1391: LGTM! The Flux Stats dashboard is well-structured and comprehensive.

The dashboard configuration:

  • Uses appropriate metrics for monitoring Flux cluster state
  • Has well-organized panels with clear titles and descriptions
  • Implements proper thresholds and visualizations
  • Sets reasonable refresh intervals

dashboards/flux/flux-control-plane.json (1)

1-1725: LGTM! The Flux Control Plane dashboard is well-designed and comprehensive.

The dashboard configuration:

  • Provides thorough monitoring of control plane components
  • Uses appropriate metrics for resource usage and operations
  • Has well-organized panels with clear titles and descriptions
  • Sets appropriate refresh intervals for control plane monitoring

packages/system/monitoring-agents/templates/etcd-scrape.yaml (2)

1-1: Conditional Template Inclusion and YAML Linting Note

The configuration block is now wrapped in an {{- if .Values.scrapeRules.etcd.enabled }} clause to render the resource only when etcd scraping is enabled. This is appropriate for making the behavior configurable.
Note: Helm templating syntax can sometimes trigger YAML lint errors (e.g. “expected the node content, but found '-'”) even though it is valid in a Helm context. Verify that your linter is configured to handle Helm templates or that you suppress such false positives.
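
One way to avoid such false positives is to route Helm templates away from a plain YAML linter and check them after rendering (for example with `helm lint` or `helm template`). The following is a minimal sketch of the detection step only, assuming a template can be recognized by a leading `{{` action; the function name is hypothetical and the heuristic is deliberately simple.

```python
import re

# Matches a line whose first non-blank content is a Go/Helm template action.
TEMPLATE_ACTION = re.compile(r"^\s*\{\{")


def starts_with_template_action(text: str) -> bool:
    """Return True if the first non-blank line opens with a template action."""
    for line in text.splitlines():
        if line.strip():
            return bool(TEMPLATE_ACTION.match(line))
    return False


helm_source = "{{- if .Values.scrapeRules.etcd.enabled }}\napiVersion: v1\n"
plain_yaml = "apiVersion: v1\nkind: ConfigMap\n"

print(starts_with_template_action(helm_source))  # True
print(starts_with_template_action(plain_yaml))   # False
```

Files flagged this way could be skipped by the YAML linter and validated from rendered output instead.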


30-30: Protocol Update – Verify HTTP Usage

The scheme field is changed from https to http. Please double-check that this protocol change aligns with your backend service requirements and that no encryption is needed for these internal metrics endpoints.

packages/system/monitoring-agents/templates/etcd-proxy-scrape.yaml (3)

1-1: Conditional Template for Disabled etcd Scrape

The resource definitions in this file are now conditionally rendered when .Values.scrapeRules.etcd.enabled is false. This ensures that the kube-rbac-proxy deployment and associated RBAC resources are only created when etcd scraping is disabled, which helps avoid resource conflicts.


67-70: YAML Indentation Review in ClusterRoleBinding Subjects

Static analysis flagged a potential indentation issue for the list element under subjects (lines 68–70). Please verify that the indentation conforms to Kubernetes YAML standards. If the current structure is correct in your templating context (or if the linter is misinterpreting the Helm template), you may need to adjust the linter settings or document this as a known false positive.


122-125: Verify Indentation for podMetricsEndpoints

A static analysis warning indicates a potential indentation mismatch for the list under podMetricsEndpoints (specifically at line 123). Please double-check that this list is correctly nested within the resource definition as per the expected Kubernetes schema.

packages/core/platform/bundles/distro-hosted.yaml (1)

61-65: Enable etcd ScrapeRules for Monitoring Agents

A new values block has been added under the monitoring-agents release, which enables etcd scraping via scrapeRules.etcd.enabled: true. This change should integrate well with the broader monitoring configuration. Verify that this override is consistent with the default in packages/system/monitoring-agents/values.yaml and with similar settings in other bundled configurations.

packages/core/platform/bundles/distro-full.yaml (1)

73-77: Enable etcd ScrapeRules in Distro-Full Configuration

This diff introduces a new configuration block for the monitoring-agents release that sets scrapeRules.etcd.enabled: true. The change is consistent with similar updates in other bundle files and enhances the monitoring setup by ensuring etcd metrics are scraped as intended.

packages/core/platform/bundles/paas-hosted.yaml (1)

73-77: Enable etcd ScrapeRules in PaaS-Hosted Configuration

The new values block for the monitoring-agents release now enables etcd scraping by setting scrapeRules.etcd.enabled: true. This aligns with the changes in the distro-hosted and distro-full bundles. Please ensure that this configuration integrates seamlessly with your overall monitoring strategy and that defaults in related value files are set appropriately.

packages/core/platform/bundles/paas-full.yaml (1)

100-104: Etcd Scrape Rules Enabled by Default

The new values block under the monitoring-agents release now enables etcd scrape rules (enabled: true). This aligns with similar changes in other bundles (e.g., distro-full.yaml, paas-hosted.yaml). Please verify that enabling etcd scraping by default is the intended behavior for all deployment scenarios, and consider adding an in-line comment for future maintainers describing why this default was chosen.

packages/system/monitoring-agents/values.yaml (1)

363-366: Etcd Scrape Rules Default Set to False

A new scrapeRules section has been added with etcd.enabled set to false. This default contrasts with the configuration in the paas-full bundle, where etcd scraping is enabled. Please double-check that this discrepancy is intentional, for example if this values file is meant for environments where etcd scraping should be disabled by default. Adding a clarifying comment here would help avoid confusion.
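
To see why a chart default of false and a bundle value of true need not conflict, the layering can be approximated as a recursive dictionary merge in which the override wins. This is a simplified sketch of that idea, not Helm's actual implementation (null handling and list semantics are left out), and `merge_values` is a hypothetical helper.

```python
from copy import deepcopy


def merge_values(defaults: dict, overrides: dict) -> dict:
    """Recursively merge two value trees; keys in `overrides` take precedence."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Chart default, as described for packages/system/monitoring-agents/values.yaml.
chart_defaults = {"scrapeRules": {"etcd": {"enabled": False}}}
# Per-release override, as described for the bundle files.
bundle_override = {"scrapeRules": {"etcd": {"enabled": True}}}

effective = merge_values(chart_defaults, bundle_override)
print(effective["scrapeRules"]["etcd"]["enabled"])  # True
```

Under this model the chart stays off by default for standalone installs, while any bundle that sets the flag turns it on.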

@klinch0 klinch0 force-pushed the feature/add-flux-dashboards branch from 3838f2c to 8e78010 Compare February 6, 2025 12:43
@klinch0 klinch0 force-pushed the feature/add-flux-dashboards branch from 8e78010 to 4d76763 Compare February 6, 2025 12:44
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
dashboards/flux/flux-stats.json (1)

1371-1392: Overall JSON Schema and Integration

The JSON adheres to schemaVersion 40 and includes all necessary sections for a Grafana dashboard. Before deployment, it would be worthwhile to validate this file with Grafana’s JSON model viewer or a linting tool to catch any subtle schema mismatches that might arise.
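
A lightweight pre-deployment check along these lines could look like the sketch below. It only verifies that the file parses and that a few top-level keys are present; the list of required keys is an assumption chosen for illustration, not Grafana's official schema.

```python
import json

# Assumed minimal set of top-level keys; adjust to the dashboards in use.
REQUIRED_KEYS = ("panels", "schemaVersion", "templating", "title")


def check_dashboard(raw: str) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    try:
        dashboard = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(dashboard, dict):
        return ["top-level value is not an object"]
    problems = []
    for key in REQUIRED_KEYS:
        if key not in dashboard:
            problems.append(f"missing top-level key: {key}")
    if not isinstance(dashboard.get("panels", []), list):
        problems.append("panels is not a list")
    return problems
```

It could be run over each file under dashboards/ in CI, failing the job when any list of problems is non-empty.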

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3838f2c and 4d76763.

📒 Files selected for processing (4)
  • dashboards/flux/flux-control-plane.json (1 hunks)
  • dashboards/flux/flux-stats.json (1 hunks)
  • hack/download-dashboards.sh (1 hunks)
  • packages/extra/monitoring/dashboards.list (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/extra/monitoring/dashboards.list
  • hack/download-dashboards.sh
🔇 Additional comments (8)
dashboards/flux/flux-stats.json (4)

1-32: Annotation Consistency Check

The annotations block defines two annotation objects for alerts and Flux events. However, note that the first annotation uses a string value ("-- Grafana --") for its datasource while the second uses an object with "type" and "uid". For consistency (and easier maintenance), consider standardizing the datasource reference format across annotations.
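
The inconsistency described here can be detected mechanically. The sketch below assumes the usual dashboard layout in which annotations live under `annotations.list`; the sample data, including the annotation names and the `type`/`uid` values, is illustrative rather than copied from the file.

```python
def annotation_datasource_kinds(dashboard: dict) -> dict:
    """Map each annotation name to the Python type name of its datasource."""
    kinds = {}
    for annotation in dashboard.get("annotations", {}).get("list", []):
        name = annotation.get("name", "<unnamed>")
        kinds[name] = type(annotation.get("datasource")).__name__
    return kinds


def has_mixed_datasource_formats(dashboard: dict) -> bool:
    """True when annotations mix string and object datasource references."""
    return len(set(annotation_datasource_kinds(dashboard).values())) > 1


# Illustrative sample: one string reference, one object reference.
sample = {
    "annotations": {
        "list": [
            {"name": "Annotations & Alerts", "datasource": "-- Grafana --"},
            {"name": "flux events", "datasource": {"type": "prometheus", "uid": "example"}},
        ]
    }
}

print(annotation_datasource_kinds(sample))
print(has_mixed_datasource_formats(sample))  # True
```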


33-37: Dashboard Global Settings

The basic dashboard settings (such as "editable", "fiscalYearStartMonth", "graphTooltip", "id", and "links") are set as expected. No issues are detected here.


38-1299: Panels Configuration Overview

This dashboard includes a wide variety of panels (stat, bargauge, table, timeseries) that cover key Flux-related metrics such as:

  • Reconciler Counts (e.g. Cluster Reconcilers, Failing Reconcilers),
  • Resource Sources (e.g. Kubernetes Manifests Sources, Failing Sources),
  • Operation Durations (e.g. Reconciler ops avg. duration, Source ops avg. duration), and
  • Readiness Tables.

Please verify that each Prometheus query (which uses variables like $namespace) returns the intended data. Also, double-check that grid positions and panel dimensions are balanced for your typical Grafana display resolutions.


1301-1370: Templating and Refresh Configuration

The global settings following the panels include the refresh rate (set to "30s") and the templating list that defines dynamic variables (e.g. "operator_namespace", "namespace", and "DS_PROMETHEUS"). Confirm that the queries and any regular expressions in the templating (if used) yield the correct and expected values in your environment.

dashboards/flux/flux-control-plane.json (4)

1-41: Annotations Block Review

The annotations section defines two entries – one for "Annotations & Alerts" and one for "flux events" – both using a datasource object. This is consistent within this file. Please verify that the tag filtering (e.g. using "tags": [ "flux" ]) meets your operational needs.


42-47: Dashboard Metadata Validation

The initial metadata (including "editable", "fiscalYearStartMonth", "graphTooltip", "id", and "links") is correctly provided. No concerns here.


48-1438: Panels and Visualization Setup

This dashboard defines a rich set of panels for visualizing the Flux control plane:

  • Controllers Panel with query sum(go_info{pod=~".*-controller-.*"}) for tracking controller pods.
  • Max Work Queue, Memory, and API Requests Panels that display key operational metrics.
  • Multiple timeseries and table panels capturing resource usage, CPU/Memory usage, reconciliation durations, and repository operations (for Git, OCI, Helm, Buckets, etc.).

Each panel’s Prometheus query and threshold configuration appear well thought out. Be sure to test that the variable substitutions (e.g. ${DS_PROMETHEUS} and $namespace) inject the correct values and that the queries return data as expected.
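
As a starting point for that testing, the panel queries can be listed and checked for a given template variable. This sketch assumes queries are stored as `expr` strings under each panel's `targets`, and that row panels may nest further panels; the helper names and the second sample query are hypothetical.

```python
def iter_queries(panels: list):
    """Yield (panel title, expr) pairs, descending into nested row panels."""
    for panel in panels:
        for target in panel.get("targets", []):
            if "expr" in target:
                yield panel.get("title", ""), target["expr"]
        yield from iter_queries(panel.get("panels", []))


def queries_missing_variable(dashboard: dict, name: str) -> list:
    """Return titles of panels whose query uses neither $name nor ${name}."""
    tokens = ("$" + name, "${" + name + "}")
    return [
        title
        for title, expr in iter_queries(dashboard.get("panels", []))
        if not any(token in expr for token in tokens)
    ]


sample = {
    "panels": [
        {"title": "Controllers", "targets": [{"expr": 'sum(go_info{pod=~".*-controller-.*"})'}]},
        {
            "title": "Row",
            "panels": [
                {"title": "Memory", "targets": [{"expr": 'sum(example_metric{namespace="$namespace"})'}]}
            ],
        },
    ]
}

print(queries_missing_variable(sample, "namespace"))  # ['Controllers']
```

A panel appearing in the result is not necessarily wrong; it only marks queries worth reviewing by hand.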


1439-1725: Global Settings and Templating

The final section of the file includes the time settings, refresh intervals (set here to "10s"), and a templating list that defines variables such as "DS_PROMETHEUS" and "namespace". The configuration seems complete and consistent with the overall monitoring system. Verify that the regular expression in the "namespace" variable (if applicable) correctly captures the desired data from your metrics.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Feb 6, 2025
Member

@kvaps kvaps left a comment

LGTM

@kvaps kvaps merged commit 842d3e5 into aenix-io:main Feb 6, 2025
1 check passed
Labels
enhancement New feature or request lgtm This PR has been approved by a maintainer size:XXL This PR changes 1000+ lines, ignoring generated files.
3 participants