From 4dd51a798aacbaa5a534cfd4060cb379b2020022 Mon Sep 17 00:00:00 2001
From: Stefan Prodan
Date: Thu, 4 Jul 2024 17:00:33 +0300
Subject: [PATCH] Update docs for flux-operator v0.7.0

Signed-off-by: Stefan Prodan
---
 docs/operator/fluxinstance.md |  44 ++++++++++++
 docs/operator/fluxreport.md   |  59 ++++++++++++++-
 docs/operator/index.md        |   5 +-
 docs/operator/install.md      |  23 +++++-
 docs/operator/monitoring.md   | 131 +++++++++++++++++++++++++++++++++-
 mkdocs.yml                    |   2 +-
 6 files changed, 255 insertions(+), 9 deletions(-)

diff --git a/docs/operator/fluxinstance.md b/docs/operator/fluxinstance.md
index c38c3c8..5ff2f49 100644
--- a/docs/operator/fluxinstance.md
+++ b/docs/operator/fluxinstance.md
@@ -276,6 +276,7 @@ spec:
   cluster:
     type: openshift
     multitenant: true
+    tenantDefaultServiceAccount: "flux"
     networkPolicy: true
     domain: "cluster.local"
 ```
@@ -292,6 +293,10 @@ The supported values are `kubernetes` (default), `openshift`, `aks`, `eks` and `
 The `.spec.cluster.multitenant` field is optional and specifies whether to enable Flux
 [multi-tenancy lockdown](https://fluxcd.io/flux/installation/configuration/multitenancy/).

+The `.spec.cluster.tenantDefaultServiceAccount` is optional and specifies the default
+service account used by Flux when reconciling `Kustomization` and `HelmRelease`
+resources found in the tenant namespaces.
+
 #### Cluster network policy

 The `.spec.cluster.networkPolicy` field is optional and specifies whether to restrict network access
@@ -659,3 +664,42 @@ Status:
   Last Applied Revision: v2.3.0@sha256:4cc5babdb1279ad0177bf513292deadbfa3f7b7c3da0be7fa53b39ab434f7219
   Last Attempted Revision: v2.3.0@sha256:4cc5babdb1279ad0177bf513292deadbfa3f7b7c3da0be7fa53b39ab434f7219
 ```
+
+## FluxInstance Metrics
+
+The Flux Operator exports metrics for the FluxInstance resource.
+These metrics are refreshed every time the operator reconciles the instance.
+
+Metrics:
+
+```text
+flux_instance_info{uid, kind, name, exported_namespace, ready, suspended, registry, revision}
+```
+
+Labels:
+
+- `uid`: The Kubernetes unique identifier of the resource.
+- `kind`: The kind of the resource (e.g. `FluxInstance`).
+- `name`: The name of the resource (e.g. `flux`).
+- `exported_namespace`: The namespace where the resource is deployed (e.g. `flux-system`).
+- `ready`: The readiness status of the resource (e.g. `True`, `False` or `Unknown`).
+- `reason`: The reason for the readiness status (e.g. `Progressing`, `BuildFailed`, `HealthCheckFailed`, etc.).
+- `suspended`: The suspended status of the resource (e.g. `True` or `False`).
+- `registry`: The container registry used by the instance (e.g. `ghcr.io/fluxcd`).
+- `revision`: The Flux revision installed by the instance (e.g. `v2.3.0@sha256:75aa209c6a...`).
+
+Example:
+
+```text
+flux_instance_info{
+  exported_namespace="flux-system",
+  kind="FluxInstance",
+  name="flux",
+  ready="True",
+  reason="ReconciliationSucceeded",
+  registry="ghcr.io/fluxcd",
+  revision="v2.3.0@sha256:75aa209c6a2e25b97114ccf092246d02ab4363bc136edefc239d2a88da882b63",
+  suspended="False",
+  uid="16ca7202-9319-445b-99d0-617c25bda182"
+}
+```
diff --git a/docs/operator/fluxreport.md b/docs/operator/fluxreport.md
index 70a249b..95cba8d 100644
--- a/docs/operator/fluxreport.md
+++ b/docs/operator/fluxreport.md
@@ -263,4 +263,61 @@ The FluxReport is automatically generated by the operator for the following cond
 The reconciliation behaviour can be configured using the following annotations:

 - `fluxcd.controlplane.io/reconcile`: Enable or disable the reconciliation loop. Default is `enabled`, set to `disabled` to pause the reconciliation.
-- `fluxcd.controlplane.io/reconcileEvery`: Set the reconciliation interval. Default is `10m`.
+- `fluxcd.controlplane.io/reconcileEvery`: Set the reconciliation interval. Default is `5m`.
+
+The default reconciliation interval of the report can be changed by setting
+the `REPORTING_INTERVAL` environment variable in the operator deployment.
+
+## Flux Resource Metrics
+
+The Flux Operator exports metrics for all Flux resources found in the cluster.
+These metrics are refreshed at the same time with the update of the FluxReport.
+
+Metrics:
+
+```text
+flux_resource_info{uid, kind, name, exported_namespace, ready, suspended, ...}
+```
+
+Common labels:
+
+- `uid`: The Kubernetes unique identifier of the resource.
+- `kind`: The kind of the resource (e.g. `GitRepository`, `Kustomization`, etc.).
+- `name`: The name of the resource (e.g. `flux-system`).
+- `exported_namespace`: The namespace of the resource (e.g. `flux-system`).
+- `ready`: The readiness status of the resource (e.g. `True`, `False` or `Unknown`).
+- `reason`: The reason for the readiness status (e.g. `Progressing`, `BuildFailed`, `HealthCheckFailed`, etc.).
+- `suspended`: The suspended status of the resource (e.g. `True` or `False`).
+
+Specific labels per resource kind:
+
+| Resource Kind         | Labels                            |
+|-----------------------|-----------------------------------|
+| Kustomization         | `revision`, `source_name`, `path` |
+| GitRepository         | `revision`, `url`, `ref`          |
+| OCIRepository         | `revision`, `url`, `ref`          |
+| Bucket                | `revision`, `url`, `ref`          |
+| HelmRelease           | `revision`, `source_name`         |
+| HelmChart             | `revision`, `source_name`         |
+| HelmRepository        | `revision`, `url`                 |
+| Receiver              | `url`                             |
+| ImageRepository       | `url`                             |
+| ImagePolicy           | `source_name`                     |
+| ImageUpdateAutomation | `source_name`                     |
+
+Example:
+
+```text
+flux_resource_info{
+  exported_namespace="flux-system",
+  kind="Kustomization",
+  name="flux-system",
+  path="production/clusters",
+  ready="True",
+  reason="ReconciliationSucceeded",
+  revision="refs/heads/main@sha1:d3c6dfa21465cc540d214811f46694fee0ce700d",
+  source_name="flux-system",
+  suspended="False",
+  uid="359219f3-0793-4cf0-89a1-990ef1ac8098"
+}
+```
diff --git a/docs/operator/index.md b/docs/operator/index.md
index 5562e01..d4ed41c 100644
--- a/docs/operator/index.md
+++ b/docs/operator/index.md
@@ -3,7 +3,10 @@
 The [Flux Operator](https://github.com/controlplaneio-fluxcd/flux-operator) is a
 Kubernetes CRD controller that manages the lifecycle of CNCF Flux
 and the ControlPlane enterprise distribution.
-The operator provides first-class support for running Flux in production
+
+The operator offers an alternative to the Flux Bootstrap procedure, it
+removes the operational burden of managing Flux across fleets of clusters
+and provides first-class support for running Flux in production
 on OpenShift, Amazon EKS, Azure AKS and Google GKE.

 ## Features
diff --git a/docs/operator/install.md b/docs/operator/install.md
index f0fb1d3..e45a07f 100644
--- a/docs/operator/install.md
+++ b/docs/operator/install.md
@@ -28,14 +28,31 @@ Installing the Flux Operator with Terraform is possible using the

 ```hcl
 resource "helm_release" "flux_operator" {
-  name       = "flux-operator"
+  name             = "flux-operator"
+  namespace        = "flux-system"
+  repository       = "oci://ghcr.io/controlplaneio-fluxcd/charts"
+  chart            = "flux-operator"
+  create_namespace = true
+}
+
+resource "helm_release" "flux_instance" {
+  depends_on = [helm_release.flux_operator]
+
+  name       = "flux"
   namespace  = "flux-system"
   repository = "oci://ghcr.io/controlplaneio-fluxcd/charts"
-  chart      = "flux-operator"
-  create_namespace = true
+  chart      = "flux-instance"
+
+  values = [
+    file("values/components.yaml")
+  ]
 }
 ```

+For more information on how to configure the Flux instance with Terraform,
+see the Flux Operator
+[terraform module example](https://github.com/controlplaneio-fluxcd/flux-operator/tree/main/config/terraform).
+
 ### Operator Lifecycle Manager

 The Flux Operator can be installed on OpenShift using the bundle published on OperatorHub
diff --git a/docs/operator/monitoring.md b/docs/operator/monitoring.md
index 629a763..24d5e89 100644
--- a/docs/operator/monitoring.md
+++ b/docs/operator/monitoring.md
@@ -1,5 +1,10 @@
 # Flux Monitoring and Reporting

+The Flux Operator supervises the Flux controllers and provides a unified view
+of all the Flux resources that define the GitOps workflows for the target cluster.
+The operator generates reports, emits events, and exports Prometheus metrics
+to help with monitoring and troubleshooting Flux.
+
 ## Flux Status Reporting

 The Flux Operator automatically generates a report that reflects the observed state of the Flux
@@ -9,13 +14,20 @@ the Flux distribution details, reconcilers statistics, cluster sync status and m
 The report is generated as a custom resource of kind `FluxReport`, named `flux`,
 located in the same namespace where the operator is running.

+!!! tip "Flux installation method"
+
+    The report is available no matter the tool used to install Flux,
+    be it the `flux` CLI, Terraform, Helm or the Flux Operator itself.
+    For the report to be accurate, the operator must be running
+    in the same namespace where the Flux controllers are deployed.
+
 To view the report in YAML format run:

 ```shell
 kubectl -n flux-system get fluxreport/flux -o yaml
 ```

-The operator updates the report at regular intervals, by default every 10 minutes.
+The operator updates the report at regular intervals, by default every five minutes.
 To manually trigger the reconciliation of the report, run:

 ```shell
@@ -29,10 +41,123 @@ in the [Flux Report API documentation](fluxreport.md).
 ## Flux Instance Events

 The Flux Operator emits events to the Kubernetes API server to report on the status of the Flux
-instance. The events are useful to monitor the Flux lifecycle and troubleshoot issues.
+instance. The events are useful to monitor the Flux lifecycle and troubleshoot upgrade issues.

 To list the events related to the Flux instance, run:

 ```shell
-kubectl -n flux-system events for fluxinsance/flux
+kubectl -n flux-system events --for fluxinstance/flux
+```
+
+## Prometheus Metrics
+
+The Flux Operator exports metrics in the Prometheus format for monitoring
+and alerting purposes. The metrics are exposed inside the cluster by the
+`flux-operator` Kubernetes Service on the `8080` port.
+
+On clusters where the Prometheus Operator is installed, the metrics can be scraped
+by creating a `ServiceMonitor` resource as follows:
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: flux-operator
+  namespace: flux-system
+  labels:
+    release: kube-prometheus-stack
+spec:
+  namespaceSelector:
+    matchNames:
+      - flux-system
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: flux-operator
+  endpoints:
+    - targetPort: 8080
+      path: /metrics
+      interval: 60s
+      scrapeTimeout: 30s
+```
+
+!!! tip "Helm Chart"
+
+    The Flux Operator [Helm chart](install.md#helm) includes a `ServiceMonitor` resource that can
+    be enabled by setting the `serviceMonitor.create` value to `true`.
+
+On clusters with Prometheus auto-discovery enabled, the metrics are automatically scraped
+from the `flux-operator` pods that have the `prometheus.io/scrape: "true"` annotation.
+
+### Flux Instance Metrics
+
+The Flux Operator exports metrics for the [FluxInstance](fluxinstance.md) resource.
+These metrics are refreshed every time the operator reconciles the instance.
+
+Metrics:
+
+```text
+flux_instance_info{uid, kind, name, exported_namespace, ready, suspended, registry, revision}
 ```
+
+Labels:
+
+- `uid`: The Kubernetes unique identifier of the resource.
+- `kind`: The kind of the resource (e.g. `FluxInstance`).
+- `name`: The name of the resource (e.g. `flux`).
+- `exported_namespace`: The namespace where the resource is deployed (e.g. `flux-system`).
+- `ready`: The readiness status of the resource (e.g. `True`, `False` or `Unknown`).
+- `reason`: The reason for the readiness status (e.g. `Progressing`, `BuildFailed`, `HealthCheckFailed`, etc.).
+- `suspended`: The suspended status of the resource (e.g. `True` or `False`).
+- `registry`: The container registry used by the instance (e.g. `ghcr.io/fluxcd`).
+- `revision`: The Flux revision installed by the instance (e.g. `v2.3.0@sha256:75aa209c6a...`).
+
+### Flux Resource Metrics
+
+The Flux Operator exports metrics for all Flux resources found in the cluster.
+These metrics are refreshed at the same time with the update of the [FluxReport](fluxreport.md).
+
+Metrics:
+
+```text
+flux_resource_info{uid, kind, name, exported_namespace, ready, suspended, ...}
+```
+
+Common labels:
+
+- `uid`: The Kubernetes unique identifier of the resource.
+- `kind`: The kind of the resource (e.g. `GitRepository`, `Kustomization`, etc.).
+- `name`: The name of the resource (e.g. `flux-system`).
+- `exported_namespace`: The namespace of the resource (e.g. `flux-system`).
+- `ready`: The readiness status of the resource (e.g. `True`, `False` or `Unknown`).
+- `reason`: The reason for the readiness status (e.g. `Progressing`, `BuildFailed`, `HealthCheckFailed`, etc.).
+- `suspended`: The suspended status of the resource (e.g. `True` or `False`).
+
+Specific labels per resource kind:
+
+| Resource Kind         | Labels                            |
+|-----------------------|-----------------------------------|
+| Kustomization         | `revision`, `source_name`, `path` |
+| GitRepository         | `revision`, `url`, `ref`          |
+| OCIRepository         | `revision`, `url`, `ref`          |
+| Bucket                | `revision`, `url`, `ref`          |
+| HelmRelease           | `revision`, `source_name`         |
+| HelmChart             | `revision`, `source_name`         |
+| HelmRepository        | `revision`, `url`                 |
+| Receiver              | `url`                             |
+| ImageRepository       | `url`                             |
+| ImagePolicy           | `source_name`                     |
+| ImageUpdateAutomation | `source_name`                     |
+
+### Controller Runtime Metrics
+
+The Flux Operator exports Kubernetes
+[controller runtime metrics](https://book.kubebuilder.io/reference/metrics-reference)
+and Go runtime metrics.
+
+Relevant metrics for troubleshooting:
+
+- `controller_runtime_reconcile_errors_total{controller}`: Total number of reconciliation errors per controller.
+- `rest_client_requests_total{code, method}`: Number of Kubernetes API requests, partitioned by status code and method.
+- `go_memstats_alloc_bytes`: Number of bytes allocated and still in use.
+- `go_goroutines`: Number of goroutines that currently exist.
+- `workqueue_longest_running_processor_seconds`: Longest time a workqueue item has been processed.
diff --git a/mkdocs.yml b/mkdocs.yml
index 8b2014a..0c0f436 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -106,8 +106,8 @@ nav:
       - Flux Cluster Sync: operator/flux-sync.md
       - Flux Customization: operator/flux-kustomize.md
   - Guides:
+      - Flux Monitoring: operator/monitoring.md
       - Bootstrap Migration: operator/flux-bootstrap-migration.md
-      - Monitoring and Reporting: operator/monitoring.md
   - API Reference:
       - Flux Instance: operator/fluxinstance.md
       - Flux Report: operator/fluxreport.md
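
Note on consuming the metrics documented in this patch: the sketch below shows one possible way to list resources whose `ready` label is not `True` from a scrape of the operator's `/metrics` endpoint. It is an illustration only and is not part of the patch or of the Flux Operator. The function names `parse_info_line` and `not_ready` are hypothetical, and so is the sample data (the `podinfo` HelmRelease in an `apps` namespace and the `# TYPE` line). It also assumes each sample arrives on a single line with a trailing value, which is how the Prometheus text format is normally served, whereas the examples in the docs are wrapped for readability.

```python
import re

# Matches one label pair such as kind="Kustomization" inside the braces.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')
# Matches a whole sample line: metric name, label set, then the sample value.
SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+\S+$')


def parse_info_line(line):
    """Return (metric_name, labels) for one sample line, or None for comments and blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = SAMPLE_RE.match(line)
    if match is None:
        return None
    name, raw_labels = match.groups()
    return name, dict(LABEL_RE.findall(raw_labels))


def not_ready(metrics_text, metric="flux_resource_info"):
    """List (kind, namespace, name, reason) for samples whose ready label is not "True"."""
    found = []
    for line in metrics_text.splitlines():
        parsed = parse_info_line(line)
        if parsed is None or parsed[0] != metric:
            continue
        labels = parsed[1]
        if labels.get("ready") != "True":
            found.append((
                labels.get("kind"),
                labels.get("exported_namespace"),
                labels.get("name"),
                labels.get("reason"),
            ))
    return found


# Hypothetical scrape output, trimmed to the labels used above.
SAMPLE = '''
# TYPE flux_resource_info gauge
flux_resource_info{exported_namespace="flux-system",kind="Kustomization",name="flux-system",ready="True",reason="ReconciliationSucceeded",suspended="False"} 1
flux_resource_info{exported_namespace="apps",kind="HelmRelease",name="podinfo",ready="False",reason="HealthCheckFailed",suspended="False"} 1
'''

if __name__ == "__main__":
    for item in not_ready(SAMPLE):
        print(item)
```

Passing `metric="flux_instance_info"` would apply the same filter to the FluxInstance metric, since both metrics carry the `ready` and `reason` labels listed in the docs.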