Update docs for flux-operator v0.7.0
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
stefanprodan committed Jul 4, 2024
1 parent 610de20 commit 4dd51a7
Showing 6 changed files with 255 additions and 9 deletions.
44 changes: 44 additions & 0 deletions docs/operator/fluxinstance.md
@@ -276,6 +276,7 @@ spec:
  cluster:
    type: openshift
    multitenant: true
    tenantDefaultServiceAccount: "flux"
    networkPolicy: true
    domain: "cluster.local"
```
@@ -292,6 +293,10 @@ The supported values are `kubernetes` (default), `openshift`, `aks`, `eks` and `
The `.spec.cluster.multitenant` field is optional and specifies whether to enable Flux
[multi-tenancy lockdown](https://fluxcd.io/flux/installation/configuration/multitenancy/).

The `.spec.cluster.tenantDefaultServiceAccount` field is optional and specifies the default
service account used by Flux when reconciling `Kustomization` and `HelmRelease`
resources found in the tenant namespaces.

#### Cluster network policy

The `.spec.cluster.networkPolicy` field is optional and specifies whether to restrict network access
@@ -659,3 +664,42 @@ Status:
Last Applied Revision: v2.3.0@sha256:4cc5babdb1279ad0177bf513292deadbfa3f7b7c3da0be7fa53b39ab434f7219
Last Attempted Revision: v2.3.0@sha256:4cc5babdb1279ad0177bf513292deadbfa3f7b7c3da0be7fa53b39ab434f7219
```

## FluxInstance Metrics

The Flux Operator exports metrics for the FluxInstance resource.
These metrics are refreshed every time the operator reconciles the instance.

Metrics:

```text
flux_instance_info{uid, kind, name, exported_namespace, ready, reason, suspended, registry, revision}
```

Labels:

- `uid`: The Kubernetes unique identifier of the resource.
- `kind`: The kind of the resource (e.g. `FluxInstance`).
- `name`: The name of the resource (e.g. `flux`).
- `exported_namespace`: The namespace where the resource is deployed (e.g. `flux-system`).
- `ready`: The readiness status of the resource (e.g. `True`, `False` or `Unknown`).
- `reason`: The reason for the readiness status (e.g. `Progressing`, `BuildFailed`, `HealthCheckFailed`, etc.).
- `suspended`: The suspended status of the resource (e.g. `True` or `False`).
- `registry`: The container registry used by the instance (e.g. `ghcr.io/fluxcd`).
- `revision`: The Flux revision installed by the instance (e.g. `v2.3.0@sha256:75aa209c6a...`).

Example:

```text
flux_instance_info{
  exported_namespace="flux-system",
  kind="FluxInstance",
  name="flux",
  ready="True",
  reason="ReconciliationSucceeded",
  registry="ghcr.io/fluxcd",
  revision="v2.3.0@sha256:75aa209c6a2e25b97114ccf092246d02ab4363bc136edefc239d2a88da882b63",
  suspended="False",
  uid="16ca7202-9319-445b-99d0-617c25bda182"
}
```
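
The labels above lend themselves to alerting. The following is a minimal sketch, assuming the Prometheus Operator is installed and that the operator stops exporting a series once its label values change. The rule name, alert name, `for` duration and severity are hypothetical choices, not part of the Flux Operator:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: flux-instance-alerts # hypothetical name
  namespace: flux-system
spec:
  groups:
    - name: flux-instance
      rules:
        # Fires when a non-suspended FluxInstance reports ready="False".
        - alert: FluxInstanceNotReady # hypothetical alert name
          expr: flux_instance_info{ready="False", suspended="False"}
          for: 10m # arbitrary grace period, tune to taste
          labels:
            severity: critical
          annotations:
            summary: "FluxInstance {{ $labels.exported_namespace }}/{{ $labels.name }} is not ready ({{ $labels.reason }})"
```

Depending on how Prometheus discovers rules in the cluster, the resource may also need a label that matches the rule selector of the Prometheus instance.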
59 changes: 58 additions & 1 deletion docs/operator/fluxreport.md
@@ -263,4 +263,61 @@ The FluxReport is automatically generated by the operator for the following cond
The reconciliation behaviour can be configured using the following annotations:

- `fluxcd.controlplane.io/reconcile`: Enable or disable the reconciliation loop. Default is `enabled`, set to `disabled` to pause the reconciliation.
- `fluxcd.controlplane.io/reconcileEvery`: Set the reconciliation interval. Default is `10m`.
- `fluxcd.controlplane.io/reconcileEvery`: Set the reconciliation interval. Default is `5m`.

The default reconciliation interval of the report can be changed by setting
the `REPORTING_INTERVAL` environment variable in the operator deployment.
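
As a rough illustration, the environment variable could be set with a strategic merge patch fragment such as the one below. The container name `manager` and the `2m` value are assumptions made for this sketch; check the actual Deployment for the real container name:

```yaml
# Patch fragment for the operator Deployment (container name is assumed).
spec:
  template:
    spec:
      containers:
        - name: manager # assumption: adjust to the container name in your Deployment
          env:
            - name: REPORTING_INTERVAL
              value: "2m" # example value, the documented default is 5m
```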

## Flux Resource Metrics

The Flux Operator exports metrics for all Flux resources found in the cluster.
These metrics are refreshed whenever the FluxReport is updated.

Metrics:

```text
flux_resource_info{uid, kind, name, exported_namespace, ready, reason, suspended, ...}
```

Common labels:

- `uid`: The Kubernetes unique identifier of the resource.
- `kind`: The kind of the resource (e.g. `GitRepository`, `Kustomization`, etc.).
- `name`: The name of the resource (e.g. `flux-system`).
- `exported_namespace`: The namespace of the resource (e.g. `flux-system`).
- `ready`: The readiness status of the resource (e.g. `True`, `False` or `Unknown`).
- `reason`: The reason for the readiness status (e.g. `Progressing`, `BuildFailed`, `HealthCheckFailed`, etc.).
- `suspended`: The suspended status of the resource (e.g. `True` or `False`).

Specific labels per resource kind:

| Resource Kind | Labels |
|-----------------------|-----------------------------------|
| Kustomization | `revision`, `source_name`, `path` |
| GitRepository | `revision`, `url`, `ref` |
| OCIRepository | `revision`, `url`, `ref` |
| Bucket | `revision`, `url`, `ref` |
| HelmRelease | `revision`, `source_name` |
| HelmChart | `revision`, `source_name` |
| HelmRepository | `revision`, `url` |
| Receiver | `url` |
| ImageRepository | `url` |
| ImagePolicy | `source_name` |
| ImageUpdateAutomation | `source_name` |

Example:

```text
flux_resource_info{
  exported_namespace="flux-system",
  kind="Kustomization",
  name="flux-system",
  path="production/clusters",
  ready="True",
  reason="ReconciliationSucceeded",
  revision="refs/heads/main@sha1:d3c6dfa21465cc540d214811f46694fee0ce700d",
  source_name="flux-system",
  suspended="False",
  uid="359219f3-0793-4cf0-89a1-990ef1ac8098"
}
```
5 changes: 4 additions & 1 deletion docs/operator/index.md
@@ -3,7 +3,10 @@
The [Flux Operator](https://github.com/controlplaneio-fluxcd/flux-operator)
is a Kubernetes CRD controller that manages
the lifecycle of CNCF Flux and the ControlPlane enterprise distribution.
The operator provides first-class support for running Flux in production

The operator offers an alternative to the Flux Bootstrap procedure. It
removes the operational burden of managing Flux across fleets of clusters
and provides first-class support for running Flux in production
on OpenShift, Amazon EKS, Azure AKS and Google GKE.

## Features
23 changes: 20 additions & 3 deletions docs/operator/install.md
@@ -28,14 +28,31 @@ Installing the Flux Operator with Terraform is possible using the

```hcl
resource "helm_release" "flux_operator" {
  name             = "flux-operator"
  namespace        = "flux-system"
  repository       = "oci://ghcr.io/controlplaneio-fluxcd/charts"
  chart            = "flux-operator"
  create_namespace = true
}

resource "helm_release" "flux_instance" {
  # Install the instance only after the operator (and its CRDs) are in place.
  depends_on = [helm_release.flux_operator]

  name       = "flux"
  namespace  = "flux-system"
  repository = "oci://ghcr.io/controlplaneio-fluxcd/charts"
  chart      = "flux-instance"

  values = [
    file("values/components.yaml")
  ]
}
```

For more information on how to configure the Flux instance with Terraform,
see the Flux Operator
[terraform module example](https://github.com/controlplaneio-fluxcd/flux-operator/tree/main/config/terraform).

### Operator Lifecycle Manager

The Flux Operator can be installed on OpenShift using the bundle published on OperatorHub
131 changes: 128 additions & 3 deletions docs/operator/monitoring.md
@@ -1,5 +1,10 @@
# Flux Monitoring and Reporting

The Flux Operator supervises the Flux controllers and provides a unified view
of all the Flux resources that define the GitOps workflows for the target cluster.
The operator generates reports, emits events, and exports Prometheus metrics
to help with monitoring and troubleshooting Flux.

## Flux Status Reporting

The Flux Operator automatically generates a report that reflects the observed state of the Flux
@@ -9,13 +14,20 @@ the Flux distribution details, reconcilers statistics, cluster sync status and m
The report is generated as a custom resource of kind `FluxReport`, named `flux`,
located in the same namespace where the operator is running.

!!! tip "Flux installation method"

    The report is available no matter the tool used to install Flux,
    be it the `flux` CLI, Terraform, Helm or the Flux Operator itself.
    For the report to be accurate, the operator must be running
    in the same namespace where the Flux controllers are deployed.

To view the report in YAML format run:

```shell
kubectl -n flux-system get fluxreport/flux -o yaml
```

The operator updates the report at regular intervals, by default every 10 minutes.
The operator updates the report at regular intervals, by default every five minutes.
To manually trigger the reconciliation of the report, run:

```shell
@@ -29,10 +41,123 @@ in the [Flux Report API documentation](fluxreport.md).
## Flux Instance Events

The Flux Operator emits events to the Kubernetes API server to report on the status of the Flux
instance. The events are useful to monitor the Flux lifecycle and troubleshoot issues.
instance. The events are useful to monitor the Flux lifecycle and troubleshoot upgrade issues.

To list the events related to the Flux instance, run:

```shell
kubectl -n flux-system events --for fluxinstance/flux
```

## Prometheus Metrics

The Flux Operator exports metrics in the Prometheus format for monitoring
and alerting purposes. The metrics are exposed inside the cluster by the
`flux-operator` Kubernetes Service on the `8080` port.

On clusters where the Prometheus Operator is installed, the metrics can be scraped
by creating a `ServiceMonitor` resource as follows:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: flux-operator
  namespace: flux-system
  labels:
    release: kube-prometheus-stack
spec:
  namespaceSelector:
    matchNames:
      - flux-system
  selector:
    matchLabels:
      app.kubernetes.io/name: flux-operator
  endpoints:
    - targetPort: 8080
      path: /metrics
      interval: 60s
      scrapeTimeout: 30s
```

!!! tip "Helm Chart"

    The Flux Operator [Helm chart](install.md#helm) includes a `ServiceMonitor` resource that can
    be enabled by setting the `serviceMonitor.create` value to `true`.

On clusters with Prometheus auto-discovery enabled, the metrics are automatically scraped
from the `flux-operator` pods that have the `prometheus.io/scrape: "true"` annotation.

### Flux Instance Metrics

The Flux Operator exports metrics for the [FluxInstance](fluxinstance.md) resource.
These metrics are refreshed every time the operator reconciles the instance.

Metrics:

```text
flux_instance_info{uid, kind, name, exported_namespace, ready, reason, suspended, registry, revision}
```

Labels:

- `uid`: The Kubernetes unique identifier of the resource.
- `kind`: The kind of the resource (e.g. `FluxInstance`).
- `name`: The name of the resource (e.g. `flux`).
- `exported_namespace`: The namespace where the resource is deployed (e.g. `flux-system`).
- `ready`: The readiness status of the resource (e.g. `True`, `False` or `Unknown`).
- `reason`: The reason for the readiness status (e.g. `Progressing`, `BuildFailed`, `HealthCheckFailed`, etc.).
- `suspended`: The suspended status of the resource (e.g. `True` or `False`).
- `registry`: The container registry used by the instance (e.g. `ghcr.io/fluxcd`).
- `revision`: The Flux revision installed by the instance (e.g. `v2.3.0@sha256:75aa209c6a...`).

### Flux Resource Metrics

The Flux Operator exports metrics for all Flux resources found in the cluster.
These metrics are refreshed whenever the [FluxReport](fluxreport.md) is updated.

Metrics:

```text
flux_resource_info{uid, kind, name, exported_namespace, ready, reason, suspended, ...}
```

Common labels:

- `uid`: The Kubernetes unique identifier of the resource.
- `kind`: The kind of the resource (e.g. `GitRepository`, `Kustomization`, etc.).
- `name`: The name of the resource (e.g. `flux-system`).
- `exported_namespace`: The namespace of the resource (e.g. `flux-system`).
- `ready`: The readiness status of the resource (e.g. `True`, `False` or `Unknown`).
- `reason`: The reason for the readiness status (e.g. `Progressing`, `BuildFailed`, `HealthCheckFailed`, etc.).
- `suspended`: The suspended status of the resource (e.g. `True` or `False`).

Specific labels per resource kind:

| Resource Kind | Labels |
|-----------------------|-----------------------------------|
| Kustomization | `revision`, `source_name`, `path` |
| GitRepository | `revision`, `url`, `ref` |
| OCIRepository | `revision`, `url`, `ref` |
| Bucket | `revision`, `url`, `ref` |
| HelmRelease | `revision`, `source_name` |
| HelmChart | `revision`, `source_name` |
| HelmRepository | `revision`, `url` |
| Receiver | `url` |
| ImageRepository | `url` |
| ImagePolicy | `source_name` |
| ImageUpdateAutomation | `source_name` |
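
Because every Flux resource is exported as its own series, the common labels can be aggregated into an inventory view. The sketch below assumes the Prometheus Operator is installed; the rule name and the recorded series name are hypothetical:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: flux-resource-inventory # hypothetical name
  namespace: flux-system
spec:
  groups:
    - name: flux-resources
      rules:
        # Number of Flux resources per kind and readiness status.
        - record: flux:resource_info:count # hypothetical series name
          expr: count by (kind, ready) (flux_resource_info)
```

The recorded series can then drive a dashboard panel, for example one that shows how many `Kustomization` objects currently report `ready="False"`.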

### Controller Runtime Metrics

The Flux Operator exports Kubernetes
[controller runtime metrics](https://book.kubebuilder.io/reference/metrics-reference)
and Go runtime metrics.

Relevant metrics for troubleshooting:

- `controller_runtime_reconcile_errors_total{controller}`: Total number of reconciliation errors per controller.
- `rest_client_requests_total{code, method}`: Number of Kubernetes API requests, partitioned by status code and method.
- `go_memstats_alloc_bytes`: Number of bytes allocated and still in use.
- `go_goroutines`: Number of goroutines that currently exist.
- `workqueue_longest_running_processor_seconds`: Longest time a workqueue item has been processed.
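
As a sketch of how these metrics might be used, the rule below alerts on sustained reconciliation errors. It assumes the Prometheus Operator is installed; the rule name, alert name, rate window and `for` duration are illustrative choices:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: flux-operator-runtime-alerts # hypothetical name
  namespace: flux-system
spec:
  groups:
    - name: flux-operator-runtime
      rules:
        # Fires when a controller keeps producing reconciliation errors.
        - alert: FluxOperatorReconcileErrors # hypothetical alert name
          expr: sum by (controller) (rate(controller_runtime_reconcile_errors_total[5m])) > 0
          for: 15m # arbitrary, tune to the expected error tolerance
          labels:
            severity: warning
          annotations:
            summary: "Controller {{ $labels.controller }} is reporting reconciliation errors"
```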
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -106,8 +106,8 @@ nav:
- Flux Cluster Sync: operator/flux-sync.md
- Flux Customization: operator/flux-kustomize.md
- Guides:
- Flux Monitoring: operator/monitoring.md
- Bootstrap Migration: operator/flux-bootstrap-migration.md
- Monitoring and Reporting: operator/monitoring.md
- API Reference:
- Flux Instance: operator/fluxinstance.md
- Flux Report: operator/fluxreport.md
