732 changes: 732 additions & 0 deletions approval-controller-metric-collector/README.md

Large diffs are not rendered by default.

@@ -0,0 +1,78 @@
# Makefile for ApprovalRequest Controller

# Image settings
IMAGE_NAME ?= approval-request-controller
IMAGE_TAG ?= latest
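# Optional registry prefix; include the trailing slash, since it is
# concatenated directly with IMAGE_NAME in docker-push.
REGISTRY ?=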

# Build settings
GOOS ?= $(shell go env GOOS)
GOARCH ?= $(shell go env GOARCH)

# Tools
CONTROLLER_GEN_VERSION ?= v0.16.0
CONTROLLER_GEN = go run sigs.k8s.io/controller-tools/cmd/controller-gen@$(CONTROLLER_GEN_VERSION)

.PHONY: help
help: ## Display this help
	@awk 'BEGIN {FS = ":.*##"; printf "\nUsage:\n make \033[36m<target>\033[0m\n"} /^[a-zA-Z_-]+:.*?##/ { printf " \033[36m%-15s\033[0m %s\n", $$1, $$2 } /^##@/ { printf "\n\033[1m%s\033[0m\n", substr($$0, 5) } ' $(MAKEFILE_LIST)

##@ Code Generation

.PHONY: manifests
manifests: ## Generate CRD manifests
	$(CONTROLLER_GEN) crd paths="./apis/..." output:crd:artifacts:config=config/crd/bases

.PHONY: generate
generate: ## Generate DeepCopy code
	$(CONTROLLER_GEN) object:headerFile="hack/boilerplate.go.txt" paths="./apis/..."

##@ Build

.PHONY: docker-build
docker-build: ## Build docker image
	docker buildx build \
		--file docker/approval-request-controller.Dockerfile \
		--output=type=docker \
		--platform=linux/$(GOARCH) \
		--build-arg GOARCH=$(GOARCH) \
		--tag $(IMAGE_NAME):$(IMAGE_TAG) \
		--build-context kubefleet=.. \
		..

.PHONY: docker-push
docker-push: ## Push docker image
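# docker-build tags the image without the registry prefix, so tag it before pushing.
	docker tag $(IMAGE_NAME):$(IMAGE_TAG) $(REGISTRY)$(IMAGE_NAME):$(IMAGE_TAG)
	docker push $(REGISTRY)$(IMAGE_NAME):$(IMAGE_TAG)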

##@ Development

.PHONY: run
run: ## Run controller locally
	cd .. && go run ./approval-request-controller/cmd/approvalrequestcontroller/main.go

##@ Deployment

.PHONY: install
install: ## Install helm chart
	helm install approval-request-controller ./charts/approval-request-controller \
		--namespace fleet-system \
		--create-namespace \
		--set image.repository=$(IMAGE_NAME) \
		--set image.tag=$(IMAGE_TAG)

.PHONY: upgrade
upgrade: ## Upgrade helm chart
	helm upgrade approval-request-controller ./charts/approval-request-controller \
		--namespace fleet-system \
		--set image.repository=$(IMAGE_NAME) \
		--set image.tag=$(IMAGE_TAG)

.PHONY: uninstall
uninstall: ## Uninstall helm chart
	helm uninstall approval-request-controller --namespace fleet-system

##@ Kind

.PHONY: kind-load
kind-load: docker-build ## Build and load image into kind cluster
	kind load docker-image $(IMAGE_NAME):$(IMAGE_TAG) --name hub
@@ -0,0 +1,121 @@
# ApprovalRequest Controller

The ApprovalRequest Controller is a standalone controller that runs on the **hub cluster** to automate approval decisions for staged updates based on workload health metrics.

## Overview

This controller is designed to be a standalone component that can run independently from the main kubefleet repository. It:
- Uses kubefleet v0.1.2 as an external dependency
- Includes its own APIs for MetricCollectorReport and WorkloadTracker
- Watches `ApprovalRequest` and `ClusterApprovalRequest` resources (from kubefleet)
- Creates `MetricCollector` resources on member clusters via ClusterResourcePlacement
- Monitors workload health via `MetricCollectorReport` objects
- Automatically approves requests when all tracked workloads are healthy
- Runs every 15 seconds to check health status

## Architecture

The controller is designed to run on the hub cluster and:
1. Deploys MetricCollector instances to member clusters using CRP
2. Collects health metrics from MetricCollectorReports
3. Compares metrics against WorkloadTracker specifications
4. Approves ApprovalRequests when all workloads are healthy
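
The comparison in steps 2 to 4 can be sketched in plain Go. This is a minimal illustration under stated assumptions, not the controller's actual code: `Workload` and `WorkloadMetrics` are simplified local stand-ins for the CRD types, `allHealthy` is a hypothetical helper name, and treating a missing metric or an empty tracker as "not healthy" is an assumption.

```go
package main

import "fmt"

// Workload identifies one tracked workload (stand-in for a WorkloadTracker entry).
type Workload struct {
	Namespace string
	Name      string
}

// WorkloadMetrics is a stand-in for one entry in a MetricCollectorReport status.
type WorkloadMetrics struct {
	Namespace    string
	WorkloadName string
	Health       bool
}

// allHealthy reports whether every tracked workload has at least one
// collected metric and none of its metrics is unhealthy.
// Assumption: an empty tracker or a missing metric never counts as healthy.
func allHealthy(tracked []Workload, collected []WorkloadMetrics) bool {
	if len(tracked) == 0 {
		return false
	}
	status := make(map[Workload]bool, len(collected))
	for _, m := range collected {
		key := Workload{Namespace: m.Namespace, Name: m.WorkloadName}
		prev, seen := status[key]
		// A single unhealthy entry marks the whole workload unhealthy.
		status[key] = m.Health && (!seen || prev)
	}
	for _, w := range tracked {
		if !status[w] {
			return false
		}
	}
	return true
}

func main() {
	tracked := []Workload{{"production", "frontend"}, {"production", "backend-api"}}
	collected := []WorkloadMetrics{
		{"production", "frontend", true},
		{"production", "backend-api", false},
	}
	// backend-api is unhealthy, so the request would not be approved yet.
	fmt.Println("approve:", allHealthy(tracked, collected))
}
```

In a multi-cluster rollout the same check would presumably be repeated for the report of each member cluster in the stage before the request is approved.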

## Installation

### Prerequisites

The following CRDs must be installed on the hub cluster (installed by kubefleet hub-agent):
- `approvalrequests.placement.kubernetes-fleet.io`
- `clusterapprovalrequests.placement.kubernetes-fleet.io`
- `clusterresourceplacements.placement.kubernetes-fleet.io`
- `clusterresourceoverrides.placement.kubernetes-fleet.io`
- `clusterstagedupdateruns.placement.kubernetes-fleet.io`
- `stagedupdateruns.placement.kubernetes-fleet.io`

The following CRDs are installed by this chart:
- `metriccollectors.metric.kubernetes-fleet.io`
- `metriccollectorreports.metric.kubernetes-fleet.io`
- `workloadtrackers.metric.kubernetes-fleet.io`

### Install via Helm

```bash
# Build the image
make docker-build IMAGE_NAME=approval-request-controller IMAGE_TAG=latest

# Load into kind (if using kind)
kind load docker-image approval-request-controller:latest --name hub

# Install the chart
helm install approval-request-controller ./charts/approval-request-controller \
--namespace fleet-system \
--create-namespace
```

## Configuration

The controller watches for:
- `ApprovalRequest` (namespaced)
- `ClusterApprovalRequest` (cluster-scoped)

Both resources from kubefleet are monitored, and the controller creates `MetricCollector` resources on appropriate member clusters based on the staged update configuration.

### Health Check Interval

The controller checks workload health every **15 seconds**. This interval is configurable via the `reconcileInterval` parameter in the Helm chart.

## API Reference

### WorkloadTracker

`WorkloadTracker` is a cluster-scoped custom resource that defines which workloads the approval controller should monitor for health metrics before auto-approving staged rollouts.

#### Example: Single Workload

```yaml
apiVersion: metric.kubernetes-fleet.io/v1alpha1
kind: WorkloadTracker
metadata:
  name: sample-workload-tracker
workloads:
  - name: sample-metric-app
    namespace: test-ns
```

#### Example: Multiple Workloads

```yaml
apiVersion: metric.kubernetes-fleet.io/v1alpha1
kind: WorkloadTracker
metadata:
  name: multi-workload-tracker
workloads:
  - name: frontend
    namespace: production
  - name: backend-api
    namespace: production
  - name: worker-service
    namespace: production
```

#### Usage Notes

- **Cluster-scoped:** WorkloadTracker is a cluster-scoped resource, not namespaced
- **Optional:** If no WorkloadTracker exists, the controller will skip health checks and won't auto-approve
- **Single instance:** The controller expects one WorkloadTracker per cluster and uses the first one found
- **Health criteria:** All workloads listed must report healthy (metric value = 1.0) before approval
- **Prometheus metrics:** Each workload should expose `workload_health` metrics that the MetricCollector can query
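
The health criterion above (value 1.0 means healthy) could be applied to a `workload_health` sample roughly as follows. This is a hedged sketch: `Sample` is a hypothetical simplified view of a Prometheus query result, and the label names `namespace` and `workload_name` are assumptions, not confirmed by this repository.

```go
package main

import "fmt"

// Sample is a hypothetical, simplified view of one workload_health series.
type Sample struct {
	Labels map[string]string
	Value  float64
}

// WorkloadMetrics mirrors the fields of the report entry type.
type WorkloadMetrics struct {
	Namespace    string
	WorkloadName string
	Health       bool
}

// healthyValue is the metric value that counts as healthy.
const healthyValue = 1.0

// toWorkloadMetrics converts one sample into a report entry.
// Any value other than healthyValue is treated as unhealthy.
func toWorkloadMetrics(s Sample) WorkloadMetrics {
	return WorkloadMetrics{
		Namespace:    s.Labels["namespace"],     // assumed label name
		WorkloadName: s.Labels["workload_name"], // assumed label name
		Health:       s.Value == healthyValue,
	}
}

func main() {
	s := Sample{
		Labels: map[string]string{"namespace": "test-ns", "workload_name": "sample-metric-app"},
		Value:  1.0,
	}
	fmt.Printf("%+v\n", toWorkloadMetrics(s))
}
```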

For a complete example, see: [`./examples/workloadtracker/workloadtracker.yaml`](./examples/workloadtracker/workloadtracker.yaml)

## Additional Resources

- **Main Tutorial:** See [`../README.md`](../README.md) for a complete end-to-end tutorial on setting up automated staged rollouts with approval automation
- **Metric Collector:** See [`../metric-collector/README.md`](../metric-collector/README.md) for details on the metric collection component that runs on member clusters
- **KubeFleet Documentation:** [Azure/fleet](https://github.com/Azure/fleet) - Multi-cluster orchestration platform
- **Example Configurations:**
  - [`./examples/workloadtracker/`](./examples/workloadtracker/) - WorkloadTracker resource examples
  - [`./examples/stagedupdaterun/`](./examples/stagedupdaterun/) - Staged update configuration examples
  - [`./examples/prometheus/`](./examples/prometheus/) - Prometheus deployment and configuration for metric collection
@@ -0,0 +1,20 @@
/*
Copyright 2025 The KubeFleet Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

// Package v1alpha1 contains API Schema definitions for the metric v1alpha1 API group
// +kubebuilder:object:generate=true
// +groupName=metric.kubernetes-fleet.io
package v1alpha1
@@ -0,0 +1,35 @@
/*
Copyright 2025 The KubeFleet Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

// +kubebuilder:object:generate=true
// +groupName=metric.kubernetes-fleet.io
package v1alpha1

import (
"k8s.io/apimachinery/pkg/runtime/schema"
"sigs.k8s.io/controller-runtime/pkg/scheme"
)

var (
// GroupVersion is the group version used to register these objects.
GroupVersion = schema.GroupVersion{Group: "metric.kubernetes-fleet.io", Version: "v1alpha1"}
Contributor

Hi Arvind! Just a nit: I fear that the name (metric.kubernetes-fleet.io) might be a bit confusing.


// SchemeBuilder is used to add go types to the GroupVersionKind scheme
SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}

// AddToScheme adds the types in this group-version to the given scheme.
AddToScheme = SchemeBuilder.AddToScheme
)
@@ -0,0 +1,104 @@
/*
Copyright 2025 The KubeFleet Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1

import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// +genclient
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:resource:scope="Namespaced",shortName=mcr,categories={fleet,fleet-metrics}
// +kubebuilder:storageversion
// +kubebuilder:printcolumn:JSONPath=`.status.workloadsMonitored`,name="Workloads",type=integer
// +kubebuilder:printcolumn:JSONPath=`.status.lastCollectionTime`,name="Last-Collection",type=date
// +kubebuilder:printcolumn:JSONPath=`.metadata.creationTimestamp`,name="Age",type=date

// MetricCollectorReport is created by the approval-request-controller on the hub cluster
// in the fleet-member-{clusterName} namespace. The metric-collector on the member cluster
// watches these reports and updates their status with collected metrics.
//
// Controller workflow:
// 1. Approval-controller creates MetricCollectorReport with spec on hub
// 2. Metric-collector watches MetricCollectorReport on hub (in fleet-member-{clusterName} namespace)
// 3. Metric-collector queries Prometheus on member cluster
// 4. Metric-collector updates MetricCollectorReport status on hub with collected metrics
//
// Namespace: fleet-member-{clusterName}
// Name: Matches the UpdateRun name
type MetricCollectorReport struct {

why doesn't this CR have spec and status? Feel like Conditions should be part of the Status and WorkloadsMonitored should be part of the spec

Author

MetricCollectorReport is just an information source in the current implementation, hence no desired state (spec) and no corresponding status.

metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`

Spec MetricCollectorReportSpec `json:"spec,omitempty"`
Status MetricCollectorReportStatus `json:"status,omitempty"`
}

// MetricCollectorReportSpec defines the configuration for metric collection.
type MetricCollectorReportSpec struct {
// PrometheusURL is the URL of the Prometheus server on the member cluster
// Example: "http://prometheus.fleet-system.svc.cluster.local:9090"
PrometheusURL string `json:"prometheusUrl"`
}

// MetricCollectorReportStatus contains the collected metrics from the member cluster.
type MetricCollectorReportStatus struct {
// Conditions represent the latest available observations of the report's state.
// +optional
Conditions []metav1.Condition `json:"conditions,omitempty"`

// WorkloadsMonitored is the count of workloads being monitored.
// +optional
WorkloadsMonitored int32 `json:"workloadsMonitored,omitempty"`

// LastCollectionTime is when metrics were last collected on the member cluster.
// +optional
LastCollectionTime *metav1.Time `json:"lastCollectionTime,omitempty"`

// CollectedMetrics contains the most recent metrics from each workload.
// +optional
CollectedMetrics []WorkloadMetrics `json:"collectedMetrics,omitempty"`
}

// WorkloadMetrics represents metrics collected from a single workload pod.
type WorkloadMetrics struct {
// Namespace of the workload.
// +required
Namespace string `json:"namespace"`

// WorkloadName from the workload_health metric label.
// +required
WorkloadName string `json:"workloadName"`

// Health indicates if the workload is healthy (true=healthy, false=unhealthy).
// +required
Health bool `json:"health"`
}

// +kubebuilder:object:root=true

// MetricCollectorReportList contains a list of MetricCollectorReport.
type MetricCollectorReportList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []MetricCollectorReport `json:"items"`
}

func init() {
SchemeBuilder.Register(&MetricCollectorReport{}, &MetricCollectorReportList{})
}