2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
@@ -14,6 +14,7 @@ on:

env:
GO_VERSION: '1.24.9'
CERT_MANAGER_VERSION: 'v1.16.2'

jobs:
detect-noop:
@@ -125,6 +126,7 @@ jobs:
PROPERTY_PROVIDER: 'azure'
RESOURCE_SNAPSHOT_CREATION_MINIMUM_INTERVAL: ${{ matrix.resource-snapshot-creation-minimum-interval }}
RESOURCE_CHANGES_COLLECTION_DURATION: ${{ matrix.resource-changes-collection-duration }}
CERT_MANAGER_VERSION: ${{ env.CERT_MANAGER_VERSION }}

- name: Collect logs
if: always()
63 changes: 62 additions & 1 deletion charts/hub-agent/README.md
@@ -2,11 +2,33 @@

## Install Chart

### Default Installation (Self-Signed Certificates)

```console
# Helm install with fleet-system namespace already created
helm install hub-agent ./charts/hub-agent/
```

### Installation with cert-manager

To use cert-manager for certificate management, install it first:

```console
# Install cert-manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.16.2 \
--set crds.enabled=true

# Then install hub-agent with cert-manager enabled
helm install hub-agent ./charts/hub-agent --set useCertManager=true
```

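With `useCertManager=true`, the chart creates a self-signed `Issuer` and a `Certificate` named `fleet-webhook-server-cert`, and mounts the resulting secret into the hub-agent pod so that cert-manager manages the webhook certificates.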

## Upgrade Chart

```console
@@ -32,6 +54,11 @@ _See [helm install](https://helm.sh/docs/helm/helm_install/) for command documen
| `affinity` | Node affinity for hub-agent pods | `{}` |
| `tolerations` | Tolerations for hub-agent pods | `[]` |
| `logVerbosity` | Log level (klog V logs) | `5` |
| `enableWebhook` | Enable webhook server | `true` |
| `webhookServiceName` | Webhook service name | `fleetwebhook` |
| `enableGuardRail` | Enable guard rail webhook configurations | `true` |
| `webhookClientConnectionType` | Connection type for webhook client (service or url) | `service` |
| `useCertManager` | Use cert-manager for webhook certificate management | `false` |
| `enableV1Beta1APIs` | Watch for v1beta1 APIs | `true` |
| `hubAPIQPS` | QPS for fleet-apiserver (not including events/node heartbeat) | `250` |
| `hubAPIBurst` | Burst for fleet-apiserver (not including events/node heartbeat) | `1000` |
@@ -41,4 +68,38 @@ _See [helm install](https://helm.sh/docs/helm/helm_install/) for command documen
| `MaxFleetSizeSupported` | Max number of member clusters supported | `100` |
| `resourceSnapshotCreationMinimumInterval` | The minimum interval at which resource snapshots could be created. | `30s` |
| `resourceChangesCollectionDuration` | The duration for collecting resource changes into one snapshot. | `15s` |
| `enableWorkload` | Enable kubernetes builtin workload to run in hub cluster. | `false` |

## Certificate Management

The hub-agent supports two modes for webhook certificate management:

### Automatic Certificate Generation (Default)

By default, the hub-agent generates self-signed certificates automatically at startup. This mode:
- Requires no external dependencies
- Works out of the box
- Issues certificates that are valid for 10 years
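
For illustration only, the default mode can be approximated with Go's standard library. This is a minimal sketch, not the hub-agent's actual implementation: the helper name `newSelfSignedCert`, the ECDSA P-256 key, the organization string, and the example DNS name are all assumptions chosen for brevity.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newSelfSignedCert is a hypothetical helper (not hub-agent code) that creates
// a self-signed serving certificate valid for the given number of years and
// returns it in parsed form.
func newSelfSignedCert(dnsNames []string, years int) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	notBefore := time.Now().UTC()
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{Organization: []string{"example-org"}},
		DNSNames:     dnsNames,
		NotBefore:    notBefore,
		NotAfter:     notBefore.AddDate(years, 0, 0),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	// Template and parent are the same, so the certificate signs itself.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := newSelfSignedCert([]string{"fleetwebhook.fleet-system.svc"}, 10)
	if err != nil {
		panic(err)
	}
	fmt.Println("valid until:", cert.NotAfter.Format(time.RFC3339))
}
```

A certificate like this never rotates on its own, which is the trade-off against the cert-manager mode described next.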

### cert-manager (Optional)

When `useCertManager=true`, certificates are managed by cert-manager. This mode:
- Requires cert-manager to be installed as a prerequisite
- Rotates certificates automatically (90-day certificates, renewed 30 days before expiry)
- Follows industry-standard certificate management practices
- Suitable for production environments
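
The rotation schedule follows from the `duration: 2160h` and `renewBefore: 720h` values in the chart's `templates/certificate.yaml`. As a small worked example (the constant and function names are hypothetical, introduced only for this sketch), renewal falls due 60 days after issuance:

```go
package main

import (
	"fmt"
	"time"
)

const (
	certDuration    = 2160 * time.Hour // spec.duration in the Certificate template
	certRenewBefore = 720 * time.Hour  // spec.renewBefore in the Certificate template
)

// daysUntilRenewal returns how many days after issuance renewal becomes due,
// assuming renewal is triggered renewBefore ahead of expiry.
func daysUntilRenewal(duration, renewBefore time.Duration) float64 {
	return (duration - renewBefore).Hours() / 24
}

func main() {
	fmt.Printf("valid for %.0f days, renewal due after %.0f days\n",
		certDuration.Hours()/24, daysUntilRenewal(certDuration, certRenewBefore))
}
```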

To switch to cert-manager mode:
```console
# Install cert-manager first
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.16.2 \
--set crds.enabled=true

# Then install hub-agent with cert-manager enabled
helm install hub-agent ./charts/hub-agent --set useCertManager=true
```
62 changes: 62 additions & 0 deletions charts/hub-agent/templates/certificate.yaml
@@ -0,0 +1,62 @@
{{- if and .Values.enableWebhook .Values.useCertManager }}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: fleet-webhook-server-cert
  namespace: {{ .Values.namespace }}
  labels:
    {{- include "hub-agent.labels" . | nindent 4 }}
spec:
  # Secret name where cert-manager will store the certificate
  secretName: fleet-webhook-server-cert

  # Certificate duration (90 days is cert-manager's default and recommended)
  duration: 2160h # 90 days

  # Renew certificate 30 days before expiry
  renewBefore: 720h # 30 days

  # Subject configuration
  subject:
    organizations:
      - KubeFleet

  # Common name (kept consistent with the webhook service DNS names below)
  commonName: {{ .Values.webhookServiceName }}.{{ .Values.namespace }}.svc

  # DNS names for the certificate
  dnsNames:
    - {{ .Values.webhookServiceName }}
    - {{ .Values.webhookServiceName }}.{{ .Values.namespace }}
    - {{ .Values.webhookServiceName }}.{{ .Values.namespace }}.svc
    - {{ .Values.webhookServiceName }}.{{ .Values.namespace }}.svc.cluster.local

  # Issuer reference - using self-signed issuer
  issuerRef:
    name: fleet-selfsigned-issuer
    kind: Issuer
    group: cert-manager.io

  # Private key configuration
  privateKey:
    algorithm: ECDSA
    size: 256

  # Key usages
  usages:
    - digital signature
    - key encipherment
    - server auth
---
# Self-signed issuer for generating the certificate
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: fleet-selfsigned-issuer
  namespace: {{ .Values.namespace }}
  labels:
    {{- include "hub-agent.labels" . | nindent 4 }}
spec:
  selfSigned: {}
{{- end }}
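
The `dnsNames` list in the template above expands to the four names under which a Kubernetes Service is reachable inside the cluster. A rough Go sketch of the same expansion follows; `webhookDNSNames` is a hypothetical helper, and the arguments are the chart defaults for `webhookServiceName` and `namespace`.

```go
package main

import "fmt"

// webhookDNSNames mirrors the dnsNames entries of the Certificate template:
// the short service name plus its namespace-qualified forms.
func webhookDNSNames(service, namespace string) []string {
	return []string{
		service,
		service + "." + namespace,
		service + "." + namespace + ".svc",
		service + "." + namespace + ".svc.cluster.local",
	}
}

func main() {
	for _, name := range webhookDNSNames("fleetwebhook", "fleet-system") {
		fmt.Println(name)
	}
}
```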
15 changes: 15 additions & 0 deletions charts/hub-agent/templates/deployment.yaml
@@ -6,6 +6,7 @@ metadata:
labels:
{{- include "hub-agent.labels" . | nindent 4 }}
spec:
replicas: {{ .Values.replicaCount }}
selector:
matchLabels:
{{- include "hub-agent.selectorLabels" . | nindent 6 }}
@@ -25,6 +26,7 @@ spec:
- --webhook-service-name={{ .Values.webhookServiceName }}
- --enable-guard-rail={{ .Values.enableGuardRail }}
- --enable-workload={{ .Values.enableWorkload }}
- --use-cert-manager={{ .Values.useCertManager }}
- --whitelisted-users=system:serviceaccount:fleet-system:hub-agent-sa
- --webhook-client-connection-type={{.Values.webhookClientConnectionType}}
- --v={{ .Values.logVerbosity }}
@@ -73,6 +75,19 @@ spec:
fieldPath: metadata.namespace
resources:
{{- toYaml .Values.resources | nindent 12 }}
{{- if .Values.useCertManager }}
volumeMounts:
- name: webhook-cert
mountPath: /tmp/k8s-webhook-server/serving-certs
readOnly: true
{{- end }}
{{- if .Values.useCertManager }}
volumes:
- name: webhook-cert
secret:
secretName: fleet-webhook-server-cert
defaultMode: 0644
{{- end }}
{{- with .Values.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
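
Secrets written by cert-manager carry the key pair under the `tls.crt` and `tls.key` keys, so with the volume mount above those files appear under `/tmp/k8s-webhook-server/serving-certs`. The sketch below shows how a process could load them; `servingCertPaths` is a hypothetical helper, and whether the hub-agent reads exactly these file names is an assumption not confirmed by this diff.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"path/filepath"
)

// certDir matches the mountPath used in the Deployment template.
const certDir = "/tmp/k8s-webhook-server/serving-certs"

// servingCertPaths returns the certificate and key file paths inside dir,
// assuming the standard kubernetes.io/tls secret keys.
func servingCertPaths(dir string) (certFile, keyFile string) {
	return filepath.Join(dir, "tls.crt"), filepath.Join(dir, "tls.key")
}

func main() {
	certFile, keyFile := servingCertPaths(certDir)
	if _, err := tls.LoadX509KeyPair(certFile, keyFile); err != nil {
		// Expected when the secret has not been mounted, e.g. outside the cluster.
		fmt.Println("serving certificate not available:", err)
		return
	}
	fmt.Println("loaded serving certificate from", certDir)
}
```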
8 changes: 6 additions & 2 deletions charts/hub-agent/values.yaml
@@ -17,13 +17,17 @@ webhookServiceName: fleetwebhook
enableGuardRail: true
webhookClientConnectionType: service
enableWorkload: false
# useCertManager enables cert-manager for webhook certificate management.
# When enabled, cert-manager must already be installed in the cluster
# (it is a prerequisite, not a chart dependency), and the chart creates
# a Certificate resource and a self-signed Issuer.
useCertManager: false

forceDeleteWaitTime: 15m0s
clusterUnhealthyThreshold: 3m0s
resourceSnapshotCreationMinimumInterval: 30s
resourceChangesCollectionDuration: 15s

namespace:
fleet-system
namespace: fleet-system

resources:
limits:
20 changes: 16 additions & 4 deletions cmd/hubagent/main.go
@@ -46,6 +46,7 @@ import (
"github.com/kubefleet-dev/kubefleet/cmd/hubagent/options"
"github.com/kubefleet-dev/kubefleet/cmd/hubagent/workload"
mcv1beta1 "github.com/kubefleet-dev/kubefleet/pkg/controllers/membercluster/v1beta1"
"github.com/kubefleet-dev/kubefleet/pkg/utils/validator"
"github.com/kubefleet-dev/kubefleet/pkg/webhook"
// +kubebuilder:scaffold:imports
)
@@ -156,18 +157,29 @@ func main() {

if opts.EnableWebhook {
whiteListedUsers := strings.Split(opts.WhiteListedUsers, ",")
if err := SetupWebhook(mgr, options.WebhookClientConnectionType(opts.WebhookClientConnectionType), opts.WebhookServiceName, whiteListedUsers, opts.EnableGuardRail, opts.EnableV1Beta1APIs, opts.DenyModifyMemberClusterLabels, opts.EnableWorkload); err != nil {
if err := SetupWebhook(mgr, options.WebhookClientConnectionType(opts.WebhookClientConnectionType), opts.WebhookServiceName, whiteListedUsers, opts.EnableGuardRail, opts.EnableV1Beta1APIs, opts.DenyModifyMemberClusterLabels, opts.EnableWorkload, opts.UseCertManager); err != nil {
klog.ErrorS(err, "unable to set up webhook")
exitWithErrorFunc()
}
}

ctx := ctrl.SetupSignalHandler()
if err := workload.SetupControllers(ctx, &wg, mgr, config, opts); err != nil {
klog.ErrorS(err, "unable to set up ready check")
klog.ErrorS(err, "unable to set up controllers")
exitWithErrorFunc()
}

// Add the webhook readiness check AFTER the controllers are set up (when ResourceInformer is initialized).
// This prevents the webhook from accepting requests before the discovery cache is populated.
if opts.EnableWebhook {
// AddReadyzCheck adds an additional readiness check rather than replacing the one registered earlier, provided the name is different.
// Both registered checks need to pass for the manager to be considered ready.
if err := mgr.AddReadyzCheck("webhook-cache", webhook.ResourceInformerReadinessChecker(validator.ResourceInformer)); err != nil {
Collaborator Author

This PR stacks on top of an informer readiness check change. With multiple replicas of the webhook server, it becomes likely that some replicas start serving requests before their cache is synced.

Collaborator Author

Still discussing the other PR, #367.

If that PR doesn't merge first, the e2e tests won't pass for the HA hub agent setup.

klog.ErrorS(err, "unable to set up webhook readiness check")
exitWithErrorFunc()
}
}

// +kubebuilder:scaffold:builder

wg.Add(1)
@@ -188,9 +200,9 @@ func main() {
}

// SetupWebhook generates the webhook cert and then sets up the webhook configurator.
func SetupWebhook(mgr manager.Manager, webhookClientConnectionType options.WebhookClientConnectionType, webhookServiceName string, whiteListedUsers []string, enableGuardRail, isFleetV1Beta1API bool, denyModifyMemberClusterLabels bool, enableWorkload bool) error {
func SetupWebhook(mgr manager.Manager, webhookClientConnectionType options.WebhookClientConnectionType, webhookServiceName string, whiteListedUsers []string, enableGuardRail, isFleetV1Beta1API bool, denyModifyMemberClusterLabels bool, enableWorkload bool, useCertManager bool) error {
// Generate self-signed key and crt files in FleetWebhookCertDir for the webhook server to start.
w, err := webhook.NewWebhookConfig(mgr, webhookServiceName, FleetWebhookPort, &webhookClientConnectionType, FleetWebhookCertDir, enableGuardRail, denyModifyMemberClusterLabels, enableWorkload)
w, err := webhook.NewWebhookConfig(mgr, webhookServiceName, FleetWebhookPort, &webhookClientConnectionType, FleetWebhookCertDir, enableGuardRail, denyModifyMemberClusterLabels, enableWorkload, useCertManager)
if err != nil {
klog.ErrorS(err, "fail to generate WebhookConfig")
return err
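
The body of `webhook.ResourceInformerReadinessChecker` is not part of this diff, so the following is only a sketch of how such a check could work, under stated assumptions. The `resourceInformer` interface, its `IsInformerSynced` method, the `resource` stand-in type, and the fake used in `main` are hypothetical; the `checker` type mirrors the function signature that controller-runtime's `healthz.Checker` uses.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// resource stands in for schema.GroupVersionResource in this sketch.
type resource string

// resourceInformer is a hypothetical subset of the informer manager.
// GetAllResources appears in this PR; IsInformerSynced is an assumption.
type resourceInformer interface {
	GetAllResources() []resource
	IsInformerSynced(r resource) bool
}

// checker has the same shape as controller-runtime's healthz.Checker.
type checker func(req *http.Request) error

// informerReadinessChecker reports ready only once every discovered
// resource has a synced informer.
func informerReadinessChecker(ri resourceInformer) checker {
	return func(_ *http.Request) error {
		if ri == nil {
			return errors.New("resource informer not initialized")
		}
		all := ri.GetAllResources()
		if len(all) == 0 {
			return errors.New("no resources discovered yet")
		}
		for _, r := range all {
			if !ri.IsInformerSynced(r) {
				return fmt.Errorf("informer for %s not synced", r)
			}
		}
		return nil
	}
}

// fakeInformer is a test double mapping each resource to its sync state.
type fakeInformer struct{ synced map[resource]bool }

func (f fakeInformer) GetAllResources() []resource {
	res := make([]resource, 0, len(f.synced))
	for r := range f.synced {
		res = append(res, r)
	}
	return res
}

func (f fakeInformer) IsInformerSynced(r resource) bool { return f.synced[r] }

func main() {
	check := informerReadinessChecker(fakeInformer{synced: map[resource]bool{"pods": true, "configmaps": false}})
	fmt.Println("readiness error:", check(nil))
}
```

Registering a check of this kind under its own name lets each replica stay out of the Service endpoints until its cache is warm, which is the concern raised in the review comment above.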
4 changes: 4 additions & 0 deletions cmd/hubagent/options/options.go
@@ -110,6 +110,9 @@ type Options struct {
// EnableWorkload enables workload resources (pods and replicasets) to be created in the hub cluster.
// When set to true, the pod and replicaset validating webhooks are disabled.
EnableWorkload bool
// UseCertManager indicates whether to use cert-manager for webhook certificate management.
// When enabled, webhook certificates are managed by cert-manager instead of self-signed generation.
UseCertManager bool
// ResourceSnapshotCreationMinimumInterval is the minimum interval at which resource snapshots could be created.
// Whether the resource snapshot is created or not depends on both ResourceSnapshotCreationMinimumInterval and ResourceChangesCollectionDuration.
ResourceSnapshotCreationMinimumInterval time.Duration
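
The `UseCertManager` field is populated from a `--use-cert-manager` boolean flag. The standalone sketch below reproduces that wiring with the standard `flag` package; the `options` type, `addFlags`, and `parseArgs` are simplified stand-ins written for this example, not the real `Options` API.

```go
package main

import (
	"flag"
	"fmt"
)

// options is a cut-down stand-in for the hub-agent Options struct.
type options struct {
	UseCertManager bool
}

// addFlags registers the flag with the same name and default as in this PR.
func (o *options) addFlags(fs *flag.FlagSet) {
	fs.BoolVar(&o.UseCertManager, "use-cert-manager", false,
		"If set, cert-manager will be used for webhook certificate management instead of self-signed certificates.")
}

// parseArgs parses args into a fresh options value.
func parseArgs(args []string) (*options, error) {
	o := &options{}
	fs := flag.NewFlagSet("hubagent-sketch", flag.ContinueOnError)
	o.addFlags(fs)
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return o, nil
}

func main() {
	o, err := parseArgs([]string{"--use-cert-manager=true"})
	if err != nil {
		panic(err)
	}
	fmt.Println("use cert-manager:", o.UseCertManager)
}
```

Go's `flag` package accepts an explicit boolean value only in the `--name=value` form, which is why the Deployment template passes `--use-cert-manager={{ .Values.useCertManager }}` as a single argument.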
@@ -185,6 +188,7 @@ func (o *Options) AddFlags(flags *flag.FlagSet) {
flags.IntVar(&o.PprofPort, "pprof-port", 6065, "The port for pprof profiling.")
flags.BoolVar(&o.DenyModifyMemberClusterLabels, "deny-modify-member-cluster-labels", false, "If set, users not in the system:masters cannot modify member cluster labels.")
flags.BoolVar(&o.EnableWorkload, "enable-workload", false, "If set, workloads (pods and replicasets) can be created in the hub cluster. This disables the pod and replicaset validating webhooks.")
flags.BoolVar(&o.UseCertManager, "use-cert-manager", false, "If set, cert-manager will be used for webhook certificate management instead of self-signed certificates.")
flags.DurationVar(&o.ResourceSnapshotCreationMinimumInterval, "resource-snapshot-creation-minimum-interval", 30*time.Second, "The minimum interval at which resource snapshots could be created.")
flags.DurationVar(&o.ResourceChangesCollectionDuration, "resource-changes-collection-duration", 15*time.Second,
"The duration for collecting resource changes into one snapshot. The default is 15 seconds, which means that the controller will collect resource changes for 15 seconds before creating a resource snapshot.")
16 changes: 16 additions & 0 deletions pkg/utils/informer/informermanager.go
@@ -61,6 +61,9 @@ type Manager interface {
// GetNameSpaceScopedResources returns the list of namespace scoped resources we are watching.
GetNameSpaceScopedResources() []schema.GroupVersionResource

// GetAllResources returns the list of all resources (both cluster-scoped and namespace-scoped) we are watching.
GetAllResources() []schema.GroupVersionResource

// IsClusterScopedResources returns whether a resource is cluster scoped.
IsClusterScopedResources(resource schema.GroupVersionKind) bool

@@ -224,6 +227,19 @@ func (s *informerManagerImpl) GetNameSpaceScopedResources() []schema.GroupVersio
return res
}

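// GetAllResources returns every resource (cluster-scoped and namespace-scoped) that is currently marked as present.
func (s *informerManagerImpl) GetAllResources() []schema.GroupVersionResource {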
s.resourcesLock.RLock()
defer s.resourcesLock.RUnlock()

res := make([]schema.GroupVersionResource, 0, len(s.apiResources))
for _, resource := range s.apiResources {
if resource.isPresent {
res = append(res, resource.GroupVersionResource)
}
}
return res
}

func (s *informerManagerImpl) IsClusterScopedResources(gvk schema.GroupVersionKind) bool {
s.resourcesLock.RLock()
defer s.resourcesLock.RUnlock()