
feat: replica-only snapshots, per-pod services, DragonflyCluster CRD, PodLifecycle fix #481

Open
bocharov wants to merge 6 commits into dragonflydb:main from bocharov:feat/snapshot-cluster-improvements

Conversation

@bocharov

Summary

This PR adds several features and a bug fix developed and battle-tested in production:

  1. PodLifecycle deadlock fix — requeue on no-healthy-pod and pod-not-ready
  2. Replica-only snapshot mode with staggered cron scheduling
  3. Per-pod ClusterIP services for cross-cluster routing
  4. DragonflyCluster CRD and controller for multi-shard cluster mode

1. fix(controller): requeue on no-healthy-pod and pod-not-ready in PodLifecycle

When getHealthyPod() finds no healthy pod (e.g. all pods still loading data from S3
snapshots), the PodLifecycle controller returned ctrl.Result{}, nil — silently
dropping the event with no requeue. If no future pod events arrive, the controller
never retries and master election never completes.

Similarly, when a pod is not ready yet, the controller drops the event without
requeuing. During rolling updates, allPodsHealthyAndHaveRole() waits forever for a
role label that never gets set, causing a deadlock.

Fix: requeue after 5 seconds in both cases.

2. feat(snapshot): add enableOnReplicaOnly mode with staggerInterval

Adds a new snapshot mode that offloads snapshot I/O from master to replicas,
preventing snapshot serialization from blocking write-path latency on the master.

New API fields on the Snapshot spec:

  • enableOnReplicaOnly — when true, only replicas run snapshot_cron; the master
    never saves. On master restart it loads the latest replica snapshot from S3 then
    re-syncs via Dragonfly replication.
  • staggerInterval — staggers snapshot schedules across replicas so they do not
    all snapshot at the same moment. Each replica's cron is offset by
    (rank × interval) from the base Cron schedule.
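The rank-based offset can be illustrated with a small helper. This is a hypothetical sketch of what a `staggerCron`-style function might do (the PR mentions unit tests for `staggerCron` and `replicaCronForRank`, but the actual implementation is not shown here); it only handles a plain numeric minute field and wraps at 60 without carrying into the hour field, which a full implementation would need to handle.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// staggerCron shifts the minute field of a 5-field cron expression by
// rank*intervalMinutes, wrapping at 60. Illustrative only: it does not
// carry overflow into the hour field or handle non-numeric minute fields.
func staggerCron(baseCron string, rank, intervalMinutes int) (string, error) {
	fields := strings.Fields(baseCron)
	if len(fields) != 5 {
		return "", fmt.Errorf("expected 5 cron fields, got %d", len(fields))
	}
	minute, err := strconv.Atoi(fields[0])
	if err != nil {
		return "", fmt.Errorf("minute field %q is not a plain number: %w", fields[0], err)
	}
	fields[0] = strconv.Itoa((minute + rank*intervalMinutes) % 60)
	return strings.Join(fields, " "), nil
}

func main() {
	// Base schedule 03:00 daily with a 10-minute stagger:
	// replica 0 saves at :00, replica 1 at :10, replica 2 at :20.
	for rank := 0; rank < 3; rank++ {
		cron, _ := staggerCron("0 3 * * *", rank, 10)
		fmt.Println(cron)
	}
}
```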

Controller changes:

  • Two-pass reconciliation in checkAndConfigureReplicas: pass 1 handles masters
    and unassigned pods; pass 2 processes replicas in sorted name order so each gets
    a stable rank for snapshot_cron staggering.
  • ensureMasterSnapshotCron / ensureReplicaSnapshotCron: defensive checks that
    correct snapshot_cron drift on every reconciliation (guards against transient
    CONFIG SET failures or operator restarts).
  • replicaOf: when enableOnReplicaOnly, defer snapshot_cron assignment to
    ensureReplicaSnapshotCron (rank not yet known at SLAVE OF time).
  • replicaOfNoOne: clear snapshot_cron on master in replica-only mode.
  • replTakeover: update snapshot_cron when roles switch (re-enable on the
    new master for enableOnMasterOnly; clear it for enableOnReplicaOnly).

Resource generation:

  • Skip --snapshot_cron container arg when enableOnMasterOnly or
    enableOnReplicaOnly is set; the operator sets it dynamically via CONFIG SET
    to eliminate the startup window where pods could snapshot before the operator
    configures them.
  • Validation: mutual exclusivity, staggerInterval requires enableOnReplicaOnly,
    enableOnReplicaOnly requires ≥ 2 replicas.
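The three validation rules above are straightforward to encode. A minimal sketch, with a stand-in `snapshotSpec` type whose field names follow the PR's API fields (the operator's actual types and validation hook are not reproduced here):

```go
package main

import (
	"errors"
	"fmt"
)

// snapshotSpec mirrors only the fields relevant to the described validation.
type snapshotSpec struct {
	EnableOnMasterOnly  bool
	EnableOnReplicaOnly bool
	StaggerInterval     int // minutes; 0 means unset
}

// validateSnapshot encodes the three rules: mutual exclusivity,
// staggerInterval requires enableOnReplicaOnly, and replica-only mode
// needs at least 2 replicas (otherwise nothing would ever snapshot).
func validateSnapshot(s snapshotSpec, replicas int) error {
	if s.EnableOnMasterOnly && s.EnableOnReplicaOnly {
		return errors.New("enableOnMasterOnly and enableOnReplicaOnly are mutually exclusive")
	}
	if s.StaggerInterval > 0 && !s.EnableOnReplicaOnly {
		return errors.New("staggerInterval requires enableOnReplicaOnly")
	}
	if s.EnableOnReplicaOnly && replicas < 2 {
		return errors.New("enableOnReplicaOnly requires at least 2 replicas")
	}
	return nil
}

func main() {
	fmt.Println(validateSnapshot(snapshotSpec{EnableOnReplicaOnly: true, StaggerInterval: 10}, 3))
	fmt.Println(validateSnapshot(snapshotSpec{EnableOnMasterOnly: true, EnableOnReplicaOnly: true}, 3))
}
```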

Tests: unit tests for staggerCron and replicaCronForRank, e2e tests for
snapshot configuration validation.

3. feat(resources): add per-pod ClusterIP services and type transition handling

  • Each Dragonfly pod gets its own ClusterIP service named after the pod (e.g.
    df-0, df-1) using the statefulset.kubernetes.io/pod-name label selector.
    ClusterIPs, unlike pod IPs, are routable cross-cluster, which makes them
    suitable for CLUSTER SLOTS responses and cross-cluster clients.
  • Handle headless ↔ ClusterIP service type transitions: spec.clusterIP is
    immutable in Kubernetes, so the operator detects the mismatch and does a
    delete+recreate instead of failing on update.
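The headless vs. ClusterIP detection reduces to comparing whether each side is headless (in Kubernetes, a headless service has `spec.clusterIP` set to `"None"`). A self-contained sketch with a stand-in `serviceSpec` type in place of `corev1.ServiceSpec`; `needsRecreate` is a hypothetical name, not the operator's actual function:

```go
package main

import "fmt"

// serviceSpec stands in for the relevant corev1.ServiceSpec field.
type serviceSpec struct {
	ClusterIP string // "None" for a headless service
}

// needsRecreate reports whether the desired and existing services disagree
// on headless vs ClusterIP. Because spec.clusterIP is immutable, such a
// mismatch cannot be fixed by an update; the service must be deleted and
// recreated.
func needsRecreate(desired, existing serviceSpec) bool {
	desiredHeadless := desired.ClusterIP == "None"
	existingHeadless := existing.ClusterIP == "None"
	return desiredHeadless != existingHeadless
}

func main() {
	// Migrating from headless to a per-pod ClusterIP service: recreate.
	fmt.Println(needsRecreate(serviceSpec{ClusterIP: ""}, serviceSpec{ClusterIP: "None"}))
	// Both are normal ClusterIP services: a plain update suffices.
	fmt.Println(needsRecreate(serviceSpec{ClusterIP: ""}, serviceSpec{ClusterIP: "10.0.0.5"}))
}
```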

4. feat(controller): add DragonflyCluster CRD and controller

Adds a DragonflyCluster CRD and controller to manage Dragonfly cluster-mode
(multi-shard + replicas), including slot allocation and scale-out rebalancing.

DragonflyCluster API:

  • spec.shards: desired number of primary/master shards
  • spec.replicasPerShard: replicas per shard (excluding master)
  • spec.template: DragonflySpec applied to each shard
  • spec.rebalance: controls automatic slot rebalancing on scale-out
  • status: tracks per-shard slot ranges, conditions, and active migrations
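A hypothetical manifest sketch showing how these fields might fit together. The field names come from the PR description, but the `apiVersion`, the shape of `rebalance`, and the `image` value are assumptions, not taken from the PR:

```yaml
# Illustrative only: apiVersion and the rebalance field shape are assumed.
apiVersion: dragonflydb.io/v1alpha1
kind: DragonflyCluster
metadata:
  name: example-cluster
spec:
  shards: 3              # three primary/master shards
  replicasPerShard: 2    # two replicas per shard, excluding the master
  template:              # DragonflySpec applied to each shard
    image: docker.dragonflydb.io/dragonflydb/dragonfly:latest
  rebalance:
    enabled: true        # automatic slot rebalancing on scale-out
```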

Controller features:

  • Provisions per-shard Dragonfly CRs with cluster_mode=yes
  • Assigns stable cluster node IDs (UUIDs) per pod
  • Builds and pushes DFLYCLUSTER CONFIG to all shard masters
  • Implements slot migration via DFLYCLUSTER SLOT-MIGRATION-STATUS
  • Per-shard snapshot dir to avoid S3 filename collisions
  • Configurable service DNS suffix via DRAGONFLY_CLUSTER_SERVICE_SUFFIX env var
  • Tolerates unready replicas during topology collection (only masters
    are required to be ready)
  • Advertises per-pod ClusterIP service DNS in cluster config for
    cross-cluster client compatibility
  • Improves error wrapping throughout for debuggability
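Dragonfly's cluster mode uses the Redis-compatible 16384-slot hash space, so slot allocation across shards amounts to partitioning the range 0..16383. The following is an illustrative even split, one contiguous range per shard; the operator's actual allocation and rebalancing logic may differ:

```go
package main

import "fmt"

const totalSlots = 16384 // Redis-compatible hash-slot space (0..16383)

type slotRange struct{ Start, End int }

// allocateSlots evenly partitions the slot space across `shards` masters,
// giving one contiguous range per shard and spreading the remainder over
// the first shards so every slot is owned exactly once.
func allocateSlots(shards int) []slotRange {
	ranges := make([]slotRange, shards)
	per, rem := totalSlots/shards, totalSlots%shards
	start := 0
	for i := 0; i < shards; i++ {
		size := per
		if i < rem {
			size++ // first `rem` shards absorb the remainder
		}
		ranges[i] = slotRange{Start: start, End: start + size - 1}
		start += size
	}
	return ranges
}

func main() {
	for i, r := range allocateSlots(3) {
		fmt.Printf("shard %d: slots %d-%d\n", i, r.Start, r.End)
	}
}
```

On scale-out, the rebalancer would compute a new partition like this and migrate the slots that changed owner, which is where the DFLYCLUSTER SLOT-MIGRATION-STATUS polling described above comes in.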

Also includes RBAC rules for dragonflyclusters and dragonflyclusters/status,
CRD manifests regenerated via controller-gen and kustomize (config/crd/bases/,
manifests/crd.yaml, manifests/dragonfly-operator.yaml), kustomization and
sample YAMLs, DragonflyCluster controller registration in cmd/main.go,
DeepCopy methods for all new types, e2e tests for snapshot configuration
validation, and README documentation for all new features.

@ashotland
Contributor

Hi @bocharov - thanks for contributing!

Can you please split this PR into 4 PRs, one for each of the issues/features you mentioned?

It will be easier to review and discuss each one separately.

Thanks!

@ashotland
Contributor

Hi @bocharov - also curious about "battle-tested in production": for how long have you been running Dragonfly in production?

What is the use case that made you require a multi-sharded cluster?
