fix(controller): requeue reconciliation for readiness gate polling and master election by davidcharbonnier · Pull Request #511 · dragonflydb/dragonfly-operator

davidcharbonnier · 2026-03-31T14:48:06Z

Fix three related bugs in the pod lifecycle controller where pods were dropped from reconciliation, all stemming from missing requeue returns in the Reconcile method:

1. (#503, "Bug: Readiness gate condition dragonflydb.io/replication-ready never transitions to True after replica sync completes") The readiness gate condition `dragonflydb.io/replication-ready` never transitions to True because the controller exits early when the pod is not yet ready, preventing the readiness gate polling code from ever running.
2. (#497, "enableReplicationReadinessGate: true causes deadlock during master election") Master election deadlocks on initial startup because no pod passes the health check (which requires the readiness gate to be satisfied), and the controller does not requeue when no healthy candidate is available.
3. (#508, "Pod left unconfigured (no role, no replication, no readiness gate condition) after rolling restart") Pods are left completely unconfigured after a rolling restart because the controller does not re-enqueue pods that come up while loading RDB snapshots, so they miss the initial configuration window.

The fix:

- Changes the `!podReady` bail-out to requeue after 5s instead of silently dropping the reconciliation
- Adds a requeue when no healthy master candidate is found and the triggering pod is not yet ready (breaks the deadlock)
- Adds a requeue after initial replication config if the triggering pod is still loading

Fixes #503, #497, #508

Signed-off-by: David Charbonnier <david.charbonnier@cloudimperiumgames.com>

davidcharbonnier force-pushed the fix/readiness-gate-requeue-and-master-election branch from 7735031 to 10a3e2e on April 1, 2026 07:52

davidcharbonnier wants to merge 1 commit into dragonflydb:main from davidcharbonnier:fix/readiness-gate-requeue-and-master-election
