fix: select best replica #496

Open
ashotland wants to merge 1 commit into main from ari/selectBestReplica
Conversation

@ashotland
Contributor

Fix master election to prefer replicas with data over empty restarted masters

Context

When a master pod crashes and restarts empty (no snapshot or corrupted data), the current selectMasterCandidate picks the lowest-ordinal ready pod — which could be the empty restarted master. All replicas then SLAVEOF the empty pod and lose their data.

Key constraint: older Dragonfly versions reset slave_repl_offset to 0 after a replica disconnects, so we cannot rely on offset > 0 alone to determine if a pod has data. Instead, we must check the pod's role first: pods with role=replica label were actively replicating and have data.
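That role check can be read straight off the pod's labels; a minimal runnable sketch, using a local stand-in for corev1.Pod (the exact label key "role" is an assumption for illustration, not necessarily the operator's real key):

```go
package main

import "fmt"

// pod is a minimal stand-in for corev1.Pod, carrying only what the role
// check needs; the operator uses the real Kubernetes type.
type pod struct {
	Name   string
	Labels map[string]string
}

// isReplica reports whether the pod was actively replicating before the
// failover. Pods labeled role=replica are known to hold data, so this check
// runs before any offset comparison (older Dragonfly versions reset
// slave_repl_offset to 0 after a replica disconnects).
// NOTE: the "role" label key is an assumption for illustration.
func isReplica(p *pod) bool {
	return p.Labels["role"] == "replica"
}

func main() {
	p := pod{Name: "dragonfly-1", Labels: map[string]string{"role": "replica"}}
	fmt.Println(isReplica(&p)) // true
}
```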

Selection priority

  1. Prefer pods with role=replica label — they were actively replicating and have data
  2. Among replicas, prefer highest slave_repl_offset (most up-to-date)
  3. Tie-break by lowest pod ordinal
  4. Fall back to non-replica pods (no role or role=master) only if no replicas are ready
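Taken together, the four rules reduce to a single comparison function. A minimal runnable sketch: the names mirror the PR (MasterCandidate, isBetterCandidate, getOrdinal) but the field set and bodies are assumptions, and the offset comparison is applied only among replicas, as the priority list specifies:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// MasterCandidate captures the signals used for master election.
// (Field set is an assumption mirroring the PR's description.)
type MasterCandidate struct {
	Name      string // pod name, e.g. "dragonfly-2"
	IsReplica bool   // true if the pod carried the role=replica label
	Offset    int64  // slave_repl_offset from INFO replication
}

// getOrdinal extracts the StatefulSet ordinal from a pod name
// ("dragonfly-2" -> 2). Unparseable names sort last.
func getOrdinal(name string) int {
	idx := strings.LastIndex(name, "-")
	if idx < 0 {
		return int(^uint(0) >> 1) // max int
	}
	n, err := strconv.Atoi(name[idx+1:])
	if err != nil {
		return int(^uint(0) >> 1)
	}
	return n
}

// isBetterCandidate reports whether a should be preferred over b:
//  1. replicas beat non-replicas (they are known to have data),
//  2. among replicas, higher slave_repl_offset wins,
//  3. ties break on the lowest pod ordinal.
// Non-replicas are only reachable as a fallback when no replica is ready.
func isBetterCandidate(a, b *MasterCandidate) bool {
	if a.IsReplica != b.IsReplica {
		return a.IsReplica
	}
	if a.IsReplica && a.Offset != b.Offset {
		return a.Offset > b.Offset
	}
	return getOrdinal(a.Name) < getOrdinal(b.Name)
}

func main() {
	a := &MasterCandidate{Name: "dragonfly-2", IsReplica: true, Offset: 500}
	b := &MasterCandidate{Name: "dragonfly-0", IsReplica: false}
	fmt.Println(isBetterCandidate(a, b)) // true: replica beats empty restarted master
}
```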

Files to modify

  1. internal/controller/util.go — add MasterCandidate struct, isBetterCandidate, update selectMasterCandidate
  2. internal/controller/util_test.go — update tests for new selection logic
  3. internal/controller/dragonfly_pod_lifecycle_controller.go — update call site to pass candidate builder
  4. internal/controller/dragonfly_instance.go — add getMasterCandidate helper


Copilot AI left a comment


Pull request overview

Updates master election in the Dragonfly operator to avoid promoting an empty restarted master by preferring ready replicas (known to have data) and using replication offset as a secondary signal.

Changes:

  • Introduce MasterCandidate metadata and candidate comparison logic for deterministic master selection.
  • Update master-election call site to supply a candidate builder that queries INFO replication.
  • Expand unit tests to cover replica-preference and offset-based tie-breaking.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

  • internal/controller/util.go: adds MasterCandidate, isBetterCandidate, and updates selectMasterCandidate to prefer replicas and higher offsets.
  • internal/controller/util_test.go: updates/extends tests to validate the new candidate-selection priority rules.
  • internal/controller/dragonfly_pod_lifecycle_controller.go: passes a candidate-builder callback into selectMasterCandidate during failover selection.
  • internal/controller/dragonfly_instance.go: adds getMasterCandidate helper to derive IsReplica and slave_repl_offset via INFO replication.


Comment on lines +227 to 239
func selectMasterCandidate(pods []corev1.Pod, isReady func(*corev1.Pod) bool, getCandidate func(*corev1.Pod) MasterCandidate) *corev1.Pod {
	var best *MasterCandidate

	for i := range pods {
		p := &pods[i]
		if !isReady(p) {
			continue
		}

-		// Prefer Pod-0, then Pod-1, etc.
-		if bestCandidate == nil || getOrdinal(p.Name) < getOrdinal(bestCandidate.Name) {
-			bestCandidate = p
+		c := getCandidate(p)
+		if best == nil || isBetterCandidate(&c, best) {
+			best = &c
+		}

Copilot AI Mar 30, 2026


selectMasterCandidate stores best as a pointer to the loop-local variable c (best = &c). While safe due to escape analysis, it causes a heap allocation per update and is harder to reason about. Consider tracking the best candidate by value (plus a boolean) to avoid allocations and improve readability.
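The suggested value-plus-flag tracking might look like the following; a runnable sketch with trimmed stand-in types (the real loop uses corev1 types and the full comparison, including the ordinal tie-break, which is omitted here):

```go
package main

import "fmt"

// Minimal stand-ins so the selection loop is runnable; in the operator
// these come from corev1 and the PR's helpers.
type Pod struct {
	Name  string
	Ready bool
}

type MasterCandidate struct {
	Pod       *Pod
	IsReplica bool
	Offset    int64
}

// Trimmed comparison: replicas beat non-replicas, then higher offset wins.
// (The PR also tie-breaks on pod ordinal; omitted for brevity.)
func isBetterCandidate(a, b *MasterCandidate) bool {
	if a.IsReplica != b.IsReplica {
		return a.IsReplica
	}
	return a.Offset > b.Offset
}

// selectMasterCandidate tracks the best candidate by value plus a found
// flag, as the review suggests: no pointer to the loop-local c, so no
// per-iteration heap allocation and no aliasing to reason about.
func selectMasterCandidate(pods []Pod, getCandidate func(*Pod) MasterCandidate) *Pod {
	var (
		best  MasterCandidate
		found bool
	)
	for i := range pods {
		p := &pods[i]
		if !p.Ready {
			continue
		}
		c := getCandidate(p)
		if !found || isBetterCandidate(&c, &best) {
			best = c // copy by value; c can be safely reused next iteration
			found = true
		}
	}
	if !found {
		return nil
	}
	return best.Pod
}

func main() {
	pods := []Pod{{Name: "dragonfly-0", Ready: true}, {Name: "dragonfly-1", Ready: true}}
	get := func(p *Pod) MasterCandidate {
		return MasterCandidate{Pod: p, IsReplica: p.Name == "dragonfly-1", Offset: 100}
	}
	fmt.Println(selectMasterCandidate(pods, get).Name) // dragonfly-1
}
```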

Comment on lines +1117 to +1123
func (dfi *DragonflyInstance) getMasterCandidate(ctx context.Context, pod *corev1.Pod) MasterCandidate {
	c := MasterCandidate{Pod: pod, IsReplica: isReplica(pod)}
	if pod.Status.PodIP == "" {
		return c
	}
	info, err := dfi.getRedisClient(pod.Status.PodIP).Info(ctx, "replication").Result()
	if err != nil {

Copilot AI Mar 30, 2026


getMasterCandidate always performs an INFO replication call even for non-replica pods, but non-replicas are only used as a fallback when no ready replicas exist. Skipping the INFO call when IsReplica is false would reduce network calls and speed up elections in large clusters.
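The proposed short-circuit is a one-line guard before the network call; a runnable sketch where fetchOffset stands in for the INFO replication round trip (all names here are illustrative, not the PR's exact code):

```go
package main

import "fmt"

type MasterCandidate struct {
	Name      string
	IsReplica bool
	Offset    int64
}

// infoCalls counts round trips so the saving is observable in this sketch;
// in the operator each call is a real network request to the pod.
var infoCalls int

func fetchOffset(name string) int64 {
	infoCalls++
	return 42 // stand-in for parsing slave_repl_offset out of INFO replication
}

// getMasterCandidate skips the INFO call for non-replicas: their offset is
// never compared (they are only a fallback when no replica is ready), so
// the round trip would be wasted work during an election.
func getMasterCandidate(name string, isReplica bool) MasterCandidate {
	c := MasterCandidate{Name: name, IsReplica: isReplica}
	if !c.IsReplica {
		return c // short-circuit: no network call for fallback-only pods
	}
	c.Offset = fetchOffset(name)
	return c
}

func main() {
	getMasterCandidate("dragonfly-0", false)
	getMasterCandidate("dragonfly-1", true)
	fmt.Println(infoCalls) // 1: only the replica triggered an INFO call
}
```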

Comment on lines +1120 to +1131
		return c
	}
	info, err := dfi.getRedisClient(pod.Status.PodIP).Info(ctx, "replication").Result()
	if err != nil {
		return c
	}
	data := parseInfoToMap(info)
	if offsetStr, ok := data["slave_repl_offset"]; ok && offsetStr != "" {
		if offset, err := strconv.ParseInt(offsetStr, 10, 64); err == nil {
			c.Offset = offset
		}
	}

Copilot AI Mar 30, 2026


getMasterCandidate silently ignores INFO/parse failures and returns an offset of 0, which makes it hard to diagnose why elections are behaving unexpectedly (and can bias tie-breaks). Consider adding debug-level logging for INFO/parse errors and/or using a short per-call context timeout so master election can't block reconciliation for a long time when pods are slow/unreachable.

Suggested change
-		return c
-	}
-	info, err := dfi.getRedisClient(pod.Status.PodIP).Info(ctx, "replication").Result()
-	if err != nil {
-		return c
-	}
-	data := parseInfoToMap(info)
-	if offsetStr, ok := data["slave_repl_offset"]; ok && offsetStr != "" {
-		if offset, err := strconv.ParseInt(offsetStr, 10, 64); err == nil {
-			c.Offset = offset
-		}
-	}
+		// No IP means we cannot query Redis; log at debug level for diagnosability.
+		dfi.log.V(1).Info("skipping master candidate: pod has no IP", "pod", pod.Name)
+		return c
+	}
+	// Use a short per-call timeout so a slow/unreachable pod does not block reconciliation for long.
+	callCtx, cancel := context.WithTimeout(ctx, 2*time.Second)
+	defer cancel()
+	info, err := dfi.getRedisClient(pod.Status.PodIP).Info(callCtx, "replication").Result()
+	if err != nil {
+		// Log at debug level so INFO/parse errors are visible during elections.
+		dfi.log.V(1).Info("failed to get replication INFO for master candidate",
+			"pod", pod.Name, "ip", pod.Status.PodIP, "error", err)
+		return c
+	}
+	data := parseInfoToMap(info)
+	offsetStr, ok := data["slave_repl_offset"]
+	if !ok || offsetStr == "" {
+		dfi.log.V(1).Info("slave_repl_offset not found in replication INFO for master candidate",
+			"pod", pod.Name, "ip", pod.Status.PodIP)
+		return c
+	}
+	offset, err := strconv.ParseInt(offsetStr, 10, 64)
+	if err != nil {
+		dfi.log.V(1).Info("failed to parse slave_repl_offset for master candidate",
+			"pod", pod.Name, "ip", pod.Status.PodIP, "value", offsetStr, "error", err)
+		return c
+	}
+	c.Offset = offset

Comment on lines +253 to +254
	// Among same role, prefer higher replication offset.
	if a.Offset != b.Offset {

Copilot AI Mar 30, 2026


isBetterCandidate compares Offset even when both candidates are non-replicas. This conflicts with the documented selection priority (offset should only be considered among replicas) and can lead to unexpected behavior if a non-replica ever reports a non-zero slave_repl_offset. Consider only applying the offset comparison when IsReplica is true (or update the comment/priority to match the implemented behavior).

Suggested change
-	// Among same role, prefer higher replication offset.
-	if a.Offset != b.Offset {
+	// Among replicas, prefer higher replication offset.
+	if a.IsReplica && b.IsReplica && a.Offset != b.Offset {

@ashotland ashotland marked this pull request as ready for review March 31, 2026 09:00
