
ResourceVersion conflicts block pod rollout due to Patroni annotation updates #4439

@rdalmas

Description

Overview

The Postgres Operator encounters frequent ResourceVersion conflicts when attempting to delete pods during rollout operations. The error occurs in instance.go:876 within the rolloutInstance() function when the operator tries to delete a pod with a stale ResourceVersion due to Patroni's continuous annotation updates.

Root Cause: Patroni updates the pod's status annotation every ~10 seconds (based on loop_wait config) to track cluster state (xlog_location, role, replication_state, etc.). On active databases with frequent writes, this causes the pod's ResourceVersion to increment constantly. When the operator attempts to delete a pod during rollout using client.Preconditions with a ResourceVersion check, the precondition fails because Patroni has updated the annotation in the meantime.
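The race can be sketched with a toy in-memory model (illustrative names only, not client-go): every annotation write bumps the object's resourceVersion, so a delete whose precondition pins the version read *before* the write is rejected, while a UID-only precondition still identifies the same pod.

```go
package main

import (
	"errors"
	"fmt"
)

// Toy model of an apiserver object: every metadata write bumps resourceVersion.
type object struct {
	uid             string
	resourceVersion int
	annotations     map[string]string
}

var errConflict = errors.New(
	"the ResourceVersion in the precondition does not match the ResourceVersion in record")

// patroniHeartbeat mimics Patroni rewriting the status annotation (~every loop_wait).
func patroniHeartbeat(o *object, status string) {
	o.annotations["status"] = status
	o.resourceVersion++
}

// deleteWithPreconditions mimics a precondition-guarded delete.
// Passing rv == nil skips the ResourceVersion check (UID-only precondition).
func deleteWithPreconditions(o *object, uid string, rv *int) error {
	if o.uid != uid {
		return errors.New("UID precondition failed")
	}
	if rv != nil && o.resourceVersion != *rv {
		return errConflict
	}
	return nil // delete accepted
}

func main() {
	pod := &object{uid: "abc-123", resourceVersion: 682170681,
		annotations: map[string]string{}}

	// Operator reads the pod, capturing the current ResourceVersion.
	observedRV := pod.resourceVersion

	// Patroni updates the status annotation before the delete lands.
	patroniHeartbeat(pod, `{"xlog_location":331786904544}`)

	// Delete pinned to the stale ResourceVersion is rejected...
	fmt.Println(deleteWithPreconditions(pod, pod.uid, &observedRV))
	// ...while a UID-only precondition still matches the same pod.
	fmt.Println(deleteWithPreconditions(pod, pod.uid, nil))
}
```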

Impact:

  • Pod rollouts fail to complete
  • PostgresCluster status.instances[].updatedReplicas field remains empty
  • Clusters show as 2//2 instead of 2/2/2 in status (the empty updatedReplicas count drops the middle field)
  • Pod operations and replication continue normally (cosmetic status issue, but blocks intentional rollouts)

Environment

  • Platform: Kubernetes
  • Platform Version: Unknown (1.2x+)
  • PGO Image Tag: 5.8.6
  • Postgres Version: 15
  • Storage: Cloud provider persistent volumes
  • Patroni Configuration: loop_wait: 10, ttl: 30, synchronous_mode: true

Steps to Reproduce

REPRO

  1. Deploy a PostgresCluster with multiple instances (HA setup with Patroni)
  2. Run an active workload generating frequent database writes (updates xlog_location continuously)
  3. Trigger a pod rollout by updating the PostgresCluster spec (e.g., change resource limits, update image)
  4. Observe operator logs for ResourceVersion conflicts

The issue is more pronounced on:

  • Databases with high transaction rates
  • Clusters with default Patroni loop_wait: 10 seconds
  • Rollouts taking longer than one Patroni loop cycle

EXPECTED

  1. Operator successfully deletes pod with UID check
  2. StatefulSet recreates pod with new template
  3. status.instances[].updatedReplicas updates correctly
  4. Rollout completes without errors

ACTUAL

  1. Operator fails to delete the pod with the error:
     Operation cannot be fulfilled on Pod "...": the ResourceVersion in the precondition (682170681) does not match the ResourceVersion in record (682171386). The object might have been modified
  2. Pod is NOT deleted (rollout blocked)
  3. Status field updatedReplicas remains empty
  4. Error repeats on every reconciliation attempt

Logs

Operator Error Logs

time="2026-02-23T11:44:13Z" level=error msg="Reconciler error" 
PostgresCluster=postgres-51e9e197-1ca5-4ecf-a6a2-4d7a57ff5572/db-51e9e197-1ca5-4ecf-a6a2-4d7a57ff5572 
controller=postgrescluster 
controllerGroup=postgres-operator.crunchydata.com 
controllerKind=PostgresCluster 
error="Operation cannot be fulfilled on Pod \"db-51e9e197-1ca5-4ecf-a6a2-4d7a57ff5572-inst-2f2r-0\": the ResourceVersion in the precondition (682170681) does not match the ResourceVersion in record (682171386). The object might have been modified" 
file="internal/controller/postgrescluster/instance.go:876" 
func="postgrescluster.(*Reconciler).rolloutInstance" 
name=db-51e9e197-1ca5-4ecf-a6a2-4d7a57ff5572 
namespace=postgres-51e9e197-1ca5-4ecf-a6a2-4d7a57ff5572 
reconcileID=8e87f130-56a3-427c-bdea-b2502e144e69

Errors occur across multiple clusters, at a rate of roughly 600 occurrences over 13 hours on a landscape of 87 PostgresClusters.

Verification of Root Cause

Pod ResourceVersion updates every ~10 seconds on active databases:

$ kubectl get pod <pod> -n <namespace> --watch -o jsonpath='{.metadata.resourceVersion}{"\n"}'
682382578
682382926  # ~10s later
682383283  # ~10s later

Patroni annotation changing frequently:

$ kubectl get pod <pod> -o jsonpath='{.metadata.annotations.status}' | jq .xlog_location
331786904544  # Changes with every database write

Labels and Patroni topology match (no actual impact on cluster health):

$ kubectl get pod <pod> -o jsonpath='{.metadata.labels.postgres-operator\.crunchydata\.com/role}'
replica

$ kubectl exec <pod> -c database -- patronictl topology
| Member | Host | Role         | State     | TL | Lag in MB |
|--------|------|--------------|-----------|----|-----------| 
| inst-0 | ...  | Sync Standby | streaming | 26 |         0 |

Proposed Solution

Remove ResourceVersion from Delete Precondition

In internal/controller/postgrescluster/instance.go around line 850:
https://github.com/CrunchyData/postgres-operator/blob/main/internal/controller/postgrescluster/instance.go#L850

// Current code causing conflicts:
return errors.WithStack(
    r.Writer.Delete(ctx, pod, client.Preconditions{
        UID:             &pod.UID,
        ResourceVersion: &pod.ResourceVersion,  // ← Remove this
    }))

// Proposed fix:
return errors.WithStack(
    r.Writer.Delete(ctx, pod, client.Preconditions{
        UID: &pod.UID,  // Keep UID check for safety
    }))

Rationale: The UID check is sufficient to ensure we're deleting the correct pod. The ResourceVersion check is overly strict for intentional deletion during rollout. Patroni's annotation updates don't affect the operator's intent to delete the pod.
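If keeping the ResourceVersion guard is preferred, an alternative would be to re-read the pod and retry the delete on conflict, in the spirit of client-go's retry.RetryOnConflict. A minimal self-contained sketch of that retry shape, using a toy store standing in for the API client (names are illustrative, not the operator's actual code):

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("resourceVersion conflict")

// store is a toy stand-in for the apiserver: Get returns the current
// resourceVersion; Delete rejects a stale one, like client.Preconditions.
type store struct {
	rv      int
	deleted bool
	churn   int // times a background writer bumps rv before the delete wins
}

func (s *store) Get() int { return s.rv }

func (s *store) Delete(rv int) error {
	if s.churn > 0 { // simulate Patroni bumping the annotation between Get and Delete
		s.rv++
		s.churn--
	}
	if rv != s.rv {
		return errConflict
	}
	s.deleted = true
	return nil
}

// deleteWithRetry re-reads the object and retries the precondition-guarded
// delete a bounded number of times, the same shape as retry.RetryOnConflict.
func deleteWithRetry(s *store, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		rv := s.Get() // refresh the ResourceVersion on every attempt
		if err = s.Delete(rv); err == nil || !errors.Is(err, errConflict) {
			return err
		}
	}
	return err
}

func main() {
	s := &store{rv: 682170681, churn: 2}
	fmt.Println(deleteWithRetry(s, 5), s.deleted)
}
```

The trade-off: retrying keeps the strict guard but can still lose the race indefinitely on a busy database, whereas the UID-only precondition above resolves the conflict outright.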
