Etcd Upgrade Fails When Persistent Storage Flag is Disabled #78398

saurabhnetskope · 2025-02-27T15:16:31Z

Name and Version

bitnami/etcd:3.5.18

What architecture are you using?

amd64

What steps will reproduce the bug?

Description:
We are running a 3-pod etcd cluster without persistent storage, relying on emptyDir. However, after the recent change introduced in commit 1aff4e2, the etcd upgrade is failing.

Steps to Reproduce:

1. Deploy a 3-pod etcd cluster with persistent storage disabled (emptyDir used instead).
2. Attempt to perform a rolling upgrade.
3. Observe that the upgrade fails.

Root Cause:
The issue lies in the is_new_etcd_cluster function. To determine if the cluster is new or existing, this function executes:

is_new_etcd_cluster() {
    local -a extra_flags
    read -r -a extra_flags <<<"$(etcdctl_auth_flags)"
    is_boolean_yes "$ETCD_ON_K8S" && extra_flags+=("--endpoints=$(etcdctl_get_endpoints)")
    ! debug_execute etcdctl endpoint status --cluster "${extra_flags[@]}"
}

During a rolling upgrade, not all endpoints will be responsive. When etcdctl_get_endpoints includes its own endpoint, the command:

! debug_execute etcdctl endpoint status --cluster "${extra_flags[@]}"

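exits non-zero. Because of the leading !, is_new_etcd_cluster then returns success, so the restarting pod treats the existing cluster as new, which leads to the upgrade failure.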
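
To illustrate the inversion, here is a minimal sketch with a stubbed status check. fake_endpoint_status and DOWN_ENDPOINT are hypothetical stand-ins for etcdctl endpoint status and the restarting pod. They are not part of the Bitnami scripts, and the sketch assumes the real command exits non-zero when any listed endpoint is unreachable.

```shell
# Hypothetical endpoint that is down because its pod is restarting.
DOWN_ENDPOINT="etcd-0:2379"

# Stand-in for `etcdctl endpoint status`: fails if any given endpoint is down.
fake_endpoint_status() {
    for ep in "$@"; do
        if [ "$ep" = "$DOWN_ENDPOINT" ]; then
            return 1
        fi
    done
    return 0
}

# Same shape as is_new_etcd_cluster: a failed status check reads as "new cluster".
is_new_cluster_sketch() {
    ! fake_endpoint_status "$@"
}
```

With the pod's own endpoint in the list, is_new_cluster_sketch returns 0 (cluster reported as new). With only the other two endpoints, it returns 1 (cluster reported as existing).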

Proposed Fix:
Modify etcdctl_get_endpoints to exclude the pod's own endpoint before executing etcdctl endpoint status --cluster. This will prevent failures when checking the cluster status during a rolling upgrade.
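
As a rough sketch of the filtering idea only: the helper below drops one entry from a comma-separated endpoint list. exclude_own_endpoint and the endpoint values are hypothetical and are not taken from the Bitnami scripts, whose etcdctl_get_endpoints may build the list differently.

```shell
# Hypothetical helper: drop `own` from a comma-separated endpoint list.
# The body runs in a subshell so the IFS and globbing changes do not leak.
exclude_own_endpoint() (
    endpoints="$1"
    own="$2"
    kept=""
    IFS=','
    set -f  # disable globbing while splitting the unquoted list
    for ep in $endpoints; do
        if [ "$ep" != "$own" ]; then
            kept="${kept:+$kept,}$ep"
        fi
    done
    printf '%s\n' "$kept"
)
```

For example, exclude_own_endpoint "etcd-0:2379,etcd-1:2379,etcd-2:2379" "etcd-1:2379" prints the list without the middle entry, which could then be passed to --endpoints.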

Expected Behavior:
The etcd cluster should successfully upgrade even when persistent storage is disabled, allowing rolling upgrades to complete without failure.

Environment Details:

Etcd Cluster: 3 pods
Storage: emptyDir
Affected Commit: 1aff4e2
Kubernetes Environment: [Provide Kubernetes version]
Helm Chart Version (if applicable): [Provide chart version]

What is the expected behavior?

The etcd cluster should successfully upgrade even when persistent storage is disabled, allowing rolling upgrades to complete without failure.

What do you see instead?

The rolling upgrade fails for etcd clusters without persistent storage.
This could impact production environments relying on emptyDir for ephemeral etcd storage.

saurabhnetskope · 2025-02-27T15:28:25Z

This is the fixed function. The only change is passing true to etcdctl_get_endpoints:

is_new_etcd_cluster() {
    local -a extra_flags
    read -r -a extra_flags <<< "$(etcdctl_auth_flags)"
    
    is_boolean_yes "$ETCD_ON_K8S" && extra_flags+=("--endpoints=$(etcdctl_get_endpoints true)")
    
    ! debug_execute etcdctl endpoint status --cluster "${extra_flags[@]}"
}

carrodher · 2025-02-27T20:00:25Z

Thank you for bringing this issue to our attention. We appreciate your involvement! If you're interested in contributing a solution, we welcome you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here.

Your contribution will greatly benefit the community. Feel free to reach out if you have any questions or need assistance.

saurabhnetskope added the tech-issues The user has a technical issue about an application label Feb 27, 2025

github-actions bot added the triage Triage is needed label Feb 27, 2025

github-actions bot assigned carrodher Feb 27, 2025

saurabhnetskope mentioned this issue Feb 27, 2025

[bitnami/etcd] Stop relying on files for state #75906

Merged

carrodher added the etcd label Feb 27, 2025
