Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReplicaSet naming is inconsistent #121687

Closed
majanjua-amzn opened this issue Nov 1, 2023 · 12 comments
Closed

ReplicaSet naming is inconsistent #121687

majanjua-amzn opened this issue Nov 1, 2023 · 12 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/apps Categorizes an issue or PR as relevant to SIG Apps.

Comments

@majanjua-amzn
Copy link

majanjua-amzn commented Nov 1, 2023

What happened?

When kubernetes is creating a ReplicaSet automatically through a deployment, the suffix added to the deployment name has an inconsistent length. Although this is not necessarily a bug, this can lead to issues in trying to identify ReplicaSets that were created through deployments using, for instance, RegEx.

This code is responsible for creating a suffix string to add to the deployment name to create the ReplicaSet name. The number created by podTemplateSpecHasher.Sum32() is any number from 0 to 4,294,967,295. This means the suffix string after the number goes through rand.SafeEncodeString (link) will have a length of 0 to 10.

What did you expect to happen?

ReplicaSet names should always have a suffix of consistent length when created automatically via a deployment.

How can we reproduce it (as minimally and precisely as possible)?

Create a number of deployments that will in turn create ReplicaSets. On average, 23% of these ReplicaSets will have suffixes of length 9 or shorter instead of the expected 10.

Anything else we need to know?

An easy solution would be to simply change the code as follows:

return rand.SafeEncodeString(fmt.Sprintf("%0*d", 10, podTemplateSpecHasher.Sum32()))

There is no risk to a solution like this as it will only go into effect when new ReplicaSet revisions are created and will only make ReplicaSet names more consistent. Any names that would be generated out of a number like 1,234,567 will instead be generated by 0,001,234,567, i.e. there is the same number of possible suffixes as before.

Kubernetes version

Doesn't really matter but:

$ kubectl version
Client Version: v1.28.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Unable to connect to the server: dial tcp: lookup ...: no such host

Cloud provider

Not sure what this means.

@majanjua-amzn majanjua-amzn added the kind/bug Categorizes issue or PR as related to a bug. label Nov 1, 2023
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 1, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@majanjua-amzn
Copy link
Author

/sig cluster-lifecycle

@k8s-ci-robot k8s-ci-robot added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Nov 1, 2023
@kannon92
Copy link
Contributor

kannon92 commented Nov 2, 2023

/remove-sig cluster-lifecycle
/sig apps

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Nov 2, 2023
@sftim
Copy link
Contributor

sftim commented Nov 2, 2023

this can lead to issues in trying to identify ReplicaSets that were created through deployments using, for instance, RegEx

I'm not sure that's a bug. A better way is to match via labels or via ownerReferences.

/priority awaiting-more-evidence

Also:
/remove-sig cluster-lifecycle

@k8s-ci-robot k8s-ci-robot added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Nov 2, 2023
@k8s-ci-robot
Copy link
Contributor

@sftim: Those labels are not set on the issue: sig/cluster-lifecycle

In response to this:

this can lead to issues in trying to identify ReplicaSets that were created through deployments using, for instance, RegEx

I'm not sure that's a bug. A better way is to match via labels or via ownerReferences.

/priority awaiting-more-evidence

Also:
/remove-sig cluster-lifecycle

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@majanjua-amzn
Copy link
Author

majanjua-amzn commented Nov 2, 2023

I agree, but there's no reason not to add this small quality of life improvement to help identify resources. In large scale systems, looking through all pods and replicasets for their owners may put a unwanted load on the EKS control plane when using the name directly may be a more reasonable solution.

In the end, the proposed change simply makes this option (using names to identify owners) more reliable and consistent.

Additionally, there is a low chance that there is a very low number of characters in the suffix which should be considered a bug as the resource names should be obviously "computer generated", i.e. obvious that they are dependents of other resources. For example, there is a 10/4294967295 chance of having a suffix with a single character. This happens if the hash generates number 0-9.

@majanjua-amzn
Copy link
Author

Proof:

  • Golang documentation for Hash shows it generates a uint32: link
  • Code shows the following: link
// uint32 is the set of all unsigned 32-bit integers.
// Range: 0 through 4294967295.
type uint32 uint32

@toVersus
Copy link
Contributor

toVersus commented Nov 6, 2023

It can be improved, but using regular expressions to identify ReplicaSets may not be the good approach. This is because the number of random characters is not guaranteed and could change in the future. For example, in the case of Pods, there was an attempt to increase the length of random characters from 5 to 10. In the end, there were several projects relying on parsing random strings, and the PR is currently on hold due to a lack of a good alternative (#116430, gocrane/crane#716).

cc: @thockin

@thockin
Copy link
Member

thockin commented Nov 6, 2023

A few thoughts:

  1. If, by chance, the hash ends up as 0 and we produce "" rather than "0", then that is a bug and should be fixed.

  2. Regex parsing pod names to extract things like RS or deployment names IS NOT SUPPORTED. We paused an effort to actively change this, because we don't want to break users, but it's already broken for very long names. If it works, it works by luck. We should not endorse this pattern.

  3. If doing this "right" represents a real hardship for consumers, we should think up a better pattern that is more supportable. Please bring evidence.

  4. I am not against the idea of leading 0s, per se, but it has non-zero risk of breaking people who already parse this, and it sort of encourages this anti-best-practice of pattern matching. So I don't think it's worthwhile.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 4, 2024
@thockin
Copy link
Member

thockin commented Feb 6, 2024

Hey sig-apps people,

I think we should close this, but maybe you think it's important to ensure that the RS suffix is always 10 chars?

@janetkuo @kow3ns @soltysh

@janetkuo
Copy link
Member

janetkuo commented Feb 6, 2024

I think we can close this one. As folks mentioned above we don't support or encourage parsing names, and I don't see evidence that RS suffix length could cause issues.

@janetkuo janetkuo closed this as completed Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/apps Categorizes an issue or PR as relevant to SIG Apps.
Projects
Archived in project
Development

No branches or pull requests

8 participants