Relying on undefined behavior - parsing kubernetes resource names #716

thockin · 2023-03-10T22:22:29Z

https://github.com/gocrane/crane/blob/main/pkg/utils/expression_prom_deafult_test.go#L18

The random suffix of a pod or other resource is not guaranteed to be 5 characters, nor is the characterset defined. Someone looked into the implementation details and took a dependency on them. This is brittle and could change at any time.

You should be using labels to extract metadata, not parsing the name. That's literally why they exist.

qmhu · 2023-03-13T14:09:52Z

Thank you for bringing this to our attention. We agree that relying on the random suffix of a pod or other resource can be brittle and may not be a reliable method for extracting metadata. We acknowledge the importance of using labels to extract metadata, as this is their intended purpose. Moving forward, we will explore ways to implement this approach by using labels to extract metadata.

saikey0379 · 2023-03-14T09:55:36Z

@thockin
Hello, I would like to ask a question.
We expect to predict based on the resource/pods/external type metrics.However,kubelet_ip:10250/metrics/resource only contain namespace/pod and container labels,while the predicted require to input metrics of all replicaset versions.Is there a better way for us to accurately match the resources such as deployment or daemonset name?

Similar to the pods and external type.The OwnerReferences of the pods for deployment are replicaset,I think it is unreasonable to query the associated deployment name from apiserver by the process inside pod,Currently, we set deployment name with custom labels in the container through environment variables manually, which is not perfect. Can you provide some relevant suggestions?

thockin · 2023-03-14T16:20:37Z

This is a good question - I am not an expert in the metrics but it seems like what you have is a feature request for more labels.

Are you trying to get this information from inside the pod itself (its own parent ref), or just somewhere in the monitoring/collection path?

Are you trying to extract the replicaset or the deployment, or both?

What if a pod is not run under a deployment or replicaset? E.g. a StatefulSet or Job or something custom?

saikey0379 · 2023-03-15T07:32:05Z

This is a good question - I am not an expert in the metrics but it seems like what you have is a feature request for more labels.

Are you trying to get this information from inside the pod itself (its own parent ref), or just somewhere in the monitoring/collection path?

Are you trying to extract the replicaset or the deployment, or both?

What if a pod is not run under a deployment or replicaset? E.g. a StatefulSet or Job or something custom?

Yes, this project operates based on three types of metrics, and considerations related to labels for resources and pods.

For resource type metrics, we expect to at least add ownerreference when obtaining pod status through the kubelet path, and preferably use the top-level reference such as deployment.

For pods type metrics, we expect to easily obtain this information within the pod (such as native rather than manually set environment variables) to facilitate the generation of custom metric labels.

For pods that are not under deployment or replicaset, such as statefulset and daemonset, the parent reference of the pod can also be queried. For independent pods, the name is specified at creation and precise matching is also straightforward. I think it's acceptable to set the reference to null when querying.

thockin · 2023-03-15T16:26:15Z

Sorry, I'm still not clear on what you are already doing (perhaps in ugly ways) and what you are asking for as net-new features.

For resource type metrics, we expect to at least add ownerreference when obtaining pod status through the kubelet path, and preferably use the top-level reference such as deployment.

Is there a feature request here? ownerReferences is in the API, but to climb to the "root" you need GET access on all the intermediate resources which you may not have.

For pods type metrics, we expect to easily obtain this information within the pod (such as native rather than manually set environment variables) to facilitate the generation of custom metric labels.

As far as I can tell, ownerReferences is not available via env valueFrom or the downward API. Both seem reasonable to me, if someone wants to write a small KEP (to debate the finer points of formatting and edge cases). It should be fairly straight-forward.

thockin · 2023-03-15T16:29:27Z

kubernetes/kubernetes#116662

saikey0379 · 2023-03-16T02:37:32Z

If a pod's random suffix is fragile and could change at any time, then we would have two new requirements:

The first requirement is to add owner or topowner labels to kubelet metrics such as container_cpu_usage_seconds_total.

The second requirement is to enable processes within a pod to easily obtain owner or topowner information, which can be used to fill in monitoring metric labels.

If the provided owner is a replicaset instead of a deployment, then when querying historical data with a Prometheus expression, we would need to use a query expression like {ownerReferences=~"test-111111|test-222222|test-3333333...}, which could become very ugly if the deployment version changes frequently. But an expression like {topOrKind="deployment",topOrName="test"} would be much simpler.

thockin · 2023-03-19T00:25:42Z

I'm afraid the issue is not quite so easy. There are a bunch of issues.

Deployments and ReplicaSets can orphan and adopt pods. Any additional information we provide can change over time. Is that acceptable? On Monday the pod "foobar-deadbeef93-xyz76" is owned by replicaset "foobar-deadbeef93" which is owned by deployment "foobar", but on tuesday it is owned by replicaset "hamburger-aabbccddee" and deployment "hamburger". The pod's name only means something at the instant it is created. And even THEN a long name can be truncated to add the random suffix, such that it doesnt actuyally match .*-.{10}-.{5}.

We could add well-known labels like workload.kubernetes.io/{deployment,replicaset,statuefulset,job,...} - this works for core k8s, but not for 3rd party workload controllers (which do exist!!). We'd have to verify that all of these APIs' object names are valid label values (hint: they are not). We could hash the deployment name if it is too long or use the UID. All that tells you is that 2 pods are in the same deployment, not what the deployment name actually is.

We could use annotations instead, but those are not selectable.

We could do a combination of labels AND annotations, I guess. Given the adoption issue, we would have to update those on the fly, meaning EVERY workload controller needs to handle it, and they might need more RBAC access. We can't just hash the selector, because an adoption could happen with a different-but-compatible selector.

Oh, but env vars are not updatable, so the downward API for this could only be through projected volumes.

thockin · 2024-01-14T21:45:32Z

Almost a year later this bug still exists. If kubernetes changes this behavior you will break.

ArangoGutierrez · 2024-07-08T11:03:45Z

kubernetes/enhancements#4753

thockin added the kind/bug Something isn't working label Mar 10, 2023

thockin mentioned this issue Mar 10, 2023

Make generateName use longer names when possible kubernetes/kubernetes#116430

Closed

thockin mentioned this issue Mar 15, 2023

Expose ownerReferences via valueFrom and downward API kubernetes/kubernetes#116662

Open

toVersus mentioned this issue Nov 6, 2023

ReplicaSet naming is inconsistent kubernetes/kubernetes#121687

Closed

thockin mentioned this issue Jan 14, 2024

apiserver does not retry generateName on collisions kubernetes/kubernetes#115489

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Relying on undefined behavior - parsing kubernetes resource names #716

Relying on undefined behavior - parsing kubernetes resource names #716

thockin commented Mar 10, 2023

qmhu commented Mar 13, 2023

saikey0379 commented Mar 14, 2023

thockin commented Mar 14, 2023

saikey0379 commented Mar 15, 2023

thockin commented Mar 15, 2023

thockin commented Mar 15, 2023

saikey0379 commented Mar 16, 2023

thockin commented Mar 19, 2023

thockin commented Jan 14, 2024

ArangoGutierrez commented Jul 8, 2024

Relying on undefined behavior - parsing kubernetes resource names #716

Relying on undefined behavior - parsing kubernetes resource names #716

Comments

thockin commented Mar 10, 2023

qmhu commented Mar 13, 2023

saikey0379 commented Mar 14, 2023

thockin commented Mar 14, 2023

saikey0379 commented Mar 15, 2023

thockin commented Mar 15, 2023

thockin commented Mar 15, 2023

saikey0379 commented Mar 16, 2023

thockin commented Mar 19, 2023

thockin commented Jan 14, 2024

ArangoGutierrez commented Jul 8, 2024