Fix service discovery bug in kubernetes-extensions#19139
gianm merged 16 commits into apache:master
marking this as draft while I evaluate a competing approach that uses pod phase instead of readiness
I think I prefer readiness to pod phase. Being ready means you are ready to announce yourself to the cluster. If a readiness probe is configured following our recommendations, and the probe fails often enough to cross the not-ready threshold, the service should no longer be announced and should stop getting traffic. Readiness also protects against the worst case: a pod that keeps its announce labels after its container is killed without a proper shutdown and is not rescheduled. In that case, readiness will properly undiscover the service almost immediately after the crash.
        effectiveType = WatchResult.NOT_READY;
      } else if (WatchResult.ADDED.equals(item.type)) {
        // Pod is not ready yet (e.g., still starting up). Skip this event entirely.
        // It will appear via a MODIFIED event that remaps to ADDED for discovery, once it becomes ready.
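The remapping this hunk belongs to can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `remap` helper, its signature, and the handling of each case are assumptions inferred from the hunk above and the review discussion.

```java
// Sketch of readiness-aware remapping of raw k8s watch events.
// Names and structure are illustrative; only the behavior described in the
// hunk above (skip ADDED for not-ready pods, remap MODIFIED) is from the PR.
final class WatchEventRemapper
{
  enum WatchResult { ADDED, MODIFIED, DELETED, NOT_READY }

  /**
   * Remap a raw watch event type based on pod readiness.
   * Returns null when the event should be skipped entirely.
   */
  static WatchResult remap(WatchResult rawType, boolean podReady)
  {
    if (!podReady) {
      if (WatchResult.ADDED == rawType) {
        // Pod is not ready yet (e.g., still starting up). Skip this event;
        // a later MODIFIED event is remapped to ADDED once the pod is ready.
        return null;
      }
      if (WatchResult.MODIFIED == rawType) {
        // Pod lost readiness: synthesize NOT_READY so it gets undiscovered.
        return WatchResult.NOT_READY;
      }
      return rawType; // DELETED passes through unchanged.
    }
    if (WatchResult.MODIFIED == rawType) {
      // Pod (re)gained readiness: surface as ADDED so it is (re)discovered.
      return WatchResult.ADDED;
    }
    return rawType;
  }
}
```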
@@ -26,9 +26,18 @@
public interface WatchResult
This class used to be a relatively thin layer over k8s watches, but now there's remapping and synthetic events going on. The javadoc of this class should make clear that it's no longer meant to be a thin layer over k8s watches; it's meant to be aligned with the needs of service discovery.
@@ -273,6 +273,11 @@ private void keepWatching(String labelSelector, String resourceVersion)
      case WatchResult.DELETED:
        baseNodeRoleWatcher.childRemoved(item.object.getNode());
Can DELETED fire after NOT_READY (or even before ADDED, if a pod is deleted before it ever becomes ready)? I think the logging in those cases will end up noisy, since childRemoved complains loudly if the service doesn't exist. You added a skipIfUnknown that is used when NOT_READY fires, but I wonder if now we should always act like that.
yes, these could certainly happen. thank you for calling it out
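The removal path discussed above (DELETED or NOT_READY arriving for a node that was never discovered) can be sketched like this. The real `BaseNodeRoleWatcher.childRemoved` differs; the cache, return value, and logging behavior here are illustrative assumptions, showing only the idea of always tolerating unknown nodes rather than complaining loudly.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for a node-role cache; not the PR's BaseNodeRoleWatcher.
final class NodeCache
{
  private final Map<String, String> nodes = new HashMap<>();

  void childAdded(String nodeId, String host)
  {
    nodes.put(nodeId, host);
  }

  /**
   * Remove a node. DELETED and NOT_READY can fire for nodes that were never
   * discovered (e.g., a pod deleted before it ever became ready), so removal
   * always tolerates unknown nodes instead of logging an error.
   */
  boolean childRemoved(String nodeId)
  {
    if (nodes.remove(nodeId) == null) {
      // Unknown node: log quietly (debug level) rather than complaining loudly.
      return false;
    }
    return true;
  }

  int size()
  {
    return nodes.size();
  }
}
```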
Description
Bug Report
The k8s service discovery does not remove discovered nodes whose pods still exist with service-announcement labels even though the underlying services are unhealthy.
For example, if a broker container is killed but the pod that manages it remains in the namespace with announcement labels, all druid services will maintain this service in their discovered services cache. This leads to queries being routed to a broker that cannot possibly execute the request. If this pod remains in an announced but unhealthy state for any meaningful period of time, the cluster functionality can be severely compromised.
Desired behavior in the above example would be that the broker is removed from discovered services caches, at least until the underlying container for the pod is restarted and the pod is healthy again.
Fix Details
My proposed fix uses a pod's readiness flag in the discovery logic. If a pod is not ready, its services will not be added to service discovery caches and will be removed from any caches they are already in. These services can be added back once a MODIFIED or ADDED event arrives and the pod is ready again.
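The readiness check behind this fix can be sketched as below. In the actual extension the pod status would come from the Kubernetes Java client's `V1Pod` model; the minimal `PodCondition` type here is a hypothetical stand-in, and only the rule (a pod is ready when its `Ready` condition has status `True`) reflects Kubernetes semantics.

```java
import java.util.List;

// Illustrative readiness check; PodCondition is a stand-in for the real
// Kubernetes client model (V1PodCondition). Only the Ready/"True" rule
// mirrors actual Kubernetes pod-status semantics.
final class PodReadiness
{
  record PodCondition(String type, String status) {}

  /**
   * A pod counts as ready only when its status carries a condition of type
   * "Ready" whose status is "True". Missing conditions mean not ready.
   */
  static boolean isPodReady(List<PodCondition> conditions)
  {
    if (conditions == null) {
      return false;
    }
    for (PodCondition c : conditions) {
      if ("Ready".equals(c.type())) {
        return "True".equals(c.status());
      }
    }
    return false;
  }
}
```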
Fix Risks
The biggest risk I see is that this new reliance on the readiness probe introduces an expectation that the probe is accurate and stable. I call this out in the documentation so it is considered when defining a pod's readiness probe, as a way to mitigate unexpected changes for users. It could be included in a release note as well to tip off users of the extension.
Alternatives
Using pod phase is another option. It feels like a more permissive check than readiness, in that it may be more lax about removing a service from discovered hosts.
The idea would be to treat the Running and Unknown pod phase states as discoverable. Unknown is included so that k8s control plane communication issues or other transient problems do not remove hosts from discovered lists. One downside, based on the k8s documentation (linked above), is that a pod is in the Running phase while a container is starting or restarting. This means services that are not actually ready to be in the cluster could still be discovered if they have the labels and are in a transient starting state. The fear is that a service gets into a restart loop without changing phase, so it remains discoverable but is not going to become healthy in the near future (or ever).

Release note
TBD
Key changed/added classes in this PR
DefaultK8sApiClient
BaseNodeRoleWatcher
WatchResult

This PR has: