Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tempo-query: add ReadinessProbe #1061

Merged
merged 1 commit into from
Oct 16, 2024

Conversation

frzifus
Copy link
Collaborator

@frzifus frzifus commented Oct 15, 2024

@@ -0,0 +1,16 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this enhancement or bug_fix?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my point of view its a pure enhancement. Everything works as before. We only support the kubernetes readiness check.

component: tempostack

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add ReadinessProbe to tempo-query
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this being added? What are the consequences of this patch?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of showing that the pod is ready and handling the failure internally, the k8s api will show that its not ready. K8s will do regular calls to the gRPC endpoint to verify that its alive.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and we should document #1061 (comment) in the changelog. Impact of adding readiness probe to our deployment.

I would also say that this is more a bug_fix than enhancement.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. improves reliability at startup, avoiding lost data

@frzifus
Copy link
Collaborator Author

frzifus commented Oct 15, 2024

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

cc @pavolloffay

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

@pavolloffay
Copy link
Collaborator

cc @pavolloffay

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

Yes, but does it fixes any particular issues in our deployments?

@frzifus
Copy link
Collaborator Author

frzifus commented Oct 15, 2024

y, during my multi tenancy debug session I noticed that we sometimes lost some measurements. The test did check for 10 reports but we only had 7-8. Adding a delay solved it. But having the readiness probe made it more resilient.

While typing, I think it would actually be a good Idea to add this kind of check to Jaeger-Query too. There is a extra health endpoint on localhost:14269/ for it.

Server Response

{"status":"Server available","upSince":"2024-10-15T08:44:13.514559835Z","uptime":"12.13965331s"}

@codecov-commenter
Copy link

codecov-commenter commented Oct 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.18%. Comparing base (21e91ab) to head (fc77c2b).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1061      +/-   ##
==========================================
+ Coverage   69.14%   69.18%   +0.04%     
==========================================
  Files         110      110              
  Lines        7059     7069      +10     
==========================================
+ Hits         4881     4891      +10     
  Misses       1888     1888              
  Partials      290      290              
Flag Coverage Δ
unittests 69.18% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: Without a readiness check in place, there is a risk that data will be lost when the queryfrontend pod is ready but the tempo query API is not yet available.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can be data lost in query frontend? It does not ingest any data? It's used only for querying.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The queryfrontend with enabled jaeger contains 3 containers in its pod. tempo-query is part of it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that, the point I am making is that the

data will be lost when the queryfrontend pod

implies to me that this fixes data ingestion, which is not the case

Signed-off-by: Benedikt Bongartz <bongartz@klimlive.de>
@frzifus frzifus enabled auto-merge (squash) October 16, 2024 11:24
@frzifus frzifus merged commit 7dcf94c into grafana:main Oct 16, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ReadinessProbe for tempo-query
4 participants