-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add Keycloak SLO docs fixes #579 #1020
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this PR. It captures the discussions we had internally, still it would need to change its shape a bit to be useful as a future guide in Keycloak main.
- The introduction needs to briefly explain SLI/SLO to our best knowledge
- It should be more specific about which metrics should be used, as readers would otherwise be lost in the metrics and their tags.
- It needs to be less specific about actual SLOs, as those would require a lot more assumptions to be explained and calculations to be made
See below for the comment.
doc/kubernetes/modules/ROOT/pages/running/metrics/keycloak_service_level_indicators.adoc
Outdated
Show resolved
Hide resolved
doc/kubernetes/modules/ROOT/pages/running/metrics/keycloak_service_level_indicators.adoc
Outdated
Show resolved
Hide resolved
| Authentication Latency | XX% of {project_name} authentication requests should have a latency below 200ms. | 99% | 95% | {project_name} server-side metrics to track latency for specific endpoints along with Response Time Distribution. | `http_server_requests_seconds_count`, `http_server_requests_seconds_sum`. | ||
|
||
https://www.keycloak.org/keycloak-benchmark/kubernetes-guide/latest/running/metrics/keycloak_cluster#processing-time[More details about the metrics are captured here.] | https://github.com/keycloak/keycloak-benchmark/blob/main/provision/minikube/monitoring/dashboards/authentication-code.json[Example Grafana dashboard] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to place this in the main Keycloak docs eventually, and it should be self-sufficient.
It is missing the URL path for URLs related to login, and the information that histograms for HTTP metrics need to be enabled first.
IMHO the heatmap you point to will not show the SLI. IMHO the diagram on the left will show the SLI/SLO
AFAIK the expression could be simplified by using all URLs starting with /realms/{realm}/protocol/{protocol}/.*
and not restricting them to (token|auth|logout)
.
sum(irate(http_server_requests_seconds_bucket{uri=~"/realms/{realm}/protocol/{protocol}/(token|auth|logout)|/realms/{realm}/login-actions/authenticate", le="0.25", namespace="$namespace", pod="$pod_name"}[2m])) without (le) /
irate(http_server_requests_seconds_count{uri=~"/realms/{realm}/protocol/{protocol}/(token|auth|logout)|/realms/{realm}/login-actions/authenticate", namespace="$namespace", pod="$pod_name"}[2m])
| Error Rate “during login” | The error rate should be less than X.X%. | 0.1% | 0.05% | The ratio of failed authentication requests to total requests. | Failed requests could be identified by the `5xx error codes` generated by the {project_name} server and those could be further per URL. | ||
|https://grafana.com/grafana/dashboards/10441-keycloak-metrics-dashboard/[Example Grafana dashboard] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again I'd like it to be more specific and show which URLs are relevant during login. This could be done with a PromQL query.
doc/kubernetes/modules/ROOT/pages/running/metrics/keycloak_service_level_indicators.adoc
Outdated
Show resolved
Hide resolved
doc/kubernetes/modules/ROOT/pages/running/metrics/keycloak_service_level_indicators.adoc
Outdated
Show resolved
Hide resolved
15d11f1
to
6550c1c
Compare
@ahus1 I have addressed most of the review feedback, and I still have some doubts on how to implement your input on the PromQL queries. I will connect with you offline and work on it. |
0cc221e
to
fc29e31
Compare
@ ahus1 I added what we thought would be good concerning the structure of the doc. If you can double-check the error rate related PromQL queries, that would be great. |
Signed-off-by: Kamesh Akella <kamesh.asp@gmail.com>
fc29e31
to
e4a413f
Compare
Fixes #579