Commit
add Keycloak SLO docs fixes #579
Signed-off-by: Kamesh Akella <kamesh.asp@gmail.com>
kami619 committed Oct 22, 2024
1 parent 6eb3efd commit 1fac44d
Showing 4 changed files with 40 additions and 2 deletions.
3 changes: 2 additions & 1 deletion doc/kubernetes/modules/ROOT/nav.adoc
@@ -14,9 +14,10 @@ include::partial$subnav-openshift.adoc[]
* xref:testing/index.adoc[]
* xref:running/index.adoc[]
** xref:running/infinispan-deployment.adoc[]
** xref:running/timeout_tunning.adoc[]
** xref:running/timeout_tuning.adoc[]
** xref:running/jvm/jvm_options.adoc[]
** Metrics
*** xref:running/metrics/keycloak_service_level_indicators.adoc[]
*** xref:running/metrics/jvm_metrics.adoc[]
*** xref:running/metrics/keycloak_cluster.adoc[]
*** xref:running/metrics/keycloak_with_external_infinispan.adoc[]
3 changes: 2 additions & 1 deletion doc/kubernetes/modules/ROOT/pages/running/index.adoc
@@ -16,7 +16,7 @@ These guides will eventually be published Keycloak's main web page.
== Building blocks

* xref:running/infinispan-deployment.adoc[]
* xref:running/timeout_tunning.adoc[]
* xref:running/timeout_tuning.adoc[]
[#jvm-tuning]
== JVM tuning guides
@@ -26,6 +26,7 @@ These guides will eventually be published Keycloak's main web page.
[#monitoring-deployments]
== Monitoring deployments

* xref:running/metrics/keycloak_service_level_indicators.adoc[]
* xref:running/metrics/jvm_metrics.adoc[]
* xref:running/metrics/keycloak_cluster.adoc[]
* xref:running/metrics/keycloak_with_external_infinispan.adoc[]
@@ -0,0 +1,36 @@
= {project_name} Service Level Indicators
:description: This document contains details of the SLIs to monitor your {project_name} deployment's performance.

To run {project_name} confidently in a production environment, customers need an overview of key metrics from both {project_name} and {jdgserver_name}. These metrics allow them to assess the health and performance of their system. They also give anyone supporting the deployment the insight needed to request and analyze the necessary data effectively.

We assume that the scenario for defining the SLOs and SLIs is based on the steps below.

====
As a {project_name} user, I want to be able to:

* log in,
* refresh my token,
* access and use the admin console,
* and manage my profile through the account console,

so that I can interact with the {project_name} system effectively and perform the necessary tasks without interruption.
====


[cols="1,1,1,1,2,2,2", options="header"]
|===
| SLO | SLO definition | Single Site SLO Target | Multi-Site SLO Target | SLI Metric | Metric Details | Dashboard

| Availability | {project_name} should be available XX.XX% of the time. | 99.9% | 99.99% | Uptime percentage is the ratio of successful authentication requests to total authentication requests. | https://github.com/keycloak/keycloak/blob/main/docs/guides/high-availability/health-checks-multi-site.adoc[Health checks],
and the `up` metric, which indicates whether the Prometheus server is able to scrape metrics from the {project_name} instance. This metric has a value of 1 if the {project_name} service is available and responding to Prometheus scrape requests, and 0 if the service is down or unreachable.

| NA

| Authentication Latency | XX% of {project_name} authentication requests should have a latency below 200ms. | 99% | 95% | {project_name} server-side metrics to track latency for specific endpoints along with Response Time Distribution. | `http_server_requests_seconds_count`, `http_server_requests_seconds_sum`.

https://www.keycloak.org/keycloak-benchmark/kubernetes-guide/latest/running/metrics/keycloak_cluster#processing-time[More details about the metrics are captured here.] | https://github.com/keycloak/keycloak-benchmark/blob/main/provision/minikube/monitoring/dashboards/authentication-code.json[Example Grafana dashboard]

| Error Rate “during login” | The error rate should be less than X.X%. | 0.1% | 0.05% | The ratio of failed authentication requests to total requests. | Failed requests can be identified by the `5xx` error codes generated by the {project_name} server, and these can be further broken down per URL.
|https://grafana.com/grafana/dashboards/10441-keycloak-metrics-dashboard/[Example Grafana dashboard]
|===
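
The ratios named in the table can be illustrated with a small calculation. The following is only a minimal sketch, not part of {project_name} or of any dashboard linked above: the `CounterWindow` container, the function names, and the sample numbers are hypothetical, and it assumes the counter increases over an evaluation window have already been read from the monitoring system. Note that `http_server_requests_seconds_sum` divided by `http_server_requests_seconds_count` yields only an average latency; checking a target such as "XX% of requests below 200ms" would additionally need a response time distribution.

```python
from dataclasses import dataclass


@dataclass
class CounterWindow:
    """Counter increases over one evaluation window (hypothetical container)."""
    requests_total: float        # increase of http_server_requests_seconds_count
    requests_5xx: float          # the same counter restricted to 5xx responses
    duration_seconds_sum: float  # increase of http_server_requests_seconds_sum


def error_rate(window: CounterWindow) -> float:
    """Ratio of failed (5xx) requests to total requests."""
    if window.requests_total == 0:
        return 0.0  # no traffic in the window, so no observed errors
    return window.requests_5xx / window.requests_total


def average_latency_seconds(window: CounterWindow) -> float:
    """Average request duration: sum of durations divided by request count."""
    if window.requests_total == 0:
        return 0.0
    return window.duration_seconds_sum / window.requests_total


def scrape_availability(up_samples: list[int]) -> float:
    """Fraction of scrapes in which the `up` metric had the value 1."""
    if not up_samples:
        raise ValueError("at least one `up` sample is required")
    return sum(up_samples) / len(up_samples)


if __name__ == "__main__":
    # Made-up sample values, chosen only to exercise the functions.
    window = CounterWindow(requests_total=10_000, requests_5xx=5,
                           duration_seconds_sum=850.0)
    print(f"error rate: {error_rate(window):.4%}")
    print(f"average latency: {average_latency_seconds(window) * 1000:.1f} ms")
    print(f"scrape availability: {scrape_availability([1, 1, 1, 0]):.2%}")
```

Each result can then be compared against the single-site or multi-site target from the table.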
