Add ROSA Benchmark Key Results (#433)
Closes #432
ahus1 authored Jul 24, 2023
1 parent 534f2c0 commit d35e06b
Showing 2 changed files with 210 additions and 0 deletions.
1 change: 1 addition & 0 deletions doc/benchmark/modules/ROOT/nav.adoc
@@ -13,6 +13,7 @@
** xref:report/trend-report.adoc[]
** xref:report/diagram-types.adoc[]
** xref:report/result-summary.adoc[]
** xref:report/rosa-benchmark-key-results.adoc[]
* xref:scenario-overview.adoc[]
** xref:scenario/authorization-code.adoc[]
** xref:scenario/list-sessions.adoc[]
@@ -0,0 +1,209 @@
= Keycloak on ROSA Benchmark Key Results

This page summarizes a benchmark run with Keycloak 22 performed in July 2023.
Use these results as a starting point to calculate the requirements of a Keycloak environment, and then perform load tests in your own environment.

[WARNING]
====
Numbers for the CPU usage of refreshing tokens are currently missing.
We hope to add them soon.
====

== Data collection

These are rough estimates taken from Grafana dashboards.
Full automation is still pending; it would provide repeatable results across different releases.

== Setup

* OpenShift 4.13.x deployed on AWS via ROSA.
* Machine pool with `m5.4xlarge` instances.
* Keycloak 22 deployed with the Operator and 3 Pods.
* Default user password hashing with PBKDF2 and 27,500 hash iterations.
* Database seeded with 100,000 users and 100,000 clients.
* Infinispan caches at the default size of 10,000 entries, so not all clients and users fit into the cache, and some requests need to fetch data from the database.
* All sessions in distributed caches, as per default, with two owners per entry, allowing one Pod to fail without losing data.
* PostgreSQL deployed inside the same OpenShift cluster with ephemeral storage.
+
A database with persistent storage has longer latencies, which might lead to longer response times; still, the throughput should be similar.

== Installation

Deploy OpenShift and ROSA as described in xref:kubernetes-guide::prerequisite/prerequisite-rosa.adoc[ROSA] and xref:kubernetes-guide::prerequisite/prerequisite-openshift.adoc[OpenShift] with the following settings:

.OpenShift `.env` file
----
# no KC_CPU_LIMITS set for this scenario
KC_CPU_REQUESTS=6
KC_INSTANCES=3
KC_DISABLE_STICKY_SESSION=true
KC_MEMORY_REQUESTS_MB=4000
KC_MEMORY_LIMITS_MB=4000
KC_HEAP_MAX_MB=2048
KC_DB_POOL_INITIAL_SIZE=30
KC_DB_POOL_MAX_SIZE=30
KC_DB_POOL_MIN_SIZE=30
----
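
As a quick consistency check of these values, the following sketch copies the variables from the `.env` file and reports how much of the container memory limit remains outside the JVM heap, and whether the database pool has a fixed size. It is a hypothetical helper written for illustration only; it is not part of the keycloak-benchmark provisioning tasks.

[source,bash]
----
#!/usr/bin/env bash
# Hypothetical sanity check for the .env values above.
# It is NOT part of the keycloak-benchmark provisioning tasks.
set -euo pipefail

# Values copied from the .env file used for this benchmark run.
KC_MEMORY_REQUESTS_MB=4000
KC_MEMORY_LIMITS_MB=4000
KC_HEAP_MAX_MB=2048
KC_DB_POOL_INITIAL_SIZE=30
KC_DB_POOL_MAX_SIZE=30
KC_DB_POOL_MIN_SIZE=30

# The heap must fit inside the container memory limit; the remainder is
# left for non-heap memory such as metaspace and thread stacks.
if (( KC_HEAP_MAX_MB >= KC_MEMORY_LIMITS_MB )); then
  echo "error: heap (${KC_HEAP_MAX_MB} MB) does not fit into limit (${KC_MEMORY_LIMITS_MB} MB)" >&2
  exit 1
fi
echo "non-heap memory: $(( KC_MEMORY_LIMITS_MB - KC_HEAP_MAX_MB )) MB"

# Requests equal to limits: the full amount is reserved for each Pod.
if (( KC_MEMORY_REQUESTS_MB == KC_MEMORY_LIMITS_MB )); then
  echo "memory requests equal limits"
fi

# Initial, min, and max pool sizes are identical, so the pool of each
# Pod neither grows nor shrinks during the test.
if (( KC_DB_POOL_INITIAL_SIZE == KC_DB_POOL_MIN_SIZE && KC_DB_POOL_MIN_SIZE == KC_DB_POOL_MAX_SIZE )); then
  echo "fixed-size DB pool: ${KC_DB_POOL_MAX_SIZE} connections per Pod"
fi
----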

== Performance results

[WARNING]
====
* Performance will be lower when scaling to more Pods (due to additional overhead) and when using a cross-datacenter setup (due to additional traffic and operations).
* Larger cache sizes can improve performance when Keycloak instances run for a longer time. However, those caches need to be filled again when an instance restarts.
* Use these values as a starting point and perform your own load tests before going into production.
====

Summary:

* CPU usage scales linearly with the number of requests, up to the tested limits listed below.
* Memory usage scales linearly with the number of active sessions, up to the tested limits listed below.

Observations:

* The base memory usage for an inactive Pod is 1 GB of RAM.

* Leave 1 GB of extra headroom for spikes in RAM usage.

* For each 100,000 active user sessions, add 500 MB per Pod in a three-node cluster (tested with up to 200,000 sessions).
+
This assumes that each user connects to only one client.
Memory requirements increase with the number of client sessions per user session (not tested yet).

* For each 45 user logins per second, add 1 vCPU per Pod in a three-node cluster (tested with up to 300 per second).
+
Keycloak spends most of the CPU time hashing the password provided by the user.

* For each 250 client credential grants per second, add 1 vCPU per Pod in a three-node cluster (tested with up to 2000 per second).
+
Most CPU time goes into creating new TLS connections, as each client runs only a single request.

* Leave 100% extra headroom for CPU usage to handle spikes in the load.
Performance of Keycloak dropped significantly when its Pods were throttled in our tests.

=== Calculation example

Target size:

* 50,000 active user sessions
* 45 logins per second
* 250 client credential grants per second

Calculated requests and limits per Pod:

* CPU requested: 2 vCPU
+
(45 logins per second = 1 vCPU, 250 client credential grants per second = 1 vCPU)

* CPU limit: 4 vCPU
+
(doubling the CPU requested to handle peaks, and also refresh token handling, for which we don't have numbers yet)

* Memory requested: 1.25 GB
+
(1 GB base memory plus 250 MB RAM for 50,000 active sessions, at 500 MB per 100,000 sessions)

* Memory limit: 2.25 GB
+
(adding 1 GB to the memory requested)
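
The rules of thumb from the observations can be expressed as a small calculation. The following is a minimal sketch, assuming the load figures are cluster-wide rates and the results are per Pod in a three-node cluster; the function name `size_pod` and its parameters are hypothetical and not part of keycloak-benchmark.

[source,bash]
----
#!/usr/bin/env bash
# Hypothetical sketch of the per-Pod sizing rules of thumb from this
# benchmark (Keycloak 22 on ROSA, three Pods, July 2023).
set -euo pipefail

# Usage: size_pod <active user sessions> <logins per second> <client credential grants per second>
# All arguments are integers; the rates are assumed to be cluster-wide.
size_pod() {
  local sessions=$1 logins=$2 grants=$3
  # The linear rules were only tested up to these values.
  if (( sessions > 200000 || logins > 300 || grants > 2000 )); then
    echo "warning: input exceeds the tested range; the linear rules may not hold" >&2
  fi
  awk -v s="$sessions" -v l="$logins" -v g="$grants" 'BEGIN {
    # 1 vCPU per 45 logins/s plus 1 vCPU per 250 client credential grants/s
    cpu_request = l / 45 + g / 250
    # 100% headroom for load spikes (and, for now, refresh token handling)
    cpu_limit = cpu_request * 2
    # 1 GB (1000 MB) base memory plus 500 MB per 100,000 active sessions
    mem_request = 1000 + s / 100000 * 500
    # 1 GB (1000 MB) headroom for memory spikes
    mem_limit = mem_request + 1000
    printf "cpu_request=%.2f vCPU\n", cpu_request
    printf "cpu_limit=%.2f vCPU\n", cpu_limit
    printf "memory_request=%.0f MB\n", mem_request
    printf "memory_limit=%.0f MB\n", mem_limit
  }'
}

# Target size of the calculation example above.
size_pod 50000 45 250
----

The arithmetic runs in `awk` because Bash only supports integer arithmetic, while the CPU rules produce fractional vCPU values for most inputs.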

== Tests performed

Each test ran for 10 minutes.

. Set up a ROSA cluster with the default settings.
. Scale the machine pool.
+
[source,bash,subs="+quotes"]
----
rosa edit machinepool -c **<clustername>** --min-replicas 3 --max-replicas 10 scaling
----
. Deploy Keycloak and monitoring.
+
[source,bash]
----
cd provision/openshift
task
task monitoring
----
. Create the dataset.
+
[source,bash]
----
task dataset-import -- -a create-realms -u 100000
# wait for first task to complete
task dataset-import -- -a create-clients -c 100000 -n realm-0
----
. Prepare the environment for running the benchmark via Ansible.
+
See xref:run/running-benchmark-ansible.adoc[] for details.
+
.Contents of `env.yml` used here
[source,yaml]
----
cluster_size: 5
instance_type: t3.small
instance_volume_size: 30
kcb_zip: ../benchmark/target/keycloak-benchmark-0.10-SNAPSHOT.zip
kcb_heap_size: 1G
----

. Create the load runners.
+
[source,bash,subs="+quotes"]
----
cd ../../ansible
./aws_ec2.sh start **<region of ROSA cluster>**
----
. Run the different load tests.

* Testing memory usage for creating sessions
+
[source,bash,subs="+quotes"]
----
./benchmark.sh eu-west-1 \
--scenario=keycloak.scenario.authentication.AuthorizationCode \
--server-url=${KEYCLOAK_URL} \
--realm-name=realm-0 \
--users-per-sec=**<number of users per second>** \
--ramp-up=20 \
--logout-percentage=0 \
--measurement=600 \
--users-per-realm=100000 \
--log-http-on-failure
----

* Testing CPU usage for user logins
+
[source,bash,subs="+quotes"]
----
./benchmark.sh eu-west-1 \
--scenario=keycloak.scenario.authentication.AuthorizationCode \
--server-url=${KEYCLOAK_URL} \
--realm-name=realm-0 \
--users-per-sec=**<number of users per second>** \
--ramp-up=20 \
--logout-percentage=100 \
--measurement=600 \
--users-per-realm=100000 \
--log-http-on-failure
----

* Testing CPU usage for client credential grants
+
[source,bash,subs="+quotes"]
----
./benchmark.sh eu-west-1 \
--scenario=keycloak.scenario.authentication.ClientSecret \
--server-url=${KEYCLOAK_URL} \
--realm-name=realm-0 \
--users-per-sec=**<number of clients per second>** \
--ramp-up=20 \
--logout-percentage=100 \
--measurement=600 \
--users-per-realm=100000 \
--log-http-on-failure
----
