Closes #432 Co-authored-by: Michal Hajas <mhajas@redhat.com>
Showing 2 changed files with 210 additions and 0 deletions.
209 changes: 209 additions & 0 deletions
doc/benchmark/modules/ROOT/pages/report/rosa-benchmark-key-results.adoc
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,209 @@ | ||
= Keycloak on ROSA Benchmark Key Results | ||
|
||
This summarizes a benchmark run with Keycloak 22 performed in July 2023. | ||
Use these results as a starting point to calculate the requirements of a Keycloak environment. | ||
Then run load tests in your own environment to verify them. | ||
|
||
[WARNING] | ||
==== | ||
CPU usage for refreshing a token is currently missing. | ||
We hope to add this soon. | ||
==== | ||
|
||
== Data collection | ||
|
||
These are rough estimates read from Grafana dashboards. | ||
Full automation to produce repeatable results across releases is pending. | ||
|
||
== Setup | ||
|
||
* OpenShift 4.13.x deployed on AWS via ROSA. | ||
* Machinepool with `m5.4xlarge` instances. | ||
* Keycloak 22 deployed with Operator and 3 pods. | ||
* Default user password hashing: PBKDF2 with 27,500 hash iterations. | ||
* Database seeded with 100,000 users and 100,000 clients. | ||
* Infinispan caches at default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database. | ||
* All sessions in distributed caches as per default, with two owners per entry, so one Pod can fail without losing data. | ||
* PostgreSQL deployed inside the same OpenShift with ephemeral storage. | ||
+ | ||
A database with persistent storage will have higher latencies, which might lead to longer response times; still, the throughput should be similar. | ||
|
||
== Installation | ||
|
||
Deploy OpenShift and ROSA as described in xref:kubernetes-guide::prerequisite/prerequisite-rosa.adoc[ROSA] and xref:kubernetes-guide::prerequisite/prerequisite-openshift.adoc[OpenShift] with the following settings: | ||
|
||
.OpenShift `.env` file | ||
---- | ||
# no KC_CPU_LIMITS set for this scenario | ||
KC_CPU_REQUESTS=6 | ||
KC_INSTANCES=3 | ||
KC_DISABLE_STICKY_SESSION=true | ||
KC_MEMORY_REQUESTS_MB=4000 | ||
KC_MEMORY_LIMITS_MB=4000 | ||
KC_HEAP_MAX_MB=2048 | ||
KC_DB_POOL_INITIAL_SIZE=30 | ||
KC_DB_POOL_MAX_SIZE=30 | ||
KC_DB_POOL_MIN_SIZE=30 | ||
---- | ||
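
As a rough illustration of what these settings add up to across the cluster, the following sketch parses the `.env` values shown above and derives a few totals. The helper names `parse_env` and `cluster_totals` are hypothetical and not part of the benchmark tooling, and the sketch assumes that every Keycloak Pod may open up to `KC_DB_POOL_MAX_SIZE` database connections.

```python
# Values copied from the `.env` file above.
ENV_TEXT = """\
# no KC_CPU_LIMITS set for this scenario
KC_CPU_REQUESTS=6
KC_INSTANCES=3
KC_DISABLE_STICKY_SESSION=true
KC_MEMORY_REQUESTS_MB=4000
KC_MEMORY_LIMITS_MB=4000
KC_HEAP_MAX_MB=2048
KC_DB_POOL_INITIAL_SIZE=30
KC_DB_POOL_MAX_SIZE=30
KC_DB_POOL_MIN_SIZE=30
"""


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def cluster_totals(env: dict[str, str]) -> dict[str, int]:
    """Derive cluster-wide totals from the per-Pod settings (hypothetical helper)."""
    pods = int(env["KC_INSTANCES"])
    return {
        "cpu_requests": pods * int(env["KC_CPU_REQUESTS"]),
        "memory_requests_mb": pods * int(env["KC_MEMORY_REQUESTS_MB"]),
        # Assumption: each Pod can use its full connection pool at once.
        "max_db_connections": pods * int(env["KC_DB_POOL_MAX_SIZE"]),
        # Memory left per Pod for everything outside the Java heap.
        "non_heap_mb_per_pod": int(env["KC_MEMORY_LIMITS_MB"]) - int(env["KC_HEAP_MAX_MB"]),
    }


totals = cluster_totals(parse_env(ENV_TEXT))
print(totals)
```

Under these assumptions, the three Pods together request 18 vCPU and 12,000 MB of memory, and the database must be able to accept up to 90 connections from Keycloak.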
|
||
== Performance results | ||
|
||
[WARNING] | ||
==== | ||
* Performance will decrease when scaling to more Pods (due to additional overhead) and when using a cross-datacenter setup (due to additional traffic and operations). | ||
* Larger cache sizes can improve performance when Keycloak instances run for a longer time. Still, those caches need to be refilled after an instance restarts. | ||
* Use these values as a starting point and perform your own load tests before going into production. | ||
==== | ||
|
||
Summary: | ||
|
||
* CPU usage scales linearly with the number of requests, up to the tested limits below. | ||
* Memory usage scales linearly with the number of active sessions, up to the tested limits below. | ||
|
||
Observations: | ||
|
||
* The base memory usage for an inactive Pod is 1 GB of RAM. | ||
|
||
* Leave 1 GB of extra head-room for spikes in memory usage. | ||
|
||
* For each 100,000 active user sessions, add 500 MB per Pod in a three-node cluster (tested with up to 200,000 sessions). | ||
+ | ||
This assumes that each user connects to only one client. | ||
Memory requirements increase with the number of client sessions per user session (not tested yet). | ||
|
||
* For every 45 user logins per second, add 1 vCPU per Pod in a three-node cluster (tested with up to 300 per second). | ||
+ | ||
Keycloak spends most of the CPU time hashing the password provided by the user. | ||
|
||
* For every 250 client credential grants per second, add 1 vCPU per Pod in a three-node cluster (tested with up to 2000 per second). | ||
+ | ||
Most CPU time goes into creating new TLS connections, as each client runs only a single request. | ||
|
||
* Leave 100% extra head-room for CPU usage to handle spikes in the load. | ||
Performance of Keycloak dropped significantly when its Pods were throttled in our tests. | ||
|
||
=== Calculation example | ||
|
||
Target size: | ||
|
||
* 50,000 active user sessions | ||
* 45 logins per second | ||
* 250 client credential grants per second | ||
|
||
Limits calculated: | ||
|
||
* CPU requested: 2 vCPU | ||
+ | ||
(45 logins per second = 1 vCPU, 250 client credential grants per second = 1 vCPU) | ||
|
||
* CPU limit: 4 vCPU | ||
+ | ||
(doubling the CPU requested to handle peaks, and to cover refresh token handling, for which we don't have numbers yet) | ||
|
||
* Memory requested: 1.25 GB | ||
+ | ||
(1 GB base memory plus 250 MB RAM for 50,000 active sessions, at 500 MB per 100,000 sessions) | ||
|
||
* Memory limit: 2.25 GB | ||
+ | ||
(adding 1 GB to the memory requested) | ||
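
The same arithmetic can be written as a small calculator. This is a minimal sketch that applies the per-Pod rules of thumb from the observations in this report proportionally; the names `PodSizing` and `estimate_pod_sizing` are hypothetical, 1 GB is treated as 1000 MB, and the estimate only holds for a three-Pod cluster within the tested limits. Applied proportionally, the rule of 500 MB per 100,000 sessions gives 250 MB for 50,000 sessions.

```python
from dataclasses import dataclass

# Per-Pod rules of thumb from the observations (three-Pod cluster, Keycloak 22).
# These are rough estimates, not guarantees.
BASE_MEMORY_MB = 1000              # memory used by an inactive Pod
MEMORY_HEADROOM_MB = 1000          # extra room for memory spikes
MEMORY_MB_PER_100K_SESSIONS = 500  # per 100,000 active user sessions
LOGINS_PER_SEC_PER_VCPU = 45       # user logins per second handled per vCPU
GRANTS_PER_SEC_PER_VCPU = 250      # client credential grants per second per vCPU
CPU_HEADROOM_FACTOR = 2.0          # 100% extra head-room for load spikes


@dataclass(frozen=True)
class PodSizing:
    cpu_request: float
    cpu_limit: float
    memory_request_mb: float
    memory_limit_mb: float


def estimate_pod_sizing(
    active_sessions: int, logins_per_sec: float, grants_per_sec: float
) -> PodSizing:
    """Estimate per-Pod requests and limits by linear scaling (hypothetical helper)."""
    cpu_request = (
        logins_per_sec / LOGINS_PER_SEC_PER_VCPU
        + grants_per_sec / GRANTS_PER_SEC_PER_VCPU
    )
    memory_request_mb = (
        BASE_MEMORY_MB + active_sessions / 100_000 * MEMORY_MB_PER_100K_SESSIONS
    )
    return PodSizing(
        cpu_request=cpu_request,
        cpu_limit=cpu_request * CPU_HEADROOM_FACTOR,
        memory_request_mb=memory_request_mb,
        memory_limit_mb=memory_request_mb + MEMORY_HEADROOM_MB,
    )


sizing = estimate_pod_sizing(
    active_sessions=50_000, logins_per_sec=45, grants_per_sec=250
)
print(sizing)
```

Because CPU usage for refreshing tokens is not covered by these numbers, treat the CPU figures as a lower bound and confirm them with a load test.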
|
||
== Tests performed | ||
|
||
Each test ran for 10 minutes. | ||
|
||
. Set up the ROSA cluster with the default settings. | ||
. Scale machine pool. | ||
+ | ||
[source,bash,subs="+quotes"] | ||
---- | ||
rosa edit machinepool -c **<clustername>** --min-replicas 3 --max-replicas 10 scaling | ||
---- | ||
. Deploy Keycloak and Monitoring | ||
+ | ||
[source,bash] | ||
---- | ||
cd provision/openshift | ||
task | ||
task monitoring | ||
---- | ||
. Create dataset | ||
+ | ||
[source,bash] | ||
---- | ||
task dataset-import -- -a create-realms -u 100000 | ||
# wait for first task to complete | ||
task dataset-import -- -a create-clients -c 100000 -n realm-0 | ||
---- | ||
. Prepare environment for running the benchmark via Ansible | ||
+ | ||
See xref:run/running-benchmark-ansible.adoc[] for details. | ||
+ | ||
.Contents of `env.yml` used here | ||
[source,yaml] | ||
---- | ||
cluster_size: 5 | ||
instance_type: t3.small | ||
instance_volume_size: 30 | ||
kcb_zip: ../benchmark/target/keycloak-benchmark-0.10-SNAPSHOT.zip | ||
kcb_heap_size: 1G | ||
---- | ||
|
||
. Create load runners | ||
+ | ||
[source,bash,subs="+quotes"] | ||
---- | ||
cd ../../ansible | ||
./aws_ec2.sh start **<<region of ROSA cluster>>** | ||
---- | ||
. Run different load tests | ||
|
||
* Testing memory for creating sessions | ||
+ | ||
[source,bash,subs="+quotes"] | ||
---- | ||
./benchmark.sh eu-west-1 \ | ||
--scenario=keycloak.scenario.authentication.AuthorizationCode \ | ||
--server-url=${KEYCLOAK_URL} \ | ||
--realm-name=realm-0 \ | ||
--users-per-sec=**<number of users per second>** \ | ||
--ramp-up=20 \ | ||
--logout-percentage=0 \ | ||
--measurement=600 \ | ||
--users-per-realm=100000 \ | ||
--log-http-on-failure | ||
---- | ||
|
||
* Testing CPU usage for user logins | ||
+ | ||
[source,bash,subs="+quotes"] | ||
---- | ||
./benchmark.sh eu-west-1 \ | ||
--scenario=keycloak.scenario.authentication.AuthorizationCode \ | ||
--server-url=${KEYCLOAK_URL} \ | ||
--realm-name=realm-0 \ | ||
--users-per-sec=**<number of users per second>** \ | ||
--ramp-up=20 \ | ||
--logout-percentage=100 \ | ||
--measurement=600 \ | ||
--users-per-realm=100000 \ | ||
--log-http-on-failure | ||
---- | ||
|
||
* Testing CPU usage for client credential grants | ||
+ | ||
[source,bash,subs="+quotes"] | ||
---- | ||
./benchmark.sh eu-west-1 \ | ||
--scenario=keycloak.scenario.authentication.ClientSecret \ | ||
--server-url=${KEYCLOAK_URL} \ | ||
--realm-name=realm-0 \ | ||
--users-per-sec=**<number of clients per second>** \ | ||
--ramp-up=20 \ | ||
--logout-percentage=100 \ | ||
--measurement=600 \ | ||
--users-per-realm=100000 \ | ||
--log-http-on-failure | ||
---- |
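
Since the test runs above differ only in the scenario, the request rate, and the logout percentage, the command lines can also be generated. The following is a minimal sketch, not part of the benchmark tooling: the function `build_benchmark_command`, the example server URL, and the list of rates are hypothetical, and only flags that appear in the commands above are used.

```python
import shlex


def build_benchmark_command(
    region: str,
    scenario: str,
    server_url: str,
    rate_per_sec: int,
    logout_percentage: int,
) -> list[str]:
    """Assemble a benchmark.sh invocation using the flags shown above."""
    return [
        "./benchmark.sh",
        region,
        f"--scenario={scenario}",
        f"--server-url={server_url}",
        "--realm-name=realm-0",
        f"--users-per-sec={rate_per_sec}",
        "--ramp-up=20",
        f"--logout-percentage={logout_percentage}",
        "--measurement=600",  # 600 seconds, matching the 10-minute test runs
        "--users-per-realm=100000",
        "--log-http-on-failure",
    ]


# Hypothetical sweep over login rates for the CPU usage test.
commands = [
    build_benchmark_command(
        region="eu-west-1",
        scenario="keycloak.scenario.authentication.AuthorizationCode",
        server_url="https://keycloak.example.com",  # placeholder URL
        rate_per_sec=rate,
        logout_percentage=100,
    )
    for rate in (15, 30, 45)
]

for command in commands:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(command))
```

Each printed line can be run from the `ansible` directory one after the other, reading the CPU and memory usage from the dashboards between runs.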