From 64cf50078c282e4505eb3d4351e47c4306c069ce Mon Sep 17 00:00:00 2001
From: Alexander Schwartz
Date: Tue, 18 Jul 2023 18:42:07 +0200
Subject: [PATCH] Add ROSA Benchmark Key Results

Closes #432

Co-authored-by: Michal Hajas
---
 doc/benchmark/modules/ROOT/nav.adoc                |   1 +
 .../report/rosa-benchmark-key-results.adoc         | 209 ++++++++++++++++++
 2 files changed, 210 insertions(+)
 create mode 100644 doc/benchmark/modules/ROOT/pages/report/rosa-benchmark-key-results.adoc

diff --git a/doc/benchmark/modules/ROOT/nav.adoc b/doc/benchmark/modules/ROOT/nav.adoc
index 4a306f7c4..2192a8909 100644
--- a/doc/benchmark/modules/ROOT/nav.adoc
+++ b/doc/benchmark/modules/ROOT/nav.adoc
@@ -13,6 +13,7 @@
 ** xref:report/trend-report.adoc[]
 ** xref:report/diagram-types.adoc[]
 ** xref:report/result-summary.adoc[]
+** xref:report/rosa-benchmark-key-results.adoc[]
 * xref:scenario-overview.adoc[]
 ** xref:scenario/authorization-code.adoc[]
 ** xref:scenario/list-sessions.adoc[]
diff --git a/doc/benchmark/modules/ROOT/pages/report/rosa-benchmark-key-results.adoc b/doc/benchmark/modules/ROOT/pages/report/rosa-benchmark-key-results.adoc
new file mode 100644
index 000000000..873b0c5ec
--- /dev/null
+++ b/doc/benchmark/modules/ROOT/pages/report/rosa-benchmark-key-results.adoc
@@ -0,0 +1,209 @@
+= Keycloak on ROSA Benchmark Key Results
+
+This summarizes a benchmark run with Keycloak 22 performed in July 2023.
+Use this as a starting point to calculate the requirements of a Keycloak environment.
+Use these results to perform load testing in your environment.
+
+[WARNING]
+====
+CPU usage for refreshing a token is currently missing.
+We hope to add this soon.
+====
+
+== Data collection
+
+These are rough estimates from looking at Grafana dashboards.
+Full automation is pending to show repeatable results across different releases.
+
+== Setup
+
+* OpenShift 4.13.x deployed on AWS via ROSA.
+* Machinepool with `m5.4xlarge` instances.
+* Keycloak 22 deployed with Operator and 3 pods.
+* Default user password hashing with PBKDF2 and 27,500 hash iterations.
+* Database seeded with 100,000 users and 100,000 clients.
+* Infinispan caches at the default of 10,000 entries, so not all clients and users fit into the cache, and some requests will need to fetch the data from the database.
+* All sessions in distributed caches as per default, with two owners per entry, allowing one pod to fail without losing data.
+* PostgreSQL deployed inside the same OpenShift with ephemeral storage.
++
+A database with persistent storage will have higher database latencies, which might lead to longer response times; still, the throughput should be similar.
+
+== Installation
+
+Deploy OpenShift and ROSA as described in xref:kubernetes-guide::prerequisite/prerequisite-rosa.adoc[ROSA] and xref:kubernetes-guide::prerequisite/prerequisite-openshift.adoc[OpenShift] with
+
+.OpenShift `.env` file
+----
+# no KC_CPU_LIMITS set for this scenario
+KC_CPU_REQUESTS=6
+KC_INSTANCES=3
+KC_DISABLE_STICKY_SESSION=true
+KC_MEMORY_REQUESTS_MB=4000
+KC_MEMORY_LIMITS_MB=4000
+KC_HEAP_MAX_MB=2048
+KC_DB_POOL_INITIAL_SIZE=30
+KC_DB_POOL_MAX_SIZE=30
+KC_DB_POOL_MIN_SIZE=30
+----
+
+== Performance results
+
+[WARNING]
+====
+* Performance will be lowered when scaling to more Pods (due to additional overhead) and using a cross-datacenter setup (due to additional traffic and operations).
+
+* Increased cache sizes can improve the performance when Keycloak instances run for a longer time. Still, those caches need to be filled when an instance is restarted.
+
+* Use these values as a starting point and perform your own load tests before going into production.
+====
+
+Summary:
+
+* The used CPU scales linearly with the number of requests up to the tested limit below.
+* The used memory scales linearly with the number of active sessions up to the tested limit below.
+
+Observations:
+
+* The base memory usage for an inactive Pod is 1 GB of RAM.
+
+* Leave 1 GB extra head-room for spikes of RAM.
+
+* For each 100,000 active user sessions, add 500 MB per Pod in a three-node cluster (tested with up to 200,000 sessions).
++
+This assumes that each user connects to only one client.
+Memory requirements increase with the number of client sessions per user session (not tested yet).
+
+* For each 45 user logins per second, add 1 vCPU per Pod in a three-node cluster (tested with up to 300 per second).
++
+Keycloak spends most of the CPU time hashing the password provided by the user.
+
+* For each 250 client credential grants per second, add 1 vCPU per Pod in a three-node cluster (tested with up to 2,000 per second).
++
+Most CPU time goes into creating new TLS connections, as each client runs only a single request.
+
+* Leave 100% extra head-room for CPU usage to handle spikes in the load.
+Performance of Keycloak dropped significantly when its Pods were throttled in our tests.
+
+=== Calculation example
+
+Target size:
+
+* 50,000 active user sessions
+* 45 logins per second
+* 250 client credential grants per second
+
+Limits calculated:
+
+* CPU requested: 2 vCPU
++
+(45 logins per second = 1 vCPU, 250 client credential grants per second = 1 vCPU)
+
+* CPU limit: 4 vCPU
++
+(doubling the CPU requested to handle peaks, and to cover refresh token handling, for which we don't have numbers yet)
+
+* Memory requested: 1.25 GB
++
+(1 GB base memory plus 250 MB RAM for 50,000 active sessions)
+
+* Memory limit: 2.25 GB
++
+(adding 1 GB to the memory requested)
+
+== Tests performed
+
+Each test ran for 10 minutes.
+
+. Set up the ROSA cluster with default settings.
+. Scale machine pool.
++
+[source,bash,subs="+quotes"]
+----
+rosa edit machinepool -c **** --min-replicas 3 --max-replicas 10 scaling
+----
+. Deploy Keycloak and Monitoring
++
+[source,bash]
+----
+cd provision/openshift
+task
+task monitoring
+----
+. Create dataset
++
+[source,bash]
+----
+task dataset-import -- -a create-realms -u 100000
+# wait for first task to complete
+task dataset-import -- -a create-clients -c 100000 -n realm-0
+----
+. Prepare environment for running the benchmark via Ansible
++
+See xref:run/running-benchmark-ansible.adoc[] for details.
++
+.Contents of `env.yml` used here
+[source,yaml]
+----
+cluster_size: 5
+instance_type: t3.small
+instance_volume_size: 30
+kcb_zip: ../benchmark/target/keycloak-benchmark-0.10-SNAPSHOT.zip
+kcb_heap_size: 1G
+----
+
+. Create load runners
++
+[source,bash,subs="+quotes"]
+----
+cd ../../ansible
+./aws_ec2.sh start ****
+----
+. Run different load tests
+
+* Testing memory for creating sessions
++
+[source,bash,subs="+quotes"]
+----
+./benchmark.sh eu-west-1 \
+--scenario=keycloak.scenario.authentication.AuthorizationCode \
+--server-url=${KEYCLOAK_URL} \
+--realm-name=realm-0 \
+--users-per-sec=**** \
+--ramp-up=20 \
+--logout-percentage=0 \
+--measurement=600 \
+--users-per-realm=100000 \
+--log-http-on-failure
+----
+
+* Testing CPU usage for user logins
++
+[source,bash,subs="+quotes"]
+----
+./benchmark.sh eu-west-1 \
+--scenario=keycloak.scenario.authentication.AuthorizationCode \
+--server-url=${KEYCLOAK_URL} \
+--realm-name=realm-0 \
+--users-per-sec=**** \
+--ramp-up=20 \
+--logout-percentage=100 \
+--measurement=600 \
+--users-per-realm=100000 \
+--log-http-on-failure
+----
+
+* Testing CPU usage for client credential grants
++
+[source,bash,subs="+quotes"]
+----
+./benchmark.sh eu-west-1 \
+--scenario=keycloak.scenario.authentication.AuthorizationCode \
+--server-url=${KEYCLOAK_URL} \
+--realm-name=realm-0 \
+--users-per-sec=**** \
+--ramp-up=20 \
+--logout-percentage=100 \
+--measurement=600 \
+--users-per-realm=100000 \
+--log-http-on-failure
+----
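
The sizing rules listed under "Observations" can be sketched as a small Python helper. This is only an illustration, assuming a three-Pod cluster and linear scaling within the tested limits; the function, class, and constant names are hypothetical and not part of Keycloak or the benchmark tooling. It also assumes decimal units (1 GB = 1000 MB), and it does not cover CPU usage for refreshing tokens, which this report has no numbers for yet.

```python
from dataclasses import dataclass

# Per-Pod constants taken from the observations in this report
# (three-Pod cluster, values only valid up to the tested limits).
BASE_MEMORY_MB = 1000              # base memory of an inactive Pod (1 GB)
MEMORY_HEADROOM_MB = 1000          # extra head-room for memory spikes (1 GB)
MEMORY_MB_PER_100K_SESSIONS = 500  # per 100,000 active user sessions
LOGINS_PER_SEC_PER_VCPU = 45       # user logins per second handled by 1 vCPU
GRANTS_PER_SEC_PER_VCPU = 250      # client credential grants per second per vCPU
CPU_HEADROOM_FACTOR = 2.0          # 100% extra head-room for CPU spikes


@dataclass
class PodSizing:
    cpu_request_vcpu: float
    cpu_limit_vcpu: float
    memory_request_mb: float
    memory_limit_mb: float


def estimate_pod_sizing(active_sessions, logins_per_sec, grants_per_sec):
    """Hypothetical helper: rough per-Pod estimate for a three-Pod cluster."""
    cpu_request = (logins_per_sec / LOGINS_PER_SEC_PER_VCPU
                   + grants_per_sec / GRANTS_PER_SEC_PER_VCPU)
    memory_request = (BASE_MEMORY_MB
                      + active_sessions / 100_000 * MEMORY_MB_PER_100K_SESSIONS)
    return PodSizing(
        cpu_request_vcpu=cpu_request,
        cpu_limit_vcpu=cpu_request * CPU_HEADROOM_FACTOR,
        memory_request_mb=memory_request,
        memory_limit_mb=memory_request + MEMORY_HEADROOM_MB,
    )


if __name__ == "__main__":
    # Target size from the calculation example.
    print(estimate_pod_sizing(active_sessions=50_000,
                              logins_per_sec=45,
                              grants_per_sec=250))
```

For the target size in the calculation example, the rule of 500 MB per 100,000 sessions gives 250 MB of session memory per Pod on top of the 1 GB base. Treat the output as a starting point for your own load tests, not as a guarantee.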