Commit dfb51f0

[Feat] document v4 cluster size (#198)
* [Feat] document v4 cluster size
* Apply fixes from review
1 parent ef94f5b commit dfb51f0

1 file changed

+97
-0
lines changed

@@ -0,0 +1,97 @@
1+
---
2+
title: Recommended Kubernetes cluster sizing
3+
description: The recommended number of nodes and compute per pod in your Rhize Kubernetes cluster
4+
---
5+
6+
Rhize runs on Kubernetes.
7+
8+
This document provides compute recommendations for the nodes, pods, and services of your [Rhize installation]({{< relref "install" >}}).
9+
Some services also have recommended replication factors to increase reliability.
10+
11+
## Node recommendations
12+
13+
The following tables list the minimum recommended sizes to provision your cluster for Rhize {{% param v %}}.
14+
15+
### Rhize nodes
16+
17+
For high availability, Rhize recommends a **minimum of three nodes** with the following specifications.
18+
19+
| Property | Value |
20+
|-----------------------|-------------------|
21+
| Number of nodes | 3 |
22+
| CPU Speed (GHz) | 3.3 |
23+
| vCPU per Node | 16 |
24+
| Memory per node (GiB) | 32 (64 is better) |
25+
| Persistent volumes | 16 |
26+
| Persistent volume IOPS | 5000 |
27+
| PV Throughput (MBps) | 500 |
28+
| Total Disk Space (TB) | 3 |
29+
| Disk IOPS | 5000 |
30+
| Disk MBps | 500 |
31+
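As a rough illustration, the per-node minimums in the table above can be encoded as a small pre-provisioning check. This is only a sketch: the function name `shortfalls` and the dictionary keys are hypothetical and not part of any Rhize tooling, and the check covers only node count, CPU speed, vCPU, and memory.

```python
# Minimum values copied from the "Rhize nodes" table above.
# The key names are hypothetical and chosen only for this sketch.
MIN_NODE_COUNT = 3
MIN_NODE_SPEC = {"cpu_ghz": 3.3, "vcpu": 16, "memory_gib": 32}


def shortfalls(nodes):
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if len(nodes) < MIN_NODE_COUNT:
        problems.append(
            f"need at least {MIN_NODE_COUNT} nodes, found {len(nodes)}"
        )
    for index, node in enumerate(nodes):
        for key, minimum in MIN_NODE_SPEC.items():
            # A missing key is treated as 0, so it is reported as a shortfall.
            actual = node.get(key, 0)
            if actual < minimum:
                problems.append(
                    f"node {index}: {key} is {actual}, minimum is {minimum}"
                )
    return problems


if __name__ == "__main__":
    # Three identical planned nodes with the larger memory option.
    planned = [{"cpu_ghz": 3.3, "vcpu": 16, "memory_gib": 64}] * 3
    print(shortfalls(planned) or "meets the minimum node recommendations")
```

Returning a list of problems instead of a single boolean makes it easier to see which property of which node falls short.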
32+
### Rhize agent
33+
34+
The Rhize agent typically runs on the edge, outside of the cluster entirely.
35+
For the Rhize agent, the minimum recommended specifications are as follows:
36+
37+
| Property | Value |
38+
|-----------------------|-------|
39+
| CPU Speed (GHz) | 2.8 |
40+
| vCPU per Node | 2 |
41+
| Memory per node (GiB) | 1 |
42+
| Persistent volumes | 1 |
43+
44+
## Service-level recommendations
45+
46+
The following table lists the **minimum** recommended specifications for the main services.
47+
Services marked **Yes** in the Stateful PV column have one persistent volume per pod.
48+
49+
> [!WARNING]
50+
> Avoid NFS and SMB filesystems. They are known to cause file corruption in BaaS and do not work at all with several other services.
51+
52+
53+
| Service | Pods for HA (replica count) | vCPU per pod | Memory per pod | Stateful PV | Disk size (GiB) | Comments |
54+
|------------------------|-----------------------------|--------------|----------------|-------------|----------------|----------------------------------------------------------------------|
55+
| `baas-alpha` | 3 | 8 | 16 (at least) | Yes | 750 | High throughput and IOPS |
56+
| `baas-zero` | 3 | 2 | 2 | Yes | 300 | High throughput and IOPS |
57+
| `workflow` | 3 | 1 | 2 | No | N/A | HA requires 2 pods; a third pod avoids hot-key issues and balances load |
58+
| `isa95` | 2 | 2 | 1 | No | N/A | |
59+
| `keycloak-postgres` | 2 | 1 | 2 | No | 200 | Runs in pod with `keycloak` |
60+
| `keycloak` | 2 | 1 | 2 | No | N/A | |
61+
| `libre-audit-postgres` | 2 | 1 | 2 | Yes | 250 | Runs in pod with `libre-audit` |
62+
| `libre-ui` | 3 | 0.25 | 0.25 | No | N/A | |
63+
| `quest-db` | 1 | 4 | 8 | Yes | 250 | High throughput and IOPS |
64+
| `redpanda` | 3 | | | Yes | 100 | High IOPS |
65+
| `restate` | 3 | | | Yes | 50 | High throughput and IOPS |
66+
| `appsmith` | 3 | 4 | | Yes | 50 | High throughput and IOPS |
67+
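To see how the per-pod figures scale with replica count, the following sketch multiplies the per-pod values by the number of replicas for a few rows of the table above. It assumes that the disk size is provisioned per pod, which the table does not state explicitly, and it leaves memory unitless because the table gives no unit. The `SERVICES` list and the `footprint` helper are hypothetical names used only for this illustration.

```python
# (service, replicas, vCPU per pod, memory per pod, disk GiB per pod or None)
# Values copied from the service table above. Only a few rows whose compute
# cells are all filled in are included here.
SERVICES = [
    ("baas-alpha", 3, 8, 16, 750),
    ("baas-zero", 3, 2, 2, 300),
    ("workflow", 3, 1, 2, None),
    ("quest-db", 1, 4, 8, 250),
]


def footprint(replicas, vcpu, memory, disk_gib):
    """Scale per-pod values by the replica count.

    Assumption: the disk size applies to each pod, so it is multiplied too.
    Services without a disk size (N/A in the table) contribute 0.
    """
    return {
        "vcpu": replicas * vcpu,
        "memory": replicas * memory,
        "disk_gib": replicas * disk_gib if disk_gib is not None else 0,
    }


if __name__ == "__main__":
    for name, replicas, vcpu, memory, disk_gib in SERVICES:
        print(name, footprint(replicas, vcpu, memory, disk_gib))
```

For example, under these assumptions the three `baas-alpha` replicas together need 24 vCPU and 2250 GiB of disk.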
68+
69+
### Monitoring stack
70+
71+
The following table provides minimum compute recommendations for the monitoring stack.
72+
73+
The default recommendation is to run your Rhize observability stack on the same nodes that run the Rhize application.
74+
However, some deployments prefer to separate monitoring into its own cluster.
75+
76+
| Service | Pods for HA (replica count) | vCPU cores per pod | Memory per pod | DiskSize (GiB) |
77+
|-------------------------|-----------------------------|--------------------|----------------|----------------|
78+
| `grafana` | 3 | 0.5 | 2 | 50 |
79+
| `prometheus-node` | 4 | 0.25 | 0.05 | N/A |
80+
| `prometheus-server` | 1 per pod | 1 | 2 | 1 |
81+
| `promtail` | 4 | 0.25 | 0.2 | N/A |
82+
| `loki` | 1 | 1 | 1 | 1 |
83+
| `loki-logs` | 1 per pod | 0.25 | 0.1 | N/A |
84+
| `loki-canary` | 4 | 0.25 | 0.1 | N/A |
85+
| `loki-gateway` | 1 | 0.25 | 0.05 | 0.25 |
86+
| `loki-grafana-operator` | 1 | 0.25 | 0.1 | 0.25 |
87+
| `tempo-compactor` | 1 | 0.25 | 2 | 0.25 |
88+
| `tempo-ingester` | 3 | 0.5 | 0.75 | 1.5 |
89+
| `tempo-querier` | 1 | 0.25 | 0.5 | 0.25 |
90+
| `tempo-distributor` | 1 | 0.25 | 0.5 | 0.25 |
91+
| `tempo-query-frontend` | 1 | 0.25 | 0.5 | 0.25 |
92+
| `tempo-memcache` | 1 | 0.25 | 0.1 | 0.25 |
93+
94+
## Back up
95+
96+
You can [back up Rhize to S3](/deploy/backup/binary/).
97+
Consider including an S3 bucket as part of your deployment.
