Commit: Telco core RDS 4.18 docs

Squashed commit messages: Michael's feedback; Add Martin's NIC queues note; Adding Tuned CR to configure NIC queues note for AMD per-pod CPUs; final review comments for Core RDS 418; Telco core RDS 4.18 docs; Last minute comments

Showing 51 changed files with 1,117 additions and 491 deletions.
modules/telco-core-about-the-telco-core-cluster-use-model.adoc (23 additions, 0 deletions)
// Module included in the following assemblies:
//
// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc

:_mod-docs-content-type: REFERENCE
[id="telco-core-about-the-telco-core-cluster-use-model_{context}"]
= About the telco core cluster use model

The telco core cluster use model is designed for clusters that run on commodity hardware.
Telco core clusters support large scale telco applications, including control plane functions such as signaling, aggregation, and session border controller (SBC), and centralized data plane functions such as 5G user plane functions (UPF).
Telco core cluster functions require scalability, complex networking support, and resilient software-defined storage, and must meet performance requirements that are less stringent than those of far-edge RAN deployments.

.Telco core RDS cluster service-based architecture and networking topology
image::openshift-5g-core-cluster-architecture-networking.png[5G core cluster showing a service-based architecture with overlaid networking topology]

Networking requirements for telco core functions vary widely across a range of networking features and performance points.
IPv6 is a requirement and dual-stack is common.
Some functions need maximum throughput and transaction rate and require support for user plane DPDK networking.
Other functions use more typical cloud-native patterns and can rely on OVN-Kubernetes, kernel networking, and load balancing.

Telco core clusters are configured as standard with three control plane nodes and two or more worker nodes that run the stock (non-realtime) kernel.
To support workloads with varying networking and performance requirements, you can segment worker nodes by using `MachineConfigPool` custom resources (CRs), for example, for non-user data plane or high-throughput use cases.
To support required telco operational features, core clusters have a standard set of Day 2 OLM-managed Operators installed.
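As a minimal sketch of worker segmentation, a dedicated `MachineConfigPool` CR can group high-throughput worker nodes. The pool name `worker-ht` and its node label are illustrative assumptions, not part of the reference configuration:

[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-ht  # assumed pool name for high-throughput workers
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-ht]  # inherit base worker configs plus pool-specific ones
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-ht: ""  # assumed node label applied to the segmented nodes
----

Nodes labeled with the assumed `node-role.kubernetes.io/worker-ht` label then receive configuration targeted at this pool, for example a pool-specific performance profile.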
// Module included in the following assemblies:
//
// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc

:_mod-docs-content-type: REFERENCE
[id="telco-core-additional-storage-solutions_{context}"]
= Additional storage solutions

You can use other storage solutions to provide persistent storage for telco core clusters.
The configuration and integration of these solutions is outside the scope of the reference design specification (RDS).

Integration of the storage solution into the telco core cluster must include proper sizing and performance analysis to ensure the storage meets overall performance and resource usage requirements.
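Whichever solution is integrated, workloads typically consume it through standard `PersistentVolumeClaim` objects. The following sketch is illustrative only; the claim name, size, and storage class name are assumptions that depend on the chosen solution:

[source,yaml]
----
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cnf-data  # placeholder name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi  # assumed size; derive from workload sizing analysis
  storageClassName: example-storage-class  # assumed; provided by the integrated solution
----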
// Module included in the following assemblies:
//
// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc

:_mod-docs-content-type: REFERENCE
[id="telco-core-agent-based-installer_{context}"]
= Agent-based Installer

New in this release::
* No reference design updates in this release

Description::
+
--
Telco core clusters can be installed by using the Agent-based Installer (ABI).
This method lets you install {product-title} on bare-metal servers without requiring additional servers or VMs to manage the installation.
You can run the Agent-based Installer on any system, for example a laptop, to generate an ISO installation image.
The ISO is used as the installation media for the cluster supervisor nodes.
You can monitor installation progress by using the ABI tool from any system that has network connectivity to the supervisor node's API interfaces.

ABI supports the following:

* Installation from declarative CRs
* Installation in disconnected environments
* Installation with no additional supporting install or bastion servers required to complete the installation
--

Limits and requirements::
* Disconnected installation requires a registry that is reachable from the installed host, with all required content mirrored in that registry.

Engineering considerations::
* Networking configuration should be applied as NMState configuration during installation.
Day 2 networking configuration using the NMState Operator is not supported.
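As an illustrative sketch, host networking can be declared as NMState configuration in the `agent-config.yaml` file that the Agent-based Installer consumes. The cluster name, hostname, interface name, MAC, and addresses below are placeholder assumptions (addresses use the documentation range):

[source,yaml]
----
apiVersion: v1beta1
kind: AgentConfig
metadata:
  name: example-cluster  # assumed cluster name
rendezvousIP: 198.51.100.10  # assumed rendezvous node address
hosts:
  - hostname: master-0  # assumed hostname
    interfaces:
      - name: eno1
        macAddress: 00:53:00:00:00:01  # placeholder MAC
    networkConfig:  # declarative NMState host network configuration
      interfaces:
        - name: eno1
          type: ethernet
          state: up
          ipv4:
            enabled: true
            dhcp: false
            address:
              - ip: 198.51.100.10
                prefix-length: 24
----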
// Module included in the following assemblies:
//
// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc

:_mod-docs-content-type: REFERENCE
[id="telco-core-application-workloads_{context}"]
= Application workloads

Application workloads running on telco core clusters can include a mix of high-performance cloud-native network functions (CNFs) and traditional best-effort or burstable pod workloads.

Guaranteed QoS scheduling is available to pods that require exclusive or dedicated use of CPUs due to performance or security requirements.
Typically, pods that run high-performance or latency-sensitive CNFs by using user plane networking, for example DPDK, require exclusive use of dedicated whole CPUs, achieved through node tuning and guaranteed QoS scheduling.
When creating pod configurations that require exclusive CPUs, be aware of the potential implications of hyper-threaded systems.
Pods should request multiples of 2 CPUs when the entire core (2 hyper-threads) must be allocated to the pod.

Pods running network functions that do not require high throughput or low latency networking should be scheduled as best-effort or burstable QoS pods and do not require dedicated or isolated CPU cores.

Engineering considerations::
+
--
Use the following information to plan telco core workloads and cluster resources:

* CNF applications should conform to the latest version of https://redhat-best-practices-for-k8s.github.io/guide/[Red Hat Best Practices for Kubernetes].
* Use a mix of best-effort and burstable QoS pods as required by your applications.
** Use guaranteed QoS pods with proper configuration of reserved or isolated CPUs in the `PerformanceProfile` CR that configures the node.
** Guaranteed QoS pods must include annotations for fully isolating CPUs.
** Best-effort and burstable pods are not guaranteed exclusive use of a CPU.
Workloads can be preempted by other workloads, operating system daemons, or kernel tasks.
* Use exec probes sparingly and only when no other suitable option is available.
** Do not use exec probes if a CNF uses CPU pinning.
Use other probe implementations, for example, `httpGet` or `tcpSocket`.
** When you need to use exec probes, limit the exec probe frequency and quantity.
The maximum number of exec probes must be kept below 10, and the frequency must not be set to less than 10 seconds.
** You can use startup probes, because they do not use significant resources at steady-state operation.
The limitation on exec probes applies primarily to liveness and readiness probes.
Exec probes cause much higher CPU usage on management cores than other probe types, because they require process forking.
--
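A guaranteed QoS pod with fully isolated CPUs might look like the following sketch: requests equal limits, the CPU count is a multiple of 2 for hyper-threaded systems, and the CRI-O annotations disable CPU load balancing, CPU quota, and IRQ balancing on the pinned CPUs. The pod name, image, and runtime class value are placeholder assumptions:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: example-cnf  # placeholder name
  annotations:
    cpu-load-balancing.crio.io: "disable"  # fully isolate the pinned CPUs
    cpu-quota.crio.io: "disable"
    irq-load-balancing.crio.io: "disable"
spec:
  runtimeClassName: performance-example-profile  # assumed; derived from the PerformanceProfile name
  containers:
    - name: app
      image: registry.example.com/cnf:latest  # placeholder image
      resources:
        requests:        # requests == limits yields guaranteed QoS
          cpu: "4"       # multiple of 2 so whole hyper-threaded cores are allocated
          memory: 4Gi
        limits:
          cpu: "4"
          memory: 4Gi
----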
modules/telco-core-cluster-common-use-model-engineering-considerations.adoc (48 additions, 0 deletions)
// Module included in the following assemblies:
//
// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc

:_mod-docs-content-type: REFERENCE
[id="telco-core-cluster-common-use-model-engineering-considerations_{context}"]
= Telco core cluster common use model engineering considerations

* Cluster workloads are detailed in "Application workloads".
* Worker nodes should run on either of the following CPUs:
** Intel 3rd Generation Xeon (Ice Lake) CPUs or better when supported by {product-title}, or CPUs with the silicon security bug (Spectre and similar) mitigations turned off.
Skylake and older CPUs can experience 40% transaction performance drops when Spectre and similar mitigations are enabled.
** AMD EPYC Zen 4 CPUs (Genoa, Bergamo, or newer) when supported by {product-title}.
+
[NOTE]
====
Currently, per-pod power management is not available for AMD CPUs.
====
** IRQ balancing is enabled on worker nodes.
The `PerformanceProfile` CR sets `globallyDisableIrqLoadBalancing` to `false`.
Guaranteed QoS pods are annotated to ensure isolation as described in "CPU partitioning and performance tuning".
* All cluster nodes have the following characteristics:
** Hyper-Threading is enabled
** The CPU architecture is x86_64
** The stock (non-realtime) kernel is enabled
** The nodes are not configured for workload partitioning
* The balance between power management and maximum performance varies between machine config pools in the cluster.
The following configurations should be consistent for all nodes in a machine config pool group.
* Cluster scaling.
See "Scalability" for more information.
** Clusters should be able to scale to at least 120 nodes.
* CPU partitioning is configured using a `PerformanceProfile` CR and is applied to nodes on a per-`MachineConfigPool` basis.
See "CPU partitioning and performance tuning" for additional considerations.
* CPU requirements for {product-title} depend on the configured feature set and application workload characteristics.
For a cluster configured according to the reference configuration, running a simulated workload of 3000 pods as created by the kube-burner node-density test, the following CPU requirements are validated:
** The minimum number of reserved CPUs for control plane and worker nodes is 2 CPUs (4 hyper-threads) per NUMA node.
** The NICs used for non-DPDK network traffic should be configured to use at least 16 RX/TX queues.
** Nodes with large numbers of pods or other resources might require additional reserved CPUs.
The remaining CPUs are available for user workloads.
+
[NOTE]
====
Variations in {product-title} configuration, workload size, and workload characteristics require additional analysis to determine the effect on the number of required CPUs for the OpenShift platform.
====
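The reserved/isolated split and IRQ balancing setting described above can be expressed in a `PerformanceProfile` CR. The following sketch assumes a 64-hyper-thread, 2-NUMA-node worker; the profile name and CPU set values are illustrative assumptions that must be sized per node:

[source,yaml]
----
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: example-profile  # assumed name
spec:
  cpu:
    reserved: "0-1,32-33"   # assumed: 2 CPUs (4 hyper-threads) per NUMA node for platform use
    isolated: "2-31,34-63"  # assumed: remaining CPUs available for user workloads
  globallyDisableIrqLoadBalancing: false  # keep IRQ balancing enabled on worker nodes
  nodeSelector:
    node-role.kubernetes.io/worker: ""  # applies to nodes in the worker pool
----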
// Module included in the following assemblies:
//
// * scalability_and_performance/telco_core_ref_design_specs/telco-core-rds.adoc

:_mod-docs-content-type: REFERENCE
[id="telco-core-common-baseline-model_{context}"]
= Telco core common baseline model

The following configurations and use models are applicable to all telco core use cases.
The telco core use cases build on this common baseline of features.

Cluster topology::
Telco core clusters conform to the following requirements:

* High availability control plane (three or more control plane nodes)
* Non-schedulable control plane nodes
* Multiple machine config pools

Storage::
Telco core use cases require persistent storage as provided by {rh-storage-first}.

Networking::
Telco core cluster networking conforms to the following requirements:

* Dual stack IPv4/IPv6 (IPv4 primary).
* Fully disconnected: clusters do not have access to public networking at any point in their lifecycle.
* Supports multiple networks.
Segmented networking provides isolation between operations, administration and maintenance (OAM), signaling, and storage traffic.
* Cluster network type is OVN-Kubernetes as required for IPv6 support.
* Telco core clusters have multiple layers of networking supported by underlying RHCOS, the SR-IOV Network Operator, the Load Balancer, and other components.
These layers include the following:
** Cluster networking layer.
The cluster network configuration is defined and applied through the installation configuration.
Update the configuration during Day 2 operations with the NMState Operator.
Use the initial configuration to establish the following:
*** Host interface configuration
*** Active/active bonding (LACP)
** Secondary or additional network layer.
Configure the {product-title} CNI through `additionalNetwork` or `NetworkAttachmentDefinition` CRs.
Use the initial configuration to configure MACVLAN virtual network interfaces.
** Application workload layer.
User plane networking runs in cloud-native network functions (CNFs).

Service Mesh::
Telco CNFs can use Service Mesh.
All telco core clusters require a Service Mesh implementation.
The choice of implementation and configuration is outside the scope of this specification.
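For example, a MACVLAN secondary interface on the additional network layer can be declared with a `NetworkAttachmentDefinition` CR similar to the following sketch. The network name, namespace, and master interface are placeholder assumptions:

[source,yaml]
----
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: oam-net  # placeholder network name
  namespace: example-cnf  # placeholder namespace
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "bond0.100",
      "ipam": {
        "type": "static"
      }
    }
----

Pods attach to this network by referencing it in the `k8s.v1.cni.cncf.io/networks` pod annotation.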