diff --git a/modules/egressip_configure_failover_task.adoc b/modules/egressip_configure_failover_task.adoc
new file mode 100644
index 000000000000..ff693b2475f8
--- /dev/null
+++ b/modules/egressip_configure_failover_task.adoc
@@ -0,0 +1,67 @@
+// Module included in the following assemblies:
+//
+// * networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.adoc
+
+:_mod-docs-content-type: PROCEDURE
+[id="egressip_configure_failover_task_{context}"]
+= Configuring the EgressIP failover time limit
+
+To control how quickly the platform detects a failing egress node and initiates a failover, configure the `reachabilityTotalTimeoutSeconds` parameter in the `Network` custom resource (CR).
+
+.Prerequisites
+
+* You installed the OpenShift CLI (`oc`).
+* You have access to the cluster as a user with the `cluster-admin` role.
+
+.Procedure
+
+. Edit the `Network` CR by running the following command:
++
+[source,terminal]
+----
+$ oc edit network.operator cluster
+----
+
+. Locate the `egressIPConfig: {}` section under `spec.defaultNetwork.ovnKubernetesConfig`. If the section does not exist, add it.
+
+. Update the `egressIPConfig` block to include the `reachabilityTotalTimeoutSeconds` parameter with the value that you want, for example, `5` seconds. Use the indentation shown in the following example:
++
+[source,yaml]
+----
+spec:
+  defaultNetwork:
+    ovnKubernetesConfig:
+      egressIPConfig:
+        reachabilityTotalTimeoutSeconds: 5
+----
++
+[NOTE]
+====
+The value must be an integer from `0` to `60`. For details about each value, see "EgressIP failover settings".
+====
+
+. Save your changes and exit the editor. The Cluster Network Operator (CNO) applies the updated configuration automatically.
+
+.Verification
+
+. Display the `Network` CR by running the following command:
++
+[source,terminal]
+----
+$ oc get network.operator cluster -o yaml
+----
+
+. In the output, confirm that the `reachabilityTotalTimeoutSeconds` parameter is nested under `spec.defaultNetwork.ovnKubernetesConfig.egressIPConfig` and is set to your intended value:
++
+[source,yaml]
+----
+# ...
+spec:
+  # ...
+  defaultNetwork:
+    ovnKubernetesConfig:
+      egressIPConfig:
+        reachabilityTotalTimeoutSeconds: 5
+      gatewayConfig:
+        # ...
+----
\ No newline at end of file
diff --git a/modules/egressip_failover_concept.adoc b/modules/egressip_failover_concept.adoc
new file mode 100644
index 000000000000..13ed69126fac
--- /dev/null
+++ b/modules/egressip_failover_concept.adoc
@@ -0,0 +1,19 @@
+// Module included in the following assemblies:
+//
+// * networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="egressip_failover_concept_{context}"]
+= Understanding EgressIP failover control
+
+The `reachabilityTotalTimeoutSeconds` parameter controls how quickly the platform detects a failing egress node and initiates a failover. The value sets the maximum time that the platform waits before it declares an egress node unreachable.
+
+[IMPORTANT]
+====
+When you configure an egress IP address with multiple egress nodes, the complete failover, from the node failure until the egress IP address is reassigned to another node, can take several seconds or longer. Reassignment begins only after the `reachabilityTotalTimeoutSeconds` period elapses without a successful reachability check.
+====
+
+To ensure that traffic uses the correct external path, egress IP traffic on a node always leaves through the network interface that the egress IP address is assigned to.
+
+// Next step: Configure the failover time limit.
+// See xref:egressip_configure_failover_task.adoc[Configuring the EgressIP failover time limit].
\ No newline at end of file
diff --git a/modules/egressip_failover_reference.adoc b/modules/egressip_failover_reference.adoc
new file mode 100644
index 000000000000..169b16025872
--- /dev/null
+++ b/modules/egressip_failover_reference.adoc
@@ -0,0 +1,18 @@
+// Module included in the following assemblies:
+//
+// * networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.adoc
+
+:_mod-docs-content-type: REFERENCE
+[id="egressip_failover_reference_{context}"]
+= EgressIP failover settings
+
+The `reachabilityTotalTimeoutSeconds` parameter defines the total time, in seconds, that the egress node reachability check can run before the platform declares the node down. If you omit the parameter, the platform uses a default value, which is subject to change over time. The current default is `1` second.
+
+The following table summarizes the accepted values and their effects:
+
+[cols="1,1,2a", options="header"]
+|===
+|Value (seconds) |Effect on the reachability check |Failover impact and use case
+|`0` |Disables the reachability check. |No automatic failover. Use this value only if an external system handles node health monitoring and failover. The platform does not react automatically to node failures.
+|`1` to `60` |Sets the total time limit for reachability probing. |Directly controls detection time. This value sets the lower bound for the overall failover time. A smaller value results in faster failover but might increase network traffic. A larger value causes the platform to react slowly to egress nodes that become unreachable.
+|===
\ No newline at end of file
diff --git a/modules/nw-egress-ips-config-object.adoc b/modules/nw-egress-ips-config-object.adoc
deleted file mode 100644
index 79772333be7c..000000000000
--- a/modules/nw-egress-ips-config-object.adoc
+++ /dev/null
@@ -1,42 +0,0 @@
-// Module included in the following assemblies:
-//
-// * networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.adoc
-
-:_mod-docs-content-type: CONCEPT
-[id="nw-egress-ips-config-object_{context}"]
-= The egressIPConfig object
-
-As a feature of egress IP, the `reachabilityTotalTimeoutSeconds` parameter configures the EgressIP node reachability check total timeout in seconds. If the EgressIP node cannot be reached within this timeout, the node is declared down.
-
-You can set a value for the `reachabilityTotalTimeoutSeconds` in the configuration file for the `egressIPConfig` object. Setting a large value might cause the EgressIP implementation to react slowly to node changes. The implementation reacts slowly for EgressIP nodes that have an issue and are unreachable.
-
-If you omit the `reachabilityTotalTimeoutSeconds` parameter from the `egressIPConfig` object, the platform chooses a reasonable default value, which is subject to change over time. The current default is `1` second. A value of `0` disables the reachability check for the EgressIP node.
-
-The following `egressIPConfig` object describes changing the `reachabilityTotalTimeoutSeconds` from the default `1` second probes to `5` second probes:
-
-[source,yaml]
-----
-apiVersion: operator.openshift.io/v1
-kind: Network
-metadata:
-  name: cluster
-spec:
-  clusterNetwork:
-  - cidr: 10.128.0.0/14
-    hostPrefix: 23
-  defaultNetwork:
-    ovnKubernetesConfig:
-      egressIPConfig:
-        reachabilityTotalTimeoutSeconds: 5
-      gatewayConfig:
-        routingViaHost: false
-      genevePort: 6081
-----
-
---
-where:
-
-``:: The `egressIPConfig` holds the configurations for the options of the `EgressIP` object. By changing these configurations, you can extend the `EgressIP` object.
-
-``:: The value for `reachabilityTotalTimeoutSeconds` accepts integer values from `0` to `60`. A value of `0` disables the reachability check of the egressIP node. Setting a value from `1` to `60` corresponds to the timeout in seconds for a probe to send the reachability check to the node.
---
diff --git a/networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.adoc b/networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.adoc
index 68f33e710dfe..691b33089c8a 100644
--- a/networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.adoc
+++ b/networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.adoc
@@ -26,6 +26,18 @@ include::modules/nw-egress-ips-considerations.adoc[leveloffset=+1]
 // Assigning an egress IP address to a namespace
 include::modules/nw-egress-ips-assign.adoc[leveloffset=+1]
 
+// START: Conditional block for EgressIP Failover Configuration (Replaces nw-egress-ips-config-object.adoc)
+ifndef::openshift-rosa,openshift-rosa-hcp[]
+// Explains the 'Why' and 'What' of failover control (The Job to be Done)
+include::modules/egressip_failover_concept.adoc[leveloffset=+1]
+
+// Provides the step-by-step instructions (The How-To)
+include::modules/egressip_configure_failover_task.adoc[leveloffset=+2]
+
+// REFERENCE: Describes the parameters (The table of values)
+include::modules/egressip_failover_reference.adoc[leveloffset=+2]
+endif::openshift-rosa,openshift-rosa-hcp[]
+
 // Labeling a node to host egress IP addresses
 include::modules/nw-egress-ips-node.adoc[leveloffset=+1]
 
@@ -34,11 +46,6 @@ ifndef::openshift-rosa[]
 include::modules/nw-egress-ips-object-dual-stack.adoc[leveloffset=+1]
 endif::openshift-rosa[]
 
-ifndef::openshift-rosa,openshift-rosa-hcp[]
-// The egressIPConfig object
-include::modules/nw-egress-ips-config-object.adoc[leveloffset=+1]
-endif::openshift-rosa,openshift-rosa-hcp[]
-
 [role="_additional-resources"]
 [id="configuring-egress-ips-additional-resources"]
 == Additional resources
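The procedure in `egressip_configure_failover_task.adoc` edits the `Network` CR interactively with `oc edit`. The following is a minimal, non-authoritative sketch of a scripted equivalent, assuming a Bash shell and a hypothetical helper named `egressip_timeout_patch` that is not part of `oc` or of this change. The helper checks the value against the documented `0` to `60` integer range and prints a JSON merge patch for `spec.defaultNetwork.ovnKubernetesConfig.egressIPConfig`. The commented usage lines rely only on the standard `oc patch --type=merge` and `oc get -o jsonpath` options.

```shell
#!/usr/bin/env bash

# Hypothetical helper (illustration only): validates a
# reachabilityTotalTimeoutSeconds value against the documented 0-60 range
# and prints a JSON merge patch for the cluster Network CR.
# Prints an error to stderr and returns 1 for invalid input.
egressip_timeout_patch() {
  local seconds="$1"
  # Accept one or two decimal digits only. This rejects signs, decimals,
  # empty input, and values that are too long to be in range.
  if ! [[ "$seconds" =~ ^[0-9]{1,2}$ ]] || (( 10#$seconds > 60 )); then
    echo "error: value must be an integer from 0 to 60" >&2
    return 1
  fi
  # 10# forces base 10, so an input such as "08" is not parsed as octal.
  printf '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"egressIPConfig":{"reachabilityTotalTimeoutSeconds":%d}}}}}\n' \
    "$((10#$seconds))"
}

# Example usage against a live cluster (requires the cluster-admin role):
#   oc patch network.operator cluster --type=merge -p "$(egressip_timeout_patch 5)"
#   oc get network.operator cluster \
#     -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.egressIPConfig.reachabilityTotalTimeoutSeconds}'
```

Validating the range locally is a design choice for scripts: it fails fast with a readable message instead of relying only on the API server to reject an out-of-range value.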