66 changes: 66 additions & 0 deletions modules/egressip_configure_failover_task.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
// Module included in the following assembly:
//
// *networking/ovn_kubernetes_network_provider/egressip_failover_assembly.adoc

:_mod-docs-content-type: PROCEDURE
[id="egressip_configure_failover_task_{context}"]
= Configuring the EgressIP failover time limit

Follow this procedure to configure the `reachabilityTotalTimeoutSeconds` parameter and control how quickly the system detects a failing `egressIP` node and initiates a failover.

.Prerequisites

* Install the OpenShift CLI (`oc`).
* Log in to the cluster as a cluster administrator.

.Procedure

. Edit the `Network` custom resource by running the following command:
+
[source,terminal]
----
$ oc edit network.operator cluster
----

. Navigate to the `egressIPConfig: {}` section under `spec:defaultNetwork:ovnKubernetesConfig:`.

. Replace the empty block with the `reachabilityTotalTimeoutSeconds` parameter set to your chosen value, for example `5` seconds. Keep the indentation consistent with the surrounding YAML:
+
[source,yaml]
----
defaultNetwork:
  ovnKubernetesConfig:
    egressIPConfig:
      reachabilityTotalTimeoutSeconds: 5
----
+
[NOTE]
====
The value must be an integer between 0 and 60. For details on possible values, see the "EgressIP failover settings" section.
====

. Save and exit the editor. The operator automatically applies the changes.
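
As a non-interactive alternative to the editing steps above, the same field can likely be set with a JSON merge patch. The following is a minimal sketch, not part of the documented procedure: the function names `build_egressip_timeout_patch` and `apply_egressip_timeout` are hypothetical helpers introduced here for illustration, and the value `5` is only an example.

```shell
# Hypothetical helper: prints a JSON merge patch that sets
# reachabilityTotalTimeoutSeconds to the integer passed as $1.
build_egressip_timeout_patch() {
  printf '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"egressIPConfig":{"reachabilityTotalTimeoutSeconds":%d}}}}}\n' "$1"
}

# Hypothetical helper: applies the patch to the cluster Network operator
# resource. Requires a logged-in `oc` session with cluster-admin rights.
apply_egressip_timeout() {
  oc patch network.operator cluster --type=merge \
    --patch "$(build_egressip_timeout_patch "$1")"
}

# Example (assumed value): apply_egressip_timeout 5
```

Printing the patch with `build_egressip_timeout_patch 5` before applying it makes it easy to review the exact nesting that will be written.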

.Verification

. Verify that the system correctly accepted the `reachabilityTotalTimeoutSeconds` parameter by running the following command:
+
[source,terminal]
----
$ oc get network.operator cluster -o yaml
----

. Inspect the output and confirm that the `reachabilityTotalTimeoutSeconds` parameter is correctly nested under `spec:defaultNetwork:ovnKubernetesConfig:egressIPConfig:` with your intended value:
+
[source,yaml]
----
# ...
spec:
# ...
  defaultNetwork:
    ovnKubernetesConfig:
      egressIPConfig:
        reachabilityTotalTimeoutSeconds: 5
      gatewayConfig:
# ...
----
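
Instead of scanning the full YAML output, the single field can be read with a JSONPath query. This is a sketch under the assumption that the field lives at the path shown in the verification output; `get_egressip_timeout` and `check_egressip_timeout` are hypothetical helper names, not `oc` subcommands.

```shell
# Hypothetical helper: prints only the configured timeout value.
get_egressip_timeout() {
  oc get network.operator cluster \
    -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.egressIPConfig.reachabilityTotalTimeoutSeconds}'
}

# Hypothetical helper: compares the configured value with the expected
# one passed as $1 and returns non-zero on a mismatch.
check_egressip_timeout() {
  expected="$1"
  actual="$(get_egressip_timeout)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: reachabilityTotalTimeoutSeconds is $actual"
  else
    echo "MISMATCH: expected $expected, found '$actual'"
    return 1
  fi
}

# Example (assumed value): check_egressip_timeout 5
```

An empty result from `get_egressip_timeout` suggests that the parameter is not set at that path, which can happen if it was nested at the wrong indentation level.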
19 changes: 19 additions & 0 deletions modules/egressip_failover_concept.adoc
@@ -0,0 +1,19 @@
// Module included in the following assembly:
//
// *networking/ovn_kubernetes_network_provider/egressip_failover_assembly.adoc

:_mod-docs-content-type: CONCEPT
[id="egressip_failover_concept_{context}"]
= Understanding EgressIP failover control

The `reachabilityTotalTimeoutSeconds` parameter controls how quickly the system detects a failing `egressIP` node and initiates a failover. This parameter directly determines the maximum time the platform waits before declaring a node unreachable.

[IMPORTANT]
====
When you configure `egressIP` with multiple egress nodes, expect the complete failover time, from node failure to recovery on a new node, to be on the order of seconds or longer. The new IP assignment can begin only after the `reachabilityTotalTimeoutSeconds` period has fully elapsed without a successful check.
====

To ensure traffic uses the correct external path, `egressIP` traffic on a node always egresses through the network interface to which the `egressIP` address is assigned.

// Next step: The user must perform a task to implement this configuration.
// See xref:egressip_configure_failover_task.adoc[Configuring the Failover Time Limit].
18 changes: 18 additions & 0 deletions modules/egressip_failover_reference.adoc
@@ -0,0 +1,18 @@
// Module included in the following assembly:
//
// *networking/ovn_kubernetes_network_provider/egressip_failover_assembly.adoc

:_mod-docs-content-type: REFERENCE
[id="egressip_failover_reference_{context}"]
= EgressIP failover settings

The `reachabilityTotalTimeoutSeconds` parameter defines the total time limit in seconds for the platform health check process before a node is declared down.

The following table summarizes the acceptable values and their implications:

[cols="1,1,2a", options="header"]
|===
|Parameter value (seconds) |Effect on reachability check |Failover impact and use case
|`0` |Disables the reachability check. |No automatic failover: Use only if an external system handles node health monitoring and failover. The platform will not automatically react to node failures.
|`1 - 60` |Sets the total time limit for reachability probing. |Directly controls detection time: This value defines the lower limit for your overall failover time. A smaller value leads to faster failover but might increase network traffic. Default: 1 second. The maximum accepted integer value is 60.
|===
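
The constraints in the table can be checked before a value is written to the cluster. The following sketch is illustrative only: `describe_reachability_timeout` is a hypothetical helper, and it assumes the value is written in plain decimal form without leading zeros or signs.

```shell
# Hypothetical helper: validates a candidate value against the 0-60
# range from the table and prints the effect that the table describes.
describe_reachability_timeout() {
  value="$1"
  case "$value" in
    # Reject empty input, non-digits, and anything longer than two digits.
    ''|*[!0-9]*|???*)
      echo "invalid: must be an integer between 0 and 60"
      return 1
      ;;
  esac
  if [ "$value" -gt 60 ]; then
    echo "invalid: must be an integer between 0 and 60"
    return 1
  elif [ "$value" -eq 0 ]; then
    echo "reachability check disabled; no automatic failover"
  else
    echo "node declared unreachable after ${value}s without a successful check"
  fi
}

# Example: describe_reachability_timeout 5
```

Running the check first avoids saving a value that the table lists as out of range.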
42 changes: 0 additions & 42 deletions modules/nw-egress-ips-config-object.adoc

This file was deleted.

@@ -26,6 +26,18 @@ include::modules/nw-egress-ips-considerations.adoc[leveloffset=+1]
// Assigning an egress IP address to a namespace
include::modules/nw-egress-ips-assign.adoc[leveloffset=+1]

// START: Conditional block for EgressIP Failover Configuration (Replaces nw-egress-ips-config-object.adoc)
ifndef::openshift-rosa,openshift-rosa-hcp[]
// Explains the 'Why' and 'What' of failover control (The Job to be Done)
include::modules/egressip_failover_concept.adoc[leveloffset=+1]

// Provides the step-by-step instructions (The How-To)
include::modules/egressip_configure_failover_task.adoc[leveloffset=+2]

// REFERENCE: Describes the parameters (The table of values)
include::modules/egressip_failover_reference.adoc[leveloffset=+2]
endif::openshift-rosa,openshift-rosa-hcp[]

// Labeling a node to host egress IP addresses
include::modules/nw-egress-ips-node.adoc[leveloffset=+1]

@@ -34,11 +46,6 @@ ifndef::openshift-rosa[]
include::modules/nw-egress-ips-object-dual-stack.adoc[leveloffset=+1]
endif::openshift-rosa[]

ifndef::openshift-rosa,openshift-rosa-hcp[]
// The egressIPConfig object
include::modules/nw-egress-ips-config-object.adoc[leveloffset=+1]
endif::openshift-rosa,openshift-rosa-hcp[]

[role="_additional-resources"]
[id="configuring-egress-ips-additional-resources"]
== Additional resources