multus: detect network CIDRs via canary
Change how Rook detects network CIDRs for Multus networks. The IPAM
configuration is only defined as an arbitrary JSON string blob with a
"type" field and nothing more. Rook's detection of CIDRs for whereabouts
had already grown out of date since the initial implementation.
Additionally, Rook did not support DHCP IPAM, which is a reasonable
choice for users. Further, Rook did not support CNI plugin chaining,
which further complicates NADs. Based on the CNI spec, network chaining
can result in arbitrary changes to network CIDRs from the first-given plugin.

All these problems make it more and more difficult for Rook to support
Multus by inspecting the NAD itself to predict network CIDRs. Instead,
it is better for Rook to treat the CNI process as a black box. To
preserve legacy functionality of auto-detecting networks and to make
that as robust as possible, change to a canary-style architecture like
that used for Ceph mons, from which Rook will detect the network CIDRs
if possible.

Also allow users to specify overrides for CIDR ranges. This allows Rook
to still support esoteric and unexpected NAD or network configurations
where a CIDR range is not detectable or where the range detected would
be incomplete. Because it may be impossible for Rook to understand the
network CIDRs holistically while residing on only a portion of the
network, this feature should have been present from Multus's inception.

Improving CIDR auto-detection and allowing users to specify overrides
for auto-detected CIDRs rounds out Rook's Multus support for CephCluster
(core/RADOS) installations. No further architectural changes should be
needed for CephClusters as regards application of public/cluster network
CIDRs for Multus networks.

Signed-off-by: Blaine Gardner <blaine.gardner@ibm.com>
(cherry picked from commit 3c43268)
(cherry picked from commit 8b72dfa)
BlaineEXE committed Sep 7, 2023
1 parent e185e93 commit 6baeca7
Showing 36 changed files with 2,532 additions and 868 deletions.
155 changes: 91 additions & 64 deletions Documentation/CRDs/Cluster/ceph-cluster-crd.md
@@ -161,7 +161,12 @@ If not specified, the default SDN will be used.
Configure the network that will be enabled for the cluster and services.

* `provider`: Specifies the network provider that will be used to connect the network interface. You can choose between `host` and `multus`.
* `selectors`: Used for `multus` provider only. Select NetworkAttachmentDefinitions to use for Ceph networks.
    * `public`: Select the NetworkAttachmentDefinition to use for the public network.
    * `cluster`: Select the NetworkAttachmentDefinition to use for the cluster network.
* `addressRanges`: Used for `host` or `multus` providers only. Allows overriding the address ranges (CIDRs) that Ceph will listen on.
    * `public`: A list of individual network ranges in CIDR format to use for Ceph's public network.
    * `cluster`: A list of individual network ranges in CIDR format to use for Ceph's cluster network.
* `ipFamily`: Specifies the network stack Ceph daemons should listen on.
* `dualStack`: Specifies that Ceph daemon should listen on both IPv4 and IPv6 network stacks.
* `connections`: Settings for network connections using Ceph's msgr2 protocol
@@ -185,105 +190,127 @@ Configure the network that will be enabled for the cluster and services.
Changing networking configuration after a Ceph cluster has been deployed is NOT
supported and will result in a non-functioning cluster.

#### Ceph public and cluster networks

Ceph daemons can operate on up to two distinct networks: public, and cluster.

Ceph daemons always use the public network, which is the Kubernetes pod network by default. The
public network is used for client communications with the Ceph cluster (reads/writes).

If specified, the cluster network is used to isolate internal Ceph replication traffic. This includes
additional copies of data replicated between OSDs during client reads/writes. This also includes OSD
data recovery (re-replication) when OSDs or nodes go offline. If the cluster network is unspecified,
the public network is used for this traffic instead.

Some Rook network providers allow manually specifying the public and cluster networks that Ceph
will use for data traffic. Use `addressRanges` to specify this. For example:

```yaml
network:
  provider: host
  addressRanges:
    public:
      - "192.168.100.0/24"
      - "192.168.101.0/24"
    cluster:
      - "192.168.200.0/24"
```

This spec translates directly to Ceph's `public_network` and `cluster_network` configurations.
Refer to [Ceph networking documentation](https://docs.ceph.com/docs/master/rados/configuration/network-config-ref/)
for more details.
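For the example above, the resulting Ceph settings would look roughly like the following. This is an illustrative sketch of the Ceph-side configuration, not literal Rook output:

```ini
# Illustrative Ceph configuration derived from the addressRanges example above.
# Multiple CIDRs are comma-separated in Ceph's configuration.
public_network = 192.168.100.0/24,192.168.101.0/24
cluster_network = 192.168.200.0/24
```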

The default, unspecified network provider cannot make use of these configurations.

Ceph public and cluster network configurations are allowed to change, but this should be done with
great care. When updating underlying networks or Ceph network settings, Rook assumes that the
current network configuration used by Ceph daemons will continue to operate as intended. Network
changes are not applied to Ceph daemon pods (like OSDs and MDSes) until the pod is restarted. When
making network changes, ensure that restarted pods will not lose connectivity to existing pods, and
vice versa.

#### Host Networking

To use host networking, set `provider: host`.

To instruct Ceph to operate on specific host interfaces or networks, use `addressRanges` to select
the network CIDRs Ceph will bind to on the host.

If the host networking setting is changed in a cluster where mons are already running, the existing mons will
remain running with the same network settings with which they were created. To complete the conversion
to or from host networking after you update this setting, you will need to
[failover the mons](../../Storage-Configuration/Advanced/ceph-mon-health.md#failing-over-a-monitor)
in order to have mons on the desired network configuration.

#### Multus

Rook supports using Multus NetworkAttachmentDefinitions for Ceph public and cluster networks.

Refer to [Multus documentation](https://github.com/k8snetworkplumbingwg/multus-cni/blob/master/docs/how-to-use.md)
for details about how to set up and select Multus networks.

Rook will attempt to auto-discover the network CIDRs for selected public and/or cluster networks.
This process is not guaranteed to succeed. Furthermore, this process will get a new network lease
for each CephCluster reconcile. Specify `addressRanges` manually if the auto-detection process
fails or if the selected network configuration cannot automatically recycle released network leases.

Only OSD pods will have both public and cluster networks attached (if specified). The rest of the
Ceph component pods and CSI pods will only have the public network attached. The Rook operator will
not have any networks attached; it proxies Ceph commands via a sidecar container in the mgr pod.

A NetworkAttachmentDefinition must exist before it can be used by Multus for a Ceph network. A
recommended definition will look like the following:

```yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ceph-multus-net
  namespace: rook-ceph
spec:
  config: '{
      "cniVersion": "0.3.0",
      "name": "public-nad",
      "type": "macvlan",
      "master": "eth0",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.200.0/24"
      }
    }'
```

* Ensure that `master` matches the network interface on hosts that you want to use.
  It must be the same across all hosts.
* CNI type `macvlan` is highly recommended.
  It has less CPU and memory overhead compared to traditional Linux `bridge` configurations.
* IPAM type `whereabouts` is recommended because it ensures each pod gets an IP address unique
  within the Kubernetes cluster. No DHCP server is required. If a DHCP server is present on the
  network, ensure the IP range does not overlap with the DHCP server's range.
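If an existing DHCP server manages the network, the `dhcp` IPAM type can be used instead of `whereabouts`. This is a sketch only: the NAD name and `master` interface are placeholders, and DHCP IPAM additionally requires the CNI DHCP daemon to be running on each host, which you should verify for your environment:

```yaml
# Hypothetical NAD using DHCP IPAM; "master" and the NAD name are placeholders.
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ceph-multus-dhcp-net
  namespace: rook-ceph
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "macvlan",
      "master": "eth0",
      "mode": "bridge",
      "ipam": {
        "type": "dhcp"
      }
    }'
```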

NetworkAttachmentDefinitions are selected for the desired Ceph network using `selectors`. Selector
values should include the namespace in which the NAD is present. `public` and `cluster` may be
selected independently. If `public` is left unspecified, Rook will configure Ceph to use the
Kubernetes pod network for Ceph client traffic.

Consider the example below which selects a hypothetical Kubernetes-wide Multus network in the
default namespace for Ceph's public network and selects a Ceph-specific network in the `rook-ceph`
namespace for Ceph's cluster network. The commented-out portion shows an example of how address
ranges could be manually specified for the networks if needed.

```yaml
network:
  provider: multus
  selectors:
    public: default/kube-multus-net
    cluster: rook-ceph/ceph-multus-net
  # addressRanges:
  #   public:
  #     - "192.168.100.0/24"
  #     - "192.168.101.0/24"
  #   cluster:
  #     - "192.168.200.0/24"
```
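Before creating the CephCluster, a selected NAD can be sanity-checked by attaching it to a throwaway pod. This is a hypothetical example; the pod name, image, and NAD reference are placeholders:

```yaml
# Hypothetical test pod; Multus attaches the selected NAD via the
# k8s.v1.cni.cncf.io/networks annotation.
apiVersion: v1
kind: Pod
metadata:
  name: multus-test
  namespace: rook-ceph
  annotations:
    k8s.v1.cni.cncf.io/networks: rook-ceph/ceph-multus-net
spec:
  containers:
    - name: test
      image: busybox
      command: ["sleep", "3600"]
```

Once the pod is running, `kubectl describe pod multus-test -n rook-ceph` should show a `k8s.v1.cni.cncf.io/network-status` annotation containing an IP address from the expected range.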

##### Validating Multus configuration

137 changes: 132 additions & 5 deletions Documentation/CRDs/specification.md
@@ -2476,6 +2476,51 @@ string
</tr>
</tbody>
</table>
<h3 id="ceph.rook.io/v1.AddressRangesSpec">AddressRangesSpec
</h3>
<p>
(<em>Appears on:</em><a href="#ceph.rook.io/v1.NetworkSpec">NetworkSpec</a>)
</p>
<div>
</div>
<table>
<thead>
<tr>
<th>Field</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<code>public</code><br/>
<em>
<a href="#ceph.rook.io/v1.CIDRList">
CIDRList
</a>
</em>
</td>
<td>
<em>(Optional)</em>
<p>Public defines a list of CIDRs to use for Ceph public network communication.</p>
</td>
</tr>
<tr>
<td>
<code>cluster</code><br/>
<em>
<a href="#ceph.rook.io/v1.CIDRList">
CIDRList
</a>
</em>
</td>
<td>
<em>(Optional)</em>
<p>Cluster defines a list of CIDRs to use for Ceph cluster network communication.</p>
</td>
</tr>
</tbody>
</table>
<h3 id="ceph.rook.io/v1.Annotations">Annotations
(<code>map[string]string</code> alias)</h3>
<p>
@@ -2687,6 +2732,13 @@ int64
</tr>
</tbody>
</table>
<h3 id="ceph.rook.io/v1.CIDR">CIDR
(<code>string</code> alias)</h3>
<div>
<p>An IPv4 or IPv6 network CIDR.</p>
<p>This naive kubebuilder regex provides immediate feedback for some typos and for a common problem
case where the range spec is forgotten (e.g., /24). Rook does in-depth validation in code.</p>
</div>
<h3 id="ceph.rook.io/v1.COSIDeploymentStrategy">COSIDeploymentStrategy
(<code>string</code> alias)</h3>
<p>
@@ -3479,6 +3531,25 @@ string
</tr>
</tbody>
</table>
<h3 id="ceph.rook.io/v1.CephNetworkType">CephNetworkType
(<code>string</code> alias)</h3>
<div>
<p>CephNetworkType should be &ldquo;public&rdquo; or &ldquo;cluster&rdquo;.
Allow any string so that over-specified legacy clusters do not break on CRD update.</p>
</div>
<table>
<thead>
<tr>
<th>Value</th>
<th>Description</th>
</tr>
</thead>
<tbody><tr><td><p>&#34;cluster&#34;</p></td>
<td></td>
</tr><tr><td><p>&#34;public&#34;</p></td>
<td></td>
</tr></tbody>
</table>
<h3 id="ceph.rook.io/v1.CephStatus">CephStatus
</h3>
<p>
@@ -7869,6 +7940,29 @@ PoolSpec
</tr>
</tbody>
</table>
<h3 id="ceph.rook.io/v1.NetworkProviderType">NetworkProviderType
(<code>string</code> alias)</h3>
<p>
(<em>Appears on:</em><a href="#ceph.rook.io/v1.NetworkSpec">NetworkSpec</a>)
</p>
<div>
<p>NetworkProviderType defines valid network providers for Rook.</p>
</div>
<table>
<thead>
<tr>
<th>Value</th>
<th>Description</th>
</tr>
</thead>
<tbody><tr><td><p>&#34;&#34;</p></td>
<td></td>
</tr><tr><td><p>&#34;host&#34;</p></td>
<td></td>
</tr><tr><td><p>&#34;multus&#34;</p></td>
<td></td>
</tr></tbody>
</table>
<h3 id="ceph.rook.io/v1.NetworkSpec">NetworkSpec
</h3>
<p>
@@ -7889,7 +7983,9 @@ PoolSpec
<td>
<code>provider</code><br/>
<em>
<a href="#ceph.rook.io/v1.NetworkProviderType">
NetworkProviderType
</a>
</em>
</td>
<td>
@@ -7901,14 +7997,45 @@ string
<td>
<code>selectors</code><br/>
<em>
map[github.com/rook/rook/pkg/apis/ceph.rook.io/v1.CephNetworkType]string
</em>
</td>
<td>
<em>(Optional)</em>
<p>Selectors define NetworkAttachmentDefinitions to be used for Ceph public and/or cluster
networks when the &ldquo;multus&rdquo; network provider is used. This config section is not used for
other network providers.</p>
<p>Valid keys are &ldquo;public&rdquo; and &ldquo;cluster&rdquo;. Refer to Ceph networking documentation for more:
<a href="https://docs.ceph.com/en/reef/rados/configuration/network-config-ref/">https://docs.ceph.com/en/reef/rados/configuration/network-config-ref/</a></p>
<p>Refer to Multus network annotation documentation for help selecting values:
<a href="https://github.com/k8snetworkplumbingwg/multus-cni/blob/master/docs/how-to-use.md#run-pod-with-network-annotation">https://github.com/k8snetworkplumbingwg/multus-cni/blob/master/docs/how-to-use.md#run-pod-with-network-annotation</a></p>
<p>Rook will make a best-effort attempt to automatically detect CIDR address ranges for given
network attachment definitions. Rook&rsquo;s methods are robust but may be imprecise for
sufficiently complicated networks. Rook&rsquo;s auto-detection process obtains a new IP address
lease for each CephCluster reconcile. If Rook fails to detect, incorrectly detects, only
partially detects, or if underlying networks do not support reusing old IP addresses, it is
best to use the &lsquo;addressRanges&rsquo; config section to specify CIDR ranges for the Ceph cluster.</p>
<p>As a contrived example, one can use a theoretical Kubernetes-wide network for Ceph client
traffic and a theoretical Rook-only network for Ceph replication traffic as shown:
selectors:
public: &ldquo;default/cluster-fast-net&rdquo;
cluster: &ldquo;rook-ceph/ceph-backend-net&rdquo;</p>
</td>
</tr>
<tr>
<td>
<code>addressRanges</code><br/>
<em>
<a href="#ceph.rook.io/v1.AddressRangesSpec">
AddressRangesSpec
</a>
</em>
</td>
<td>
<em>(Optional)</em>
<p>AddressRanges specify a list of CIDRs that Rook will apply to Ceph&rsquo;s &lsquo;public_network&rsquo; and/or
&lsquo;cluster_network&rsquo; configurations. This config section may be used for the &ldquo;host&rdquo; or &ldquo;multus&rdquo;
network providers.</p>
</td>
</tr>
<tr>
