docs/blog/posts/2024/srlinux-asymmetric-routing.md
This post dives deeper into the asymmetric routing model on SR Linux.

The topology in use is a 3-stage Clos fabric with BGP EVPN and VXLAN, with:

* server `s1` single-homed to `leaf1`
* `s2` dual-homed to `leaf2` and `leaf3`
* and `s3` single-homed to `leaf4`.

Servers s1 and s2 are in the same subnet, 172.16.10.0/24, while s3 is in a different subnet, 172.16.20.0/24. Thus, this post demonstrates Layer 2 extension over a routed fabric as well as how Layer 3 services are deployed over the same fabric, with an asymmetric routing model.

The physical topology is shown below:
```yaml
topology:
  nodes:
    # ... spine and remaining leaf definitions elided ...
    leaf3:
    leaf4:
    s1:
      kind: linux
      image: ghcr.io/srl-labs/network-multitool
      exec:
        - ip addr add 172.16.10.1/24 dev eth1
        - ip route add 172.16.20.0/24 via 172.16.10.254
    s2:
      kind: linux
      image: ghcr.io/srl-labs/network-multitool
      exec:
        # ... bond0 setup elided ...
        - ip link set eth2 up
        - ip link set bond0 up
        - ip route add 172.16.20.0/24 via 172.16.10.254
    s3:
      kind: linux
      image: ghcr.io/srl-labs/network-multitool
      exec:
        # ... addressing for s3 elided ...
  links:
    # ... leaf-to-spine links elided ...
    - endpoints: ["leaf3:e1-2", "spine2:e1-3"]
    - endpoints: ["leaf4:e1-1", "spine1:e1-4"]
    - endpoints: ["leaf4:e1-2", "spine2:e1-4"]
    - endpoints: ["leaf1:e1-3", "s1:eth1"]
    - endpoints: ["leaf2:e1-3", "s2:eth1"]
    - endpoints: ["leaf3:e1-3", "s2:eth2"]
    - endpoints: ["leaf4:e1-3", "s3:eth1"]
```
/// admonition | Credentials
    type: subtle-note
As usual, Nokia SR Linux nodes can be accessed with `admin:NokiaSrl1!` credentials and the host nodes use `user:multit00l`.
///
The end goal of this post is to ensure that server s1 can communicate with both s2 (same subnet) and s3 (different subnet) using an asymmetric routing model. To that end, the following IPv4 addressing is used (with the IRB addressing following a distributed, anycast model):

| Resource            | IPv4 scope       |
| :-----------------: | :--------------: |
| Underlay            | 198.51.100.0/24  |
| `system0` interface | 192.0.2.0/24     |
| VNI 10010           | 172.16.10.0/24   |
| VNI 10020           | 172.16.20.0/24   |
| server s1           | 172.16.10.1/24   |
| server s2           | 172.16.10.2/24   |
| server s3           | 172.16.20.3/24   |
| `irb0.10` interface | 172.16.10.254/24 |
| `irb0.20` interface | 172.16.20.254/24 |
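To make the distributed anycast model concrete, the IRB addressing above takes roughly the following shape on each leaf. This is a minimal sketch, not the post's full configuration: the subinterface indices are an assumption based on the `irb0.10`/`irb0.20` naming, and every leaf repeats the same addresses so servers always resolve a local gateway.

```srl
# Sketch only: subinterface indices assumed from the irb0.10/irb0.20 naming.
interface irb0 {
    subinterface 10 {
        anycast-gw {
        }
        ipv4 {
            address 172.16.10.254/24 {
                anycast-gw true
            }
        }
    }
    subinterface 20 {
        anycast-gw {
        }
        ipv4 {
            address 172.16.20.254/24 {
                anycast-gw true
            }
        }
    }
}
```

With `anycast-gw true` on the address, all leafs answer for the same gateway IP and anycast-gw MAC, so a server's gateway does not change when it moves or is multi-homed.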
### Host connectivity and ESI LAG

With BGP configured, we can start to deploy the connectivity to the servers and configure the necessary VXLAN constructs for end-to-end connectivity. The interfaces to the servers are configured as untagged interfaces. Since server s2 is multi-homed to leaf2 and leaf3, this segment is configured as an ESI LAG. This includes:

1. Mapping the physical interface to a LAG interface (`lag1`, in this case).
2. Configuring the LAG interface with the required LACP properties - mode `active` and a system-mac of `00:00:00:00:23:23`. This LAG interface is also configured with a subinterface of type `bridged`.
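On leaf2 and leaf3, these two steps translate to something like the sketch below. The ethernet-segment name and ESI value here are illustrative placeholders, not the values used in the lab.

```srl
# Sketch only: ethernet-segment name and ESI are illustrative placeholders.
interface ethernet-1/3 {
    ethernet {
        aggregate-id lag1
    }
}
interface lag1 {
    lag {
        lag-type lacp
        lacp {
            lacp-mode ACTIVE
            system-id-mac 00:00:00:00:23:23
        }
    }
    subinterface 0 {
        type bridged
    }
}
system {
    network-instance {
        protocols {
            evpn {
                ethernet-segments {
                    bgp-instance 1 {
                        ethernet-segment ES-s2 {
                            admin-state enable
                            esi 00:11:22:33:44:55:66:77:88:99
                            interface lag1 {
                            }
                        }
                    }
                }
            }
        }
    }
}
```

The shared `system-id-mac` makes leaf2 and leaf3 appear as one LACP partner to s2, while the ethernet-segment (advertised via EVPN Type-4 routes) lets the fabric handle designated-forwarder election and aliasing for the segment.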
### VXLAN tunnel interfaces

On each leaf, VXLAN tunnel-interfaces are created next. In this case, two logical interfaces are created, one for VNI 10010 and another for VNI 10020 (since this is asymmetric routing, all VNIs must exist on all leafs that want to route between the respective VNIs). Since the end goal is to have server s1 communicate with s2 and s3, leaf1 and leaf4 are configured with both VNI 10010 and VNI 10020, while leaf2 and leaf3 are configured with VNI 10010 only.
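Conceptually, the two logical interfaces on leaf1 and leaf4 take the following shape. This is a sketch: the tunnel-interface name `vxlan1` and the vxlan-interface indices are assumptions, and the per-leaf tabs show the actual configuration.

```srl
# Sketch only: tunnel-interface name and indices are assumed.
tunnel-interface vxlan1 {
    vxlan-interface 10 {
        type bridged
        ingress {
            vni 10010
        }
    }
    vxlan-interface 20 {
        type bridged
        ingress {
            vni 10020
        }
    }
}
```

On leaf2 and leaf3, only the VNI 10010 vxlan-interface would be present, since they do not route between the two subnets.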
/// tab | leaf1
When the hosts come online, they typically send a GARP to ensure there is no duplicate IP address in their broadcast domain. This enables the locally attached leafs to learn the IP-to-MAC binding and build an ARP entry in the ARP cache table (since the `arp learn-unsolicited` configuration option is set to `true`). This, in turn, is advertised as an EVPN Type-2 MAC+IP route for remote leafs to learn this as well and eventually insert the IP-to-MAC binding as an entry in their ARP caches.

On leaf1, we can confirm that it has learnt the IP-to-MAC binding for server s1 (locally attached) and s3 (attached to the remote leaf, leaf4).
```srl
A:leaf1# show arpnd arp-entries interface irb0
```
This is an important step for asymmetric routing. Consider a situation where server s1 wants to communicate with s3. When the IP packet hits leaf1, it will attempt to resolve the destination IP address via an ARP request, as it is directly attached locally (via the `irb0.20` interface), as shown below.

The `arp host-route populate evpn` configuration option is purely a design choice. Since a routing lookup is based on the longest-prefix-match logic (where the longest prefix wins), the existence of the host routes ensures that when there is a routing lookup for the destination, the host route is selected instead of falling back to the subnet route, which relies on ARP resolution, making the forwarding process more efficient. However, this also implies that a host route is created for every EVPN-learnt ARP entry, which can lead to a large routing table, potentially creating an issue in large-scale fabrics.

///
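The two ARP-related knobs discussed here - learning from unsolicited ARP/GARP packets and turning EVPN-learnt ARP entries into host routes - are both per-subinterface settings. A sketch of what they look like under `irb0.10` (assuming this matches the post's IRB layout):

```srl
# Sketch only: shown for irb0.10; the same applies to irb0.20.
interface irb0 {
    subinterface 10 {
        ipv4 {
            arp {
                learn-unsolicited true
                host-route {
                    populate evpn {
                    }
                }
            }
        }
    }
}
```

Together, these ensure a GARP from a server seeds the local ARP cache, and that ARP entries learnt over EVPN become /32 host routes that win the longest-prefix-match lookup.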
Let's consider two flows to understand the data plane forwarding in such a design - server s1 communicating with s2 (same subnet) and s1 communicating with s3 (different subnet).

Since s1 is in the same subnet as s2, when communicating with s2, s1 will try to resolve s2's IP address directly via an ARP request. This is received on leaf1 and leaked to the CPU via `irb0.10`. Since L2 proxy-arp is not enabled, the `arp_nd_mgr` process picks up the ARP request and responds back using its own anycast gateway MAC address while suppressing the ARP request from being flooded in the fabric. A packet capture of this ARP reply is shown below.

Once this ARP process completes, server s1 generates an ICMP request (since we are testing communication between hosts using the `ping` tool). When this IP packet arrives on leaf1, it does a routing lookup (since the destination MAC address is owned by itself) and this routing lookup will either hit the 172.16.10.0/24 prefix or the more-specific 172.16.10.2/32 prefix (installed from the ARP entry via the EVPN Type-2 MAC+IP route), as shown below. Since this is a directly attached route, it is further resolved into a MAC address via the ARP table and then the packet is bridged towards the destination. This MAC address points to an Ethernet Segment, which in turn resolves into VTEPs 192.0.2.12 and 192.0.2.13.
```srl
A:leaf1# show network-instance default route-table ipv4-unicast route 172.16.10.2
```
The communication between servers s1 and s3 follows a similar pattern - the packet is received in macvrf1, mapped to VNI 10010, and since the destination MAC address is the anycast MAC address owned by leaf1, it is then routed locally into VNI 10020 (since `irb0.20` is locally attached) and then bridged across to the destination, as confirmed below: